weecology / MATSS-LDATS

Macroecological LDA analysis of time series
MIT License

Keep corresponding data, lda, and ts models together #40

Open diazrenata opened 5 years ago

diazrenata commented 5 years ago

To compute the full likelihood, we need a TS model plus the data and LDA that went into it. The way drake is set up right now, I've been reconstructing these relationships by parsing the names of individual TS models. This gets trickier if we want to add two steps to the pipeline: 1) calculating the full likelihood and 2) generating data-prediction comparisons for document-term abundances (which I think we do, because it's a lot of heavy lifting for a .Rmd).

The most straightforward way I see to do this is wrapping each object in a list of two elements: the object itself and a list of the objects upstream of it. So the output of run_LDA, for example, would be list(lda = [LDA model set], upstream = list(data = [data])), and the output of run_TS would be list(ts = [TS model set], upstream = list(data = [data], lda = [LDA model set])).
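A minimal sketch of that wrapping, in base R. The function names run_LDA_wrapped and run_TS_wrapped and the placeholder model objects are hypothetical stand-ins for illustration, not the actual MATSS-LDATS or LDATS functions; the real steps would fit models where the placeholders are.

```r
# Hypothetical wrapper for the LDA step: returns the model set together
# with the data it was fit to.
run_LDA_wrapped <- function(data) {
  lda_models <- list(model = "placeholder LDA model set")  # stand-in for a real fit
  list(lda = lda_models, upstream = list(data = data))
}

# Hypothetical wrapper for the TS step: carries forward both the data and
# the LDA model set that the TS models depend on.
run_TS_wrapped <- function(lda_result) {
  ts_models <- list(model = "placeholder TS model set")  # stand-in for a real fit
  list(
    ts = ts_models,
    upstream = list(
      data = lda_result$upstream$data,
      lda = lda_result$lda
    )
  )
}

# Toy abundance table (made-up numbers, for illustration only)
abundance <- data.frame(sp1 = c(3, 5, 2), sp2 = c(0, 1, 4))

lda_result <- run_LDA_wrapped(abundance)
ts_result <- run_TS_wrapped(lda_result)

# Everything needed for the full likelihood now travels with the TS result
names(ts_result$upstream)
```

With this shape, a downstream step can take ts_result alone and reach ts_result$upstream$data and ts_result$upstream$lda without parsing target names.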

I'll try this out in a branch...

(If size becomes an issue, the second list could be a list of names, and then we'd do some rlang stuff on it, but I think sticking an LDA_set + an empirical dataset to a TS model set won't be a big deal.)
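The names-only fallback could look roughly like the sketch below. It uses base R get() on a plain environment as a stand-in for the rlang approach and for the drake cache; the store environment, the object names data_a and lda_a, and the helper resolve_upstream are all hypothetical.

```r
# Hypothetical store standing in for wherever the pipeline keeps its objects
store <- new.env()
assign("data_a", data.frame(sp1 = c(3, 5, 2), sp2 = c(0, 1, 4)), envir = store)
assign("lda_a", list(model = "placeholder LDA model set"), envir = store)

# The TS result keeps only the names of its upstream objects, not copies
ts_result <- list(
  ts = list(model = "placeholder TS model set"),
  upstream = list(data = "data_a", lda = "lda_a")
)

# Hypothetical helper: look each upstream name up in the store on demand
resolve_upstream <- function(result, envir) {
  lapply(result$upstream, function(nm) get(nm, envir = envir))
}

resolved <- resolve_upstream(ts_result, store)
names(resolved)
```

The trade-off is that each result stays small, but it is only usable while the named objects remain available in the store.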

diazrenata commented 5 years ago

🙃 when scaling up to ~50 seeds, ~10 options for n topics, and 1000 iterations (which is not that many) this starts to break down _results objects and results in a cache that is slower to copy than I'd like. I've changed to (f302d1d) -

Prediction plots are now limited to the 25 most abundant species if the number of species exceeds 25. This is pretty flexible.
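One way the 25-species cap could be applied before plotting is sketched below. This is an assumption about the approach, not the code in the referenced commit: the helper top_species is hypothetical, and "most abundant" is taken here to mean the largest total abundance across all time steps.

```r
# Hypothetical helper: keep at most `max_species` columns, chosen by total
# abundance, and preserve the original column order among those kept.
top_species <- function(abundance, max_species = 25) {
  if (ncol(abundance) <= max_species) {
    return(abundance)  # nothing to trim
  }
  totals <- colSums(abundance)
  keep <- order(totals, decreasing = TRUE)[seq_len(max_species)]
  abundance[, sort(keep), drop = FALSE]
}

# Simulated community with 40 species and 30 time steps (made-up data)
set.seed(1)
abundance <- as.data.frame(matrix(rpois(30 * 40, lambda = 5), nrow = 30))

trimmed <- top_species(abundance)
dim(trimmed)
```

Because max_species is an argument, the cutoff can be changed per plot without touching the rest of the pipeline.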