Open diazrenata opened 5 years ago
🙃 When scaling up to ~50 seeds, ~10 options for n topics, and 1000 iterations (which is not that many), this starts to break down: the full `_results` objects result in a cache that is slower to copy than I'd like. I've changed to (f302d1d):

- keeping only the theta matrices
- `model_info`: a data frame with names and indices of associated objects, instead of the actual objects

Prediction plots are now limited to the 25 most abundant species if the number of species exceeds 25. This is pretty flexible.
In order to compute the full likelihood, we need a TS model plus the data and LDA that went into it. The way drake is set up right now, I've been reconstructing these relationships by parsing the names of individual TS models. This gets trickier if we want to be able to add 1) calculating the full likelihood and 2) generating data-prediction comparisons for document-term-abundances to the pipeline (which I think we do, because it's a lot of heavy lifting for a .Rmd).
The most straightforward way I see to do this is wrapping each object in a list of two elements: the object itself and a list of the objects upstream of it. The output of `run_LDA`, for example, would be `list(lda = [LDA model set], upstream = list(data = [data]))`, and the output of `run_TS` would be `list(ts = [TS model set], upstream = list(data = [data], lda = [LDA model set]))`.

I'll try this out in a branch...
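The wrapper shape is language-agnostic, so here's a minimal sketch of the idea in Python (the pipeline itself is R; `run_lda`/`run_ts` and their payloads here are hypothetical stand-ins, not the real functions):

```python
# Sketch: each pipeline step returns its result bundled with its upstream inputs,
# so downstream steps (e.g. full-likelihood computation) never have to parse
# target names to reconstruct the dependency chain.

def run_lda(data):
    lda_set = {"k": 3}  # placeholder for the fitted LDA model set
    return {"lda": lda_set, "upstream": {"data": data}}

def run_ts(data, lda_result):
    ts_set = {"changepoints": 2}  # placeholder for the fitted TS model set
    return {
        "ts": ts_set,
        "upstream": {"data": data, "lda": lda_result["lda"]},
    }

data = {"name": "portal", "rows": 100}
lda_out = run_lda(data)
ts_out = run_ts(data, lda_out)

# The full-likelihood step can recover everything it needs from one object:
assert ts_out["upstream"]["data"] is data
assert ts_out["upstream"]["lda"] is lda_out["lda"]
```

The key property is that a TS result carries references to the exact data and LDA objects that produced it, rather than encoding those relationships in target names.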
(If size became an issue, the second list could be a list of names, and then we'd do some rlang stuff on it, but I think attaching an LDA_set + an empirical dataset to a TS model set won't be a big deal.)
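The name-based fallback would look something like this, again sketched in Python for illustration (a plain dict stands in for the rlang-based lookup; all object names here are made up):

```python
# Sketch: if storing full upstream objects bloats the cache, store their *names*
# and resolve them from a registry only when they're actually needed.

registry = {}

def register(name, obj):
    """Record an object under a name so it can be looked up later."""
    registry[name] = obj
    return obj

data = register("portal_data", {"rows": 100})
lda = register("lda_portal_k3", {"k": 3})

# The TS result records upstream *names* instead of the objects themselves:
ts_out = {
    "ts": {"changepoints": 2},
    "upstream_names": {"data": "portal_data", "lda": "lda_portal_k3"},
}

# Resolve the upstream objects by name at use time (e.g. for the full likelihood):
resolved = {role: registry[nm] for role, nm in ts_out["upstream_names"].items()}
assert resolved["data"] is data and resolved["lda"] is lda
```

This trades cache size for an extra resolution step; the direct-reference version above is simpler as long as LDA_set + empirical data stay small.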