weecology / LDATS

Latent Dirichlet Allocation coupled with Bayesian Time Series analyses
https://weecology.github.io/LDATS

Include covariates in `TS_on_LDA` output? #123

ha0ye closed 5 years ago

ha0ye commented 5 years ago

I'm a little unclear on the structure of the `TS_on_LDA` output (this is tangentially related to this issue).

In the paper-comparison vignette, there's a bit of post-processing to convert the output into dates using the covariate table:

ldats_ldats_cpt_dates <- as.data.frame(ldats_ldats_cpt_dates)
colnames(ldats_ldats_cpt_dates) <- 'newmoon'
ldats_ldats_cpt_dates <- dplyr::left_join(ldats_ldats_cpt_dates, rodents$document_covariate_table, by = 'newmoon')
ldats_ldats_cpt_dates <- ldats_ldats_cpt_dates$censusdate

However, this table is an input to `TS_on_LDA`, so it seems like the output could either include the post-processing itself, or carry the covariate table along (and therefore be self-contained for loading and further processing).

ha0ye commented 5 years ago

Actually, maybe we need to think more carefully about how this works. One thing a user might want to do is convert the resulting changepoints into dates, but this could be problematic if the data has missing observations.

Whatever interpolation LDATS is doing internally should probably be mirrored for the output, e.g. to translate changepoint times into dates.
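A minimal sketch of the problem and one possible workaround, not what LDATS does internally. The covariate table, its values, and the changepoint locations below are made up for illustration; only the `newmoon` and `censusdate` column names come from the vignette snippet above. An exact lookup (which is what the `left_join` amounts to) returns `NA` for a changepoint that falls on a missing time step, whereas linear interpolation over the time index with base R's `approx` still yields a date.

```r
# Hypothetical covariate table: newmoon 4 is missing (no census that period).
covariates <- data.frame(
  newmoon = c(1, 2, 3, 5, 6),
  censusdate = as.Date(c("2000-01-01", "2000-01-31", "2000-03-01",
                         "2000-04-30", "2000-05-30"))
)

# Hypothetical changepoint locations on the newmoon scale.
changepoints <- c(2, 4)

# Exact lookup: the changepoint at the missing newmoon has no matching date.
looked_up <- covariates$censusdate[match(changepoints, covariates$newmoon)]

# Linear interpolation of the dates (as days since 1970-01-01) over newmoon.
interpolated <- approx(
  x = covariates$newmoon,
  y = as.numeric(covariates$censusdate),
  xout = changepoints
)$y
changepoint_dates <- as.Date(round(interpolated), origin = "1970-01-01")
```

Whether linear interpolation is the right choice depends on how the missing time steps are handled when the model is fit, so this is only a starting point.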

juniperlsimonis commented 5 years ago

the covariates are in the output from `TS_on_LDA` (which is basically just a list of the TS outputs for each of the LDA models). for example, in the simple example in the documentation for `LDA_TS`, the resulting object `mod` is a list with 4 elements: the first two are the LDAs (all of them, and then the selected one) and the last two are the TSs (all of them, i.e. the return from `TS_on_LDA`, and then the selected one). if you want to pull out what was returned from `TS_on_LDA`, use `mod$"TS models"`, then grab the first model for reference with `mod$"TS models"[[1]]`. this has a whole bunch of elements, including "data", which includes the covariates and the gammas used in fitting the model.

totally agree that a user could want that conversion
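The access pattern described above can be sketched on a mock object, so it runs without fitting anything. This is an illustration of the list shape only: `"TS models"` and `"data"` are the names mentioned in this thread, while the other element names, the gamma column names, and all values are placeholders rather than what LDATS actually returns.

```r
# Mock TS fit: "data" holds the covariates and the gammas used in fitting.
# Column names and values here are placeholders.
ts_fit <- list(
  data = data.frame(
    newmoon = 1:4,
    gamma_1 = c(0.9, 0.7, 0.4, 0.2),
    gamma_2 = c(0.1, 0.3, 0.6, 0.8)
  )
)

# Mock LDA_TS result: two LDA elements, then two TS elements.
# Only "TS models" is a name taken from the thread.
mod <- list(
  "all LDAs" = list(),
  "selected LDA" = list(),
  "TS models" = list(ts_fit),
  "selected TS" = ts_fit
)

# What TS_on_LDA returned: one TS fit per LDA model.
ts_models <- mod[["TS models"]]

# Covariates and gammas used to fit the first TS model.
ts_data <- ts_models[[1]]$data
covariate_times <- ts_data$newmoon
```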

juniperlsimonis commented 5 years ago

ok, so with respect to the changepoint locations that come out of `TS` (which is what `TS_on_LDA` wraps around): the values are defined on the scale of integers that cover the range of values input by the user via `timename` (non-inclusive of the end points). conversion from (and back to) dates has actually never been a feature within LDATS. i think @diazrenata might have coded something to do conversion from dates to integers in MATSS though. the other thing is that, technically speaking, one can pass dates in via the `timename`'d column, but then the assumption is that a day is the timestep, and the conversion is what R does internally based on ISO 8601. basically all that's done internally to create the available times for a changepoint is:

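# every time step across the range of the timename column
times <- seq(min_time, max_time, 1)
# drop the first and last steps: changepoints are non-inclusive of the end points
available_times <- times[-c(1, length(times))]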

all that is to say the functions in LDATS don't really manage the conversion into a date format for the times, so it would be challenging at this point to manage the conversion back to user times.
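To make the snippet above concrete, here is a small worked example with made-up end points (the values are illustrative, not taken from any LDATS data set). It shows the integer case and the case where dates are passed in, where a step of 1 means one day.

```r
# Integer time index, e.g. time steps 1 through 6.
times <- seq(1, 6, 1)
available_times <- times[-c(1, length(times))]
# available_times holds 2, 3, 4, 5: the end points are excluded.

# Date-valued time index: stepping by 1 advances one day at a time.
date_times <- seq(as.Date("2000-01-01"), as.Date("2000-01-05"), by = 1)
available_dates <- date_times[-c(1, length(date_times))]
# available_dates holds 2000-01-02, 2000-01-03, 2000-01-04.
```

Note that in the date case every calendar day in the range becomes a candidate changepoint time, regardless of how sparsely the samples were actually taken.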