weecology / LDATS

Latent Dirichlet Allocation coupled with Bayesian Time Series analyses
https://weecology.github.io/LDATS

Processing non-integer timesteps #126

Closed diazrenata closed 5 years ago

diazrenata commented 5 years ago

See https://github.com/weecology/MATSS-LDATS/issues/13.

The LDA needs the timename column to be integers. We could build in some functionality to assign integer values to non-integer timesteps and convert back?
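A minimal sketch of the "assign integers and convert back" idea, written in Python for illustration rather than as LDATS code. The helper names `encode_timesteps` and `decode_timesteps` are hypothetical, and the sketch assumes the smallest gap between observed times is the sampling period (it raises an error if any value falls off that grid).

```python
def encode_timesteps(times, tol=1e-6):
    """Map time values onto integer steps; return (codes, origin, step).

    Assumption: the smallest gap between distinct times is the period.
    """
    ordered = sorted(set(times))
    if len(ordered) < 2:
        raise ValueError("need at least two distinct time values")
    origin = ordered[0]
    step = min(b - a for a, b in zip(ordered, ordered[1:]))
    codes = []
    for t in times:
        position = (t - origin) / step
        if abs(position - round(position)) > tol:
            raise ValueError(f"{t} does not fall on a grid of step {step}")
        codes.append(round(position))
    return codes, origin, step


def decode_timesteps(codes, origin, step):
    """Convert integer steps back to the original time values."""
    return [origin + code * step for code in codes]


# Half-year sampling with one missing step (2001.5)
years = [2000, 2000.5, 2001, 2002]
codes, origin, step = encode_timesteps(years)
restored = decode_timesteps(codes, origin, step)
```

Keeping `origin` and `step` alongside the integer codes is what makes the conversion reversible after the model has run.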

diazrenata commented 5 years ago

@diazrenata identify the problem datasets in MATSS

diazrenata commented 5 years ago

OK, so it turns out the only non-integer timesteps in MATSS so far are years in increments of 0.5. So nothing too intense!

See: https://github.com/weecology/MATSS-LDATS/blob/check_time_data/analysis/reports/time_check.md

ha0ye commented 5 years ago

@diazrenata Was this the Jornada data?

diazrenata commented 5 years ago

SGS and the Jornada.

I’m not at my computer right now and I don’t 100% remember how I set up the error handling, but I think as long as it doesn’t throw a complete halting error it should be easy to tweak if necessary.

I’m pretty sure MATSS-LDATS does try to coerce to integers. I can track the specifics down in a little bit!

On Fri, Jun 28, 2019 at 11:47, Hao Ye wrote: "@diazrenata Was this the Jornada data?"

ha0ye commented 5 years ago

Yeah, it would be good to know how you're doing it. I think tidyr::full_seq() is likely a good option to do that, which means I can add period to the metadata for SGS and Jornada, and update the data checks and tests.
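For readers unfamiliar with `tidyr::full_seq()`, here is a rough Python sketch of the behaviour being relied on: given observed values and a known `period`, it returns every step from the minimum to the maximum, including the missing ones. The Python function below is a hypothetical stand-in, not the tidyr implementation, and its tolerance handling is simplified.

```python
def full_seq(values, period, tol=1e-6):
    """Return the complete sequence from min(values) to max(values) by `period`.

    Raises ValueError if any value is not a whole number of periods
    away from the minimum (within `tol`).
    """
    lo, hi = min(values), max(values)
    for v in values:
        steps = (v - lo) / period
        if abs(steps - round(steps)) > tol:
            raise ValueError(f"{v} is not a multiple of period {period} from {lo}")
    n_steps = round((hi - lo) / period)
    return [lo + i * period for i in range(n_steps + 1)]


observed = [2000, 2000.5, 2001.5]          # 2001.0 was never sampled
complete = full_seq(observed, period=0.5)   # all half-year steps in range
missing = [t for t in complete if t not in observed]
```

This is why storing `period` in the dataset metadata matters: with it, the position of each observation in the complete sequence gives an integer timestep without guessing the sampling interval.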

diazrenata commented 5 years ago

You could just go for it? Whatever I did was just a stopgap to keep the pipeline from failing, with a footnote saying to do something permanent later :P

ha0ye commented 5 years ago

Ok, I'll work on:

and then I think we can defer working on the processing code (weecology/MATSS#96) until after we discuss implementation details.

diazrenata commented 5 years ago

lol, MATSS-LDATS checks whether the timename column could be swapped for a timesteps column that is just the row numbers, and complains if it can't, but makes no attempt to actually make that change. (When I wrote it, the only outcome that ever actually came up was that you couldn't make the change.)

If MATSS::check_data_format and MATSS-LDATS's data checking function (which checks that $timename exists and checks if the timename column is integers) fail, it just skips the TS model.

(See this: https://github.com/weecology/MATSS-LDATS/blob/master/R/check_ts_data.R)
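The checking behaviour described above might look roughly like this. It is a Python sketch for illustration, not the code in `check_ts_data.R`: the names `check_time_column` and `run_ts_if_valid` are hypothetical, the data is assumed to be a dict of column lists, and "could be swapped for row numbers" is interpreted here as "evenly spaced", which is an assumption.

```python
from typing import NamedTuple


class TimeCheck(NamedTuple):
    ok: bool        # False means the TS model should be skipped
    warnings: list  # human-readable complaints


def check_time_column(data, timename):
    """Check that `timename` exists and holds integer-valued times."""
    if timename not in data:
        return TimeCheck(False, [f"missing column {timename!r}"])
    times = list(data[timename])
    if not all(float(t).is_integer() for t in times):
        return TimeCheck(False, ["time values are not all integers"])
    warnings = []
    gaps = {b - a for a, b in zip(times, times[1:])}
    if len(gaps) > 1:
        # Complain, but do not rewrite the column
        warnings.append("uneven spacing: row numbers are not a drop-in replacement")
    return TimeCheck(True, warnings)


def run_ts_if_valid(data, timename, fit_ts):
    """Run `fit_ts(data)` only when the time column passes; otherwise skip."""
    if not check_time_column(data, timename).ok:
        return None  # skipped rather than halting the pipeline
    return fit_ts(data)
```

Returning `None` instead of raising mirrors the "skip the TS model" behaviour, so one problem dataset does not stop the rest of the pipeline.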

ha0ye commented 5 years ago

@diazrenata Here's a proposed implementation (https://github.com/weecology/MATSS/issues/136). Would that work for you as a basis for modifying the pipeline in MATSS-LDATS?

juniperlsimonis commented 5 years ago

i'm going to fold this into the other issue on data helper functions, as that's the general concept that the time issue falls under