nicholasjclark / mvgam

{mvgam} R 📦 to fit Dynamic Bayesian Generalized Additive Models for time series analysis and forecasting
https://nicholasjclark.github.io/mvgam/

Future enhancement wishlist #38

Open nicholasjclark opened 9 months ago

nicholasjclark commented 9 months ago
jonathonmellor commented 6 months ago

Apologies if this is the wrong place to ask (happy to move it): would it be possible to add the binomial family to the supported families?

Example use case:

Given binary (0/1) data on the absence/presence of a disease, we expect positivity in a sampled population to change over time, and would like to estimate this with a logistic regression that accounts for the temporal trend (currently we just use an mgcv spline, but would prefer a GP, AR or RW trend).

Love the package! I'm planning to dig further into the forecasting side now that winter has ended here in the northern hemisphere, and to consider whether we can nowcast as well.

nicholasjclark commented 6 months ago

Thanks @jonathonmellor, yes absolutely. I'm planning to include Binomial and Beta-Binomial. Happy to make that a top priority if it would be useful to you.

jonathonmellor commented 6 months ago

That would be amazing, thank you @nicholasjclark! For awareness, I am also exploring upgrading our age- and region-stratified RSV forecasting model to mvgam. Great to see how much progress has been made with the package, well done!

nicholasjclark commented 6 months ago

Hi again @jonathonmellor, I've pushed a new release that brings support for Binomial, Bernoulli and Beta-Binomial. I'm still working on documentation to show how these work, but you can try them out now if you like. The examples in the test script give an idea of how they should function: https://github.com/nicholasjclark/mvgam/blob/master/tests/testthat/test-binomial.R. Thanks again for the suggestion, and do let me know if I can help upgrade your models.

jonathonmellor commented 6 months ago

Amazing, thanks @nicholasjclark! I've tried it on my use case and got the model working on a subset of the data. There is an additional challenge, though: this is a survey cohort with irregular temporal sampling, so I needed one series per participant, which leaves most of the data empty because of the requirement that every series has a row at every time point. The end product would be MRP (multilevel regression and poststratification), with poststratification to the whole population.
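To make the sparsity concrete, here is a minimal Python sketch (not mvgam code) of padding irregularly sampled records to a complete series × time grid. It assumes one series per participant, integer monthly time points and at most one response per participant per time point; `pad_to_grid` and the participant records are hypothetical, and `None` stands in for R's `NA`.

```python
def pad_to_grid(records):
    """Expand (series, time, outcome) records to a complete series x time grid.

    Assumes integer time points and at most one record per (series, time);
    cells with no record get outcome None, standing in for NA in R.
    """
    observed = {(s, t): y for s, t, y in records}
    series = sorted({s for s, _, _ in records})
    first = min(t for _, t, _ in records)
    last = max(t for _, t, _ in records)
    # One row for every series at every time point in the observed range
    return [(s, t, observed.get((s, t)))
            for s in series
            for t in range(first, last + 1)]


# Hypothetical cohort: 4 participants, each tested at 2 of 8 monthly time points
records = [
    ("p1", 1, 0), ("p1", 5, 1),
    ("p2", 2, 0), ("p2", 6, 0),
    ("p3", 3, 1), ("p3", 7, 0),
    ("p4", 4, 0), ("p4", 8, 1),
]
grid = pad_to_grid(records)
n_missing = sum(1 for _, _, y in grid if y is None)
print(f"{len(grid)} rows, {n_missing} empty ({n_missing / len(grid):.0%})")
```

This prints `32 rows, 24 empty (75%)`. When each participant contributes only a few responses across a long study period, most cells of the grid stay empty in the same way.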

Model formula roughly:

`is_positive ~ s(time) + s(time, by = age_group) + s(time, by = region)`, with a Bernoulli family.

I was hoping to use a GP instead of a cubic regression spline with mvgam. Each participant gives a survey response every month, with testing windows opening and closing, so there are multiple data points (survey responses) per time point, and because of the pooling used, the question of "what is a series in this case?" became difficult. Given the size of the data (400,000 records), this might not be the best use case, but I am still planning to move our GAM forecasting approaches away from extrapolating smooths and towards the more principled approaches in this package.
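For the poststratification step of the MRP mentioned earlier, here is a minimal Python sketch (again not mvgam code): the population prevalence is the population-weighted average of the cell-level predicted probabilities, i.e. the sum over cells of N_c × p_c divided by the total population, with cells defined by age_group × region. The function name, cell labels, probabilities and counts are all hypothetical; in practice the probabilities would come from the fitted model at a given time point and the counts from census data.

```python
def poststratify(cell_probs, cell_counts):
    """Population prevalence as the count-weighted mean of cell probabilities.

    cell_probs:  {cell: predicted probability of a positive test}
    cell_counts: {cell: population count for that cell}
    Returns sum(N_c * p_c) / sum(N_c) over the cells in cell_counts.
    """
    # Every population cell needs a model prediction
    missing = cell_counts.keys() - cell_probs.keys()
    if missing:
        raise KeyError(f"no prediction for cells: {sorted(missing)}")
    total = sum(cell_counts.values())
    return sum(n * cell_probs[c] for c, n in cell_counts.items()) / total


# Hypothetical (age_group, region) cells, model predictions and census counts
probs = {("0-17", "north"): 0.10, ("0-17", "south"): 0.20,
         ("18+", "north"): 0.05, ("18+", "south"): 0.15}
counts = {("0-17", "north"): 100, ("0-17", "south"): 300,
          ("18+", "north"): 400, ("18+", "south"): 200}
print(round(poststratify(probs, counts), 4))
```

This prints `0.12`, compared with an unweighted cell mean of 0.125. In a Bayesian MRP you would apply this to each posterior draw of the cell probabilities (and at each time point) to get a posterior distribution for population prevalence rather than a single number.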

nicholasjclark commented 6 months ago

Thanks @jonathonmellor for the explanation. That does sound interesting and challenging. Would you mind sharing an example of what the data might look like so I can think about whether anything else could be done?