steve-the-bayesian / BOOM

A C++ library for Bayesian modeling, mainly through Markov chain Monte Carlo, but with a few other methods supported. BOOM = "Bayesian Object Oriented Modeling". It is also the sound your computer makes when it crashes.
GNU Lesser General Public License v2.1
35 stars 14 forks source link

Regressors specification with binomial model #49

Closed karthickgopalswamy closed 4 years ago

karthickgopalswamy commented 4 years ago

Hi Steve, Thanks for the BSTS package. We have been using bsts (state space models with regressors) with gaussian family for quite sometime. The nature of the data suggests that binomial (logit) family will be more appropriate (sales data) for the following reason.

  1. It can remove the negative predictions
  2. One can explicitly specify the max value of the predictions (number of trials).
  3. Bounded finite support of binomial is quite useful in avoiding predictions that are orders of magnitude higher than what we could expect.

I have not found the documentation to include regressors in the logit family as one would do it gaussian For eg. right now, I have bsts(y ~ x1 + x2 + x3, family = 'gaussian', data = as.data.frame(cbind(y,x1,x2,x3)))

How would I specify the logit model? since the documentation says one need to specify 2 col (no of success, no of failures) without regressors its would be bsts(cbind(success, failure), family = 'logit')

Also I do not want to use AddDynamicRegressor().

Thanks, Karthick

steve-the-bayesian commented 4 years ago

Hi Karthick, It has been a while since I looked at the logit code, but you seem to be on the right track.

bsts(cbind(s, f), family="logit") is close. I'd expect that would work with or without a formula on the RHS. If not then try cbind(s, f) ~ 1 (which is R for "fit a constant"). I think you need to specify the state.specification and niter variables as well. Note that the state specification for logit models is a bit more complicated than for gaussian models, because bsts is not quite as good at guessing defaults. You probably need to specify explicit SdPrior() priors for the standard deviations of whichever models you want to use for state.

Hopefully I have answered your question. If not, feel free to try again.

Steve

On Tue, Jun 23, 2020 at 7:17 AM Karthick Gopalswamy < notifications@github.com> wrote:

Hi Steve, Thanks for the BSTS package. We have been using bsts (state space models with regressors) with gaussian family for quite sometime. The nature of the data suggests that binomial (logit) family will be more appropriate (sales data) for the following reason.

  1. It can remove the negative predictions
  2. One can explicitly specify the max value of the predictions (number of trials).
  3. Bounded finite support of binomial is quite useful in avoiding predictions that are orders of magnitude higher than what we could expect.

I have not found the documentation to include regressors in the logit family as one would do it gaussian For eg. right now, I have bsts(y ~ x1 + x2 + x3, family = 'gaussian', data = as.data.frame(cbind(y,x1,x2,x3)))

How would I specify the logit model? since the documentation says one need to specify 2 col (no of success, no of failures) without regressors its would be bsts(cbind(success, failure), family = 'logit')

Also I do not want to use AddDynamicRegressor().

Thanks, Karthick

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/steve-the-bayesian/BOOM/issues/49, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABMVDVIOHXVZ4Y5DA25O42DRYC2QFANCNFSM4OFWZ3VA .

karthickgopalswamy commented 4 years ago

Thanks Steve, so cbind(s,f) ~ x1 + x2 , will work for regression formula? I am just wondering how bsts construct the explicit model. The model I want to considers is y_t ~ Binonmia(n, pt) where pt ~ exp(trend + seasonal + Xb)

steve-the-bayesian commented 4 years ago

Yes, that will work for the model you want to fit. (well, pt = exp(eta_t) / [1 + exp(eta_t)] where eta_t = trend + seasonal + Xb. I think you forgot the denominator!)

On Tue, Jun 23, 2020 at 11:47 AM Karthick Gopalswamy < notifications@github.com> wrote:

Thanks Steve, so cbind(s,f) ~ x1 + x2 , will work for regression formula? I am just wondering how bsts construct the explicit model. The model I want to considers is y_t ~ Binonmia(n, pt) where pt ~ exp(trend + seasonal + Xb)

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/steve-the-bayesian/BOOM/issues/49#issuecomment-648347553, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABMVDVJV4MZ7AA52HEZAZ7DRYD2CJANCNFSM4OFWZ3VA .

karthickgopalswamy commented 4 years ago

Thanks! Meantime I saw you are building python wrappers and posted that you would need help. I know your BOOM codebase well enough and have worked with pybind11 for wrapping. So let me know if you need help!

steve-the-bayesian commented 4 years ago

Thanks! The basic structure is all worked out. I have Gaussian models working with 3 or 4 state models. I need help

If you want to play, check out the BOOM dev branch on github, and from the top level run ./install/pyboom. Then you can do a local pip install of Interfaces/python/bsts.

Steve

On Tue, Jun 23, 2020 at 12:31 PM Karthick Gopalswamy < notifications@github.com> wrote:

Thanks! Meantime I saw you are building python wrappers and posted that you would need help. I know your BOOM codebase well enough and have worked with pybind11 for wrapping. So let me know if you need help!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/steve-the-bayesian/BOOM/issues/49#issuecomment-648372817, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABMVDVO6FXQHRFYPS6OEXVDRYD7HPANCNFSM4OFWZ3VA .

karthickgopalswamy commented 4 years ago

Sure. Let me take a look!

steve-the-bayesian commented 4 years ago

Karthick, Do you happen to know how to work with pandas data frames in pybind11? https://www.reddit.com/r/Python/comments/hf425i/how_to_mix_pandas_and_pybind11/

On Tue, Jun 23, 2020 at 12:38 PM Karthick Gopalswamy < notifications@github.com> wrote:

Sure. Let me take a look!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/steve-the-bayesian/BOOM/issues/49#issuecomment-648376444, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABMVDVMYAOAZX4IJUDQ2KLDRYEADLANCNFSM4OFWZ3VA .

karthickgopalswamy commented 4 years ago

Steve, There is support for registering structured NumPy dtypes using the macro PYBIND11_NUMPY_DTYPE.

steve-the-bayesian commented 4 years ago

Thanks.

I'm probably being dense, but I'm not sure I get the connection. I'm trying to take a pd.DataFrame, and build a BOOM::DataTable (from stats/DataTable.hpp). I'll need to copy the underlying data. Is the idea that a DataFrame is really a numpy structured array under the covers, and I should just use that directly?

On Wed, Jun 24, 2020 at 10:23 AM Karthick Gopalswamy < notifications@github.com> wrote:

Steve, There is support for registering structured NumPy dtypes using the macro PYBIND11_NUMPY_DTYPE.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/steve-the-bayesian/BOOM/issues/49#issuecomment-648955689, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABMVDVLLVO6A6KCLJMHVSYTRYIZBLANCNFSM4OFWZ3VA .

karthickgopalswamy commented 4 years ago

Yea. Thats what I would do. Either go by rows or by columns. By rows it gonna be a dictionary but its kind of an overkill if you just want the data copied. By columns you can map each columns to numpy array.