lindeloev / mcp

Regression with Multiple Change Points
http://lindeloev.github.io/mcp
105 stars 19 forks source link

Multiple predictors #90

Open lindeloev opened 4 years ago

lindeloev commented 4 years ago

Each segment should take an arbitrary number of linear predictors. As with the segmented package, the only requirement is that one continuous predictor (say, x) is the dimension of the change point. The change point is simply the value on x where the predictions of y changes to a different regression model (parameter structure and/or values).

So this API should work. It has the following features:

model = list(
    y ~ 1 + x*group + z + sigma(1 + group),  # interactions and main effects and a covariate.
    ~ 0 + x + ar(2, z),  # only one slope
    ~ 1 + group  # a range of x where group is the only predictor
)

JAGS-wise the indicator functions would be the same but now we additionally pass design matrices (X1_, X2_, etc.) and use inprod() per segment. The model above would be something like:

    # Priors for individual parameters
    cp_1 ~ ... T(MINX, MAXX)
    cp_2 ~ ... T(cp_1, MAXX)
    int_1 ~ dnorm(0, 1^-2)
    int_3 ~ ...
    xGroupFemale_1 ~ dnorm(0, 1)  # check R naming convention
    z_1 ~ dunif(0, 100)
    x_2 ~ dnorm(4, 3^-2)
    xGroupFemale_3 ~ ...

    # Model and likelihood
    for(i_ in 1:length(x1_)) {
        y_[i_] = (x_[i_] > cp_0) * (int_1 + inprod(c(xGroupFemale_1, z_1), x1_[i_, ])) + 
                (x_[i_] > cp_1) * (0 + inprod(c(x_2), x2_[i_, ])) +
                (x_[i_] > cp_2) * (int_3 + inprod(c(xGroupFemale_3), x3_[i_, ]))

        response[i_] ~ dnorm(y_[i_], sigma_[i_])
    }

where xi_ is a model matrix that is built R-side and x_ is par_x along which change points are defined. Implementing this adds the following work points:

Data structure

Modeling, sampling, and summaries

Simulated, fitted, and predicted values

Plot

Tests

sjmgarnier commented 4 years ago

@lindeloev Out of curiosity, do you have a timeline on when this will be implemented?

lindeloev commented 4 years ago

@sjmgarnier I got this to work locally but while it provides much more modeling options (and is easier to maintain/extend), sampling of currently supported models take double the time. mcp is already like 100x slower than other packages. Tries to be useful for modeling options rather than speed. So I'm a bit in two minds whether to continue down this path. What do you think?

sjmgarnier commented 4 years ago

@lindeloev I'll answer very selfishly by saying that I need it for a project that I'm working on :-) I could not find a satisfying alternative to perform weighted segmented regressions with random effects. And I'm a patient man with a powerful computer, so I can wait for a few minutes that the fit finishes. But I'm also very well aware of how time-consuming it is to develop and maintain packages, so I would completely understand if you decided to focus on other priorities.

lindeloev commented 4 years ago

Cool, mcp is henceforth aimed at patient users with powerful computers :-) Yes, for data with < 200 data points, we're talking a slowdown from ~10 secs to ~20 secs so it's not the end of the world. I think this would be awesome so it's my top priority for the next release (mcp 0.4). Though I think that it will likely be at least a few months before it is out.

The first version with this will likely not have random effects on the RHS. But using a categorical intercept (like group in the OP) should be reasonably close in many scenarios since shrinkage is often minor.

lindeloev commented 3 years ago

I'm working on this now and have almost finished making all design decisions. Luckily, I found an implementation that won't negatively affect performance. Expect release in a few months, depending on how hard it is to make sensible plot() etc.

mattmoo commented 3 years ago

Is this implemented in a development version? A reviewer demands a multivariate analysis :/

lindeloev commented 3 years ago

@mattmoo I just pushed the development version to branch v0.4 here. I think you can install it using remotes::install_github("lindeloev/mcp@v0.4"). You can fit it and see parameter summaries, including plot_pars(fit). It works for e.g. y ~ x + sigma(1 + x:group) too.

But fit$simulate(), plot(), fitted()/predict(), and pp_check() won't, though I'm making good progress! So you may want to shift back and forth between versions if you need some of the old functionality for one-predictor-only models.

I've only tested it on gaussian models so far, so please tripple-check. And any feedback and ideas would be much appreciated, BTW!

lindeloev commented 3 years ago

There is good progress on this. I just pushed the latest version to branch v0.4. Install using remotes::install_github("lindeloev/mcp@v0.4"). The easiest way to see it in action is running result = mcp_example("multiple").

I'm basically just missing plot() support for varying effects and categorical predictors.

mattmoo commented 3 years ago

That's great! I'm waiting on compute resources to actually run the analysis (50,000 datapoints :/)

lindeloev commented 3 years ago

I just tested the performance locally on a Ryzen 5 3600. For 55.000 data points and a 15-parameter model with categorical predictors, I get around 1 sample per second. So if you run in parallel with default settings (3000 warmup iters + 1500 data iters), you might complete sampling in a matter of hours?

ex = mcp_example("multiple")
df_tmp = tidyr::expand_grid(rep = 1:250, ex$data) %>%  # upscale to 55000 data points
  dplyr::select(-rep)
fit = mcp(ex$model, df_tmp, par_x = "x", chains = 1, adapt = 100, iter = 100)

I just pushed a new commit to v0.4 which requires ~10% of the memory of the previous version during sampling. Maybe that helped too.

mattmoo commented 3 years ago

I'll give it a go, from my previous experience with these data (see link), the sampling does not converge so quickly (and I do not have such a nice processor!).

adrose commented 1 year ago

Hi, is this issue still up to date or is there additional information for how to incorporate additional predictors?

lindeloev commented 1 year ago

@adrose Unfortunately not. The v0.4 branch currently doesn't run out-of-the-box due to some backwards-incompatible changes in the dependency packages, that I only incorporated into the v0.3 series. I'm presently prioritizing another project higher but really looking forward to getting v0.4 out since it's awesome!

gavanmcgrath commented 1 year ago

Love the package. Looking forward to v0.4.