padpadpadpad / nls.multstart

Non-linear regression in R allowing multiple start parameters

Global fit with nls.multstart #21

Open fathi0amir opened 1 year ago

fathi0amir commented 1 year ago

I have been searching for a good way to do global fitting in R. I used nls.multstart and it works as expected (awesome package). I would like to use this package the way I can use nls in base R.

#Generating sample data
d <- transform(
    data.frame(
        x = seq(0, 1, len = 17),
        group = rep(c("A", "B", "B", "C"), len = 17)
    ),
    y = round(1 / (1.4 + x^ifelse(group == "A", 2.3, ifelse(group == "B", 3.1, 3.5))), 2),
    group = as.factor(group)
)
#Fitting data 
fit <- nls(y ~ 1 / (b + x^p[group]),
    data = d,
    start = list(b = 1, p = rep(3, length(levels(d$group))))
)

"The code is from here" In the sample code above I can use "[]" to asign a parameter to each group while keeping "b" as shared parameter.

I tried to follow the same pattern in nls.multstart but I get the following error

fitMulti <- nls.multstart::nls_multstart(
    data = d,
    y ~ 1 / (b + x^p[group]),
    iter = 200,
    start_lower = list(b = 0, p = rep(0, length(levels(d$group)))),
    start_upper = list(b = 5, p = rep(5, length(levels(d$group)))),
    supp_errors = "Y",
    convergence_count = 100,
    na.action = na.omit
)

# output
 Error in `as_tibble()`:
! Column names `b` and `p` must not be duplicated.
Use `.name_repair` to specify repair.
Caused by error in `repaired_names()`:
! Names must be unique.
✖ These names are duplicated:
  * "b" at locations 1 and 3.
  * "p" at locations 2 and 4.
Run `rlang::last_trace()` to see where the error occurred.

Is there a way to use nls.multstart to do global fitting in this context?

padpadpadpad commented 1 year ago

Hi Fathi,

I have done (and seen) a fair bit of nls() in my time and have never seen anyone fitting models like this using nls()!

I checked whether this group-sharing of parameters works with minpack.lm::nlsLM() and with nls2::nls2() (which uses brute force), and both of those errored too.

# try minpack.lm
fit <- minpack.lm::nlsLM(y ~ 1 / (b + x^p[group]),
           data = d,
           start = list(b = 1, p = rep(3, length(levels(d$group))))
)

# try brute force
fit <- nls2::nls2(y ~ 1 / (b + x^p[group]),
                         data = d,
                         start = list(b = 1, p = rep(3, length(levels(d$group))))
)

Why are you trying to estimate a global estimate of one parameter and an individual estimate for others? What are you interested in doing downstream of this (if anything)?

This sounds like something you could test with nlme::gnls() although I have not used it for a long time.

# try nlme::gnls
fitmulti <- nlme::gnls(model = y ~ 1 /(b + x^p),
                  params = list(b~1, p ~ group),
                  data = d,
                  start = list(b = 1, p = rep(3, length(levels(d$group)))))

This works! Or possibly a random effects model with a random effect of group for p but not for b.
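For completeness, one way to mimic the indexing trick with nls_multstart might be to spell out one uniquely named parameter per group in the formula, so no parameter names are duplicated. This is an untested sketch (the names pA, pB, pC are made up for illustration), reusing the data d from the first post:

```r
library(nls.multstart)

# one explicitly named exponent per group, selected by indicator terms;
# the logicals (group == "A") etc. are coerced to 0/1 in the arithmetic,
# so exactly one of pA, pB, pC is active for each row
fitMulti <- nls_multstart(
  y ~ 1 / (b + x^(pA * (group == "A") + pB * (group == "B") + pC * (group == "C"))),
  data = d,
  iter = 200,
  start_lower = c(b = 0, pA = 0, pB = 0, pC = 0),
  start_upper = c(b = 5, pA = 5, pB = 5, pC = 5),
  supp_errors = "Y"
)
```

b stays shared because it appears once, while each group's exponent gets its own uniquely named parameter.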

fathi0amir commented 1 year ago

Thanks for getting back to me so fast.

What I want to do downstream might be a bit too much and plain dumb. I will try to explain below as best I can. I did try all the packages above. nlme::gnls does work as expected, but it has no brute-force option and cannot, like your package, generate random starting points.

Now, let me try to explain my goal. I have a spectrum that evolves over time. The data at each point in the spectrum (each wavelength, in my case) can typically be fitted with one or two exponentials. The exponent constant is shared across the whole spectrum, but the amplitude of each exponential is not. So the best way to fit this data set is to fit the exponent constant (a time constant, in my case) globally and fit the amplitudes independently. There are standard ways to deal with this, but it becomes trial and error in some cases when the cost function gets stuck in a local minimum. The most typical approach in this case (transient absorption spectra) is to do an SVD and fit only the eigenvectors independently. The only physically meaningful quantity would be the time constant. My current plan is to use the SVD approach to find the time constants and then use the grouped-parameter option in base R nls. Sorry for the lengthy explanation.
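To make that setup concrete: a shared time constant with independent per-wavelength amplitudes can be written with the same base-R indexing trick as in the first post. A small hypothetical sketch with simulated single-exponential decays (the wavelengths, amplitudes and time constant here are invented):

```r
# simulate single-exponential decays at 3 "wavelengths" with a
# shared time constant tau = 1.5 and different amplitudes
set.seed(1)
td <- expand.grid(t = seq(0, 5, len = 40), wl = factor(c("w1", "w2", "w3")))
true_a <- c(w1 = 1, w2 = 0.6, w3 = 0.3)
td$y <- true_a[td$wl] * exp(-td$t / 1.5) + rnorm(nrow(td), sd = 0.01)

# tau is a single shared parameter; a[wl] gives one amplitude per wavelength
fit <- nls(
  y ~ a[wl] * exp(-t / tau),
  data = td,
  start = list(a = rep(0.5, 3), tau = 1)
)
coef(fit)  # per-wavelength amplitudes plus the single shared tau
```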

I would appreciate it if you could point me in a better direction.

Previously I used multistart in MATLAB (without shared parameters); now I am bringing everything over to R. Using multiple starting points is a great option. It has some overhead and made my analysis longer, but it saved me more time overall because I didn't have to go through the fitted parameters by hand.

mathesong commented 1 year ago

I have also never seen this kind of use of the nls function, and I don't think it could easily be incorporated into this package. However, one way you might consider fitting such a model is as a nested fitting procedure. We do this for some applications in my field (see this paper for example: https://pubmed.ncbi.nlm.nih.gov/25542534/).

We typically approach this problem by setting up a grid search across potential values of the shared parameter. For each value on the grid, we fit all the other parameters with the shared parameter fixed at that value and calculate the weighted residual sum of squares (WRSS). We then sum these across all the curves for each grid value and select the value that minimises the total WRSS. Finally, we fit the model with the shared parameter fixed at the selected value, fitting the rest of the parameters freely.
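As a rough sketch of that procedure in R, reusing the toy data d from the first post (the grid range, grid resolution and the use of unweighted rather than weighted RSS are arbitrary simplifications):

```r
library(nls.multstart)

b_grid <- seq(0.5, 3, by = 0.05)  # candidate values for the shared parameter b

# for each candidate b, fit the group-specific exponent p separately
# per group with b held fixed, then sum the residual sums of squares
rss_for_b <- sapply(b_grid, function(b_fixed) {
  sum(sapply(split(d, d$group), function(dg) {
    fit <- nls_multstart(
      y ~ 1 / (b_fixed + x^p),
      data = dg,
      iter = 100,
      start_lower = c(p = 0),
      start_upper = c(p = 5),
      supp_errors = "Y"
    )
    sum(resid(fit)^2)
  }))
})

b_best <- b_grid[which.min(rss_for_b)]  # grid value minimising total RSS
# final step: refit each group's p with b fixed at b_best
```

This keeps the multiple-starting-points machinery for the per-group fits while the shared parameter is handled by the outer grid.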

Unfortunately it's a bit slow (especially if you're using lots of starting points for each fit), and it requires some setup, but it should get the job done, and allow you to use the multiple starting points approach.

On the other hand, this feels like it might be a good application of a mixed effects model. If you anticipate that the other parameters should follow a normal distribution, you could parameterise a model using nlme with fixed effects for all your parameters, and random effects for all but your shared parameter.
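A hedged sketch of that mixed-effects idea with nlme on the toy data d from the first post (the random-effects structure and starting values are guesses, and a dataset this small may not support the model):

```r
library(nlme)

# fixed effects for both parameters; a random deviation in p per group,
# so b is shared across groups and only p varies between them
fit_nlme <- nlme(
  y ~ 1 / (b + x^p),
  data = d,
  fixed = b + p ~ 1,
  random = p ~ 1 | group,
  start = c(b = 1, p = 3)
)
```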

fathi0amir commented 1 year ago

Thank you both for your time and energy to have this topic discussed with me.

I think what I am trying to achieve has little significance, but I have a strong feeling about it based on my past experience. I have done a lot of time-resolved emission spectroscopy, transient absorption spectroscopy and spectral transient absorption imaging. I might be able to model the excited/ground-state density with this kind of measurement. Sorry for all the field-specific terminology.

Thank you @mathesong for pointing me to your paper. I will check it out and try to get some insight for my objective. Also, thanks for the detailed directions on the nested fitting procedure. I would rather spend some overhead time and get a processing pipeline that performs predictably and reliably, because that will save me more time down the road when I can use it as a tool. I will check the nlme documentation as well and see which solution fits within my abilities and time. Cheers!