padpadpadpad / nls.multstart

Non-linear regression in R allowing multiple start parameters
18 stars 4 forks source link

minpack.lm and weights #9

Closed mathesong closed 6 years ago

mathesong commented 6 years ago

Hey Dan,

I tried out the package on my specific problem, and it seems to work really well! However, passing arguments through to minpack.lm turned out to be a little bit trickier than expected. Upper and lower worked a charm, but weights gave me errors each time. As far as I can tell, this is because minpack.lm underneath is doing some kind of non-standard evaluation on weights, and it doesn't work when it's obscured a level by nls.multstart.

I just tried out a very simple solution: adding weights as an optional argument to nls.multstart fixes it completely. If not specified, it works as usual. If specified, then it allows one to do weighted nls.

So I'll quickly submit a PR with that.

padpadpadpad commented 6 years ago

Hi Granville

I cannot get the README.Rmd to work on the latest PR when we have incorporated weights. The fits just return NULL. Fixed this by adding an if() else() statements if weights was missing or not.

fit_aic <- function(startpars) {
        start.vals <- as.list(startpars[[1]])

        if(missing(weights)){
          try(
            fit <- minpack.lm::nlsLM(
              formula,
              start = start.vals,
              control = control,
              data = data, ...
            ),
            silent = silent
          )
        }
        else{
          try(
            fit <- minpack.lm::nlsLM(
              formula,
              start = start.vals,
              control = control,
              data = data,
              weights = weights, ...
            ),
            silent = silent
          )
        }

        AICval <- ifelse(!is.null(fit), AIC(fit), Inf)

        return(AICval)
      }
padpadpadpad commented 6 years ago

Please let me know that weights still work in your instance. The code is back working when weights are not present.

mathesong commented 6 years ago

That's super weird. I guess I was getting something wrong when checking. I'll take a look this evening.

On 14 Feb 2018 16:34, "Daniel Padfield" notifications@github.com wrote:

Please let me know that weights still work in your instance. The code is back working when weights are not present.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/padpadpadpad/nls.multstart/issues/9#issuecomment-365644120, or mute the thread https://github.com/notifications/unsubscribe-auth/AG25ZCbPr6YZdtbcOZUyCQXAqwx6xxChks5tUvzqgaJpZM4SFT7n .

mathesong commented 6 years ago

So, I'm generally very confused all over here. While working on it, I was checking whether it worked using an example in my code (I'm not allowed to share the data, but I can show you the results):

Unweighted

 output <- nls.multstart::nls_multstart(roitac ~ srtm_v_model(t_tac, reftac, bloodtac, R1, k2, bp, vBr, vBt),
                               supp_errors = 'Y',
                               start_lower = multstart_lower,
                               start_upper = multstart_upper,
                               iter = multstart_iter, convergence_count = FALSE,
                               lower = lower, upper=upper)
 output

yields

Nonlinear regression model
  model: roitac ~ srtm_v_model(t_tac, reftac, bloodtac, R1, k2, bp, vBr,     vBt)
   data: data
     R1      k2      bp     vBr     vBt 
1.55554 1.00000 0.25402 0.01664 0.00000 
 residual sum-of-squares: 5456

Weighted

output <- nls.multstart::nls_multstart(roitac ~ srtm_v_model(t_tac, reftac, bloodtac, R1, k2, bp, vBr, vBt),
                               supp_errors = 'Y',
                               start_lower = multstart_lower,
                               start_upper = multstart_upper,
                               iter = multstart_iter, convergence_count = FALSE,
                               lower = lower, upper=upper, weights=weights)
> output

yields

Nonlinear regression model
  model: roitac ~ srtm_v_model(t_tac, reftac, bloodtac, R1, k2, bp, vBr,     vBt)
   data: data
     R1      k2      bp     vBr     vBt 
1.56933 1.00000 0.25176 0.01799 0.00000 
 weighted residual sum-of-squares: 3561

You can see that in the examples above, it actually shows RSS and weighted RSS respectively, so it was seemingly doing it correctly with my functions. Why it should fail when it's running your function is very mysterious indeed.

I tried creating a weight vector of repeating 1's (which minpack.lm would do anyhow), and it still seems to have errors. So this will need a little bit more digging to figure out what's going on here.

I'll give this more of a look tomorrow. Sorry for dropping the ball on this one and thinking it was all dandy. I'll see if I can write some tests sometime soon so that this can be avoided in future.

mathesong commented 6 years ago

I promised myself I'd go to bed, but I just figured it out. This is a note to myself for tomorrow.

The issue is that my weights and my other data are all located in the parent.frame environment, whereas all of your information is in the data object. In your function call, minpack.lm doesn't have access to the information outside of the data object, whereas in my parent.frame version, it has everything in my environment.

Thus, by creating a weights vector of 1's, it didn't work. But, by explicitly adding it to the data object, everything works a charm. And by adding it nothing is changed, since it's already there for my function's situation, and it's only added within the function environment for your function's situation.

Hm. it would only require that people don't use a variable name the same as what I call weights. I suppose weights is a potentially common variable name. That's something to think about... Actually no: I'll just check for it and give an error message.

Will sleep easy after all :)

padpadpadpad commented 6 years ago

Hi Granville

Once the code brain gets into gear it is very hard to let it sleep!

I have never used parent.frame() and am not overly familiar in how it works. So all the vectors of the data are present in the environment as variables here?

Although I fixed the issue for shotgun == TRUE and convergence_count == TRUE, there are still instances where the code might break. Is it worth just adding simple if() else() statements throughout or are you considering a more elegant fix?

mathesong commented 6 years ago

Hi Daniel,

It's my first time using parent.frame too - I just saw it in minpack.lm and nls2. But in playing with it, it seems to just capture everything in the current environment and encapsulate it into an object. It's pretty cool.

I saw your solution, and it would work to include more if-else statements, but if we would ever want to add to the code, it would get quite error-prone to do so, as we'd have to make changes to every call. I'd rather make a weights vector in the code if it's missing and hand it off to data. I'll submit a PR soon.

padpadpadpad commented 6 years ago

Hi Granville

Agree that it would get quite clunky.

Have added something at the beginning of the script, adding a column data$weights <- 1 if weights is missing. Think this solves the issue as from my understanding of parent.frame() it is not affected.

Cheers Dan

PS. Slow morning. Have realised that weights is likely to be a relatively common column name so need to think how we could get around this. Would be a bit silly not to allow people to have a column name called weights.