urol-e5 / timeseries

Data generated from e5 time series sampling in Moorea
2 stars 0 forks source link

Write scripts for nls quadratic fit for PI curves #31

Closed AHuffmyer closed 3 years ago

AHuffmyer commented 3 years ago

I was able to write a script to run an nls fit for each colony (thanks for your help Ferdi!). However there are a couple outstanding questions: (1) - our start values will change the parameters of Am, AQY, Rd. Right now, I have start values that are in the range expected (based on running an nls for a single colony). How do we decide what our final start values should be? (2) - I am not currently able to save individual plots since this is running in a different format than the previous script. I'll work on this next. These plots also need to have lower and upper bounds, right now I only have the fitted line.

You can see the updates in the pi curves nls script in timepoint one or in the attached knitted PDF pi_curves_nls.pdf !

AHuffmyer commented 3 years ago

Script can be found here: https://github.com/urol-e5/timeseries/blob/master/timepoint_1/scripts/pi_curves_nls.Rmd

ferdi-p commented 3 years ago

Awesome!

Concerning point (1): I was naively hoping the initial values wouldn't matter... You found cases where they did change the outcome? Do you think there are options for nls to make it run for more iterations or something similar?

AHuffmyer commented 3 years ago

I did find cases where changing the starting value changed the outcome, in the range of (0.001-0.01) change. There are also some starting values that do not allow the nls to run (infinite values error). In some code online, I see that some people do not include starting values, but the nls does not run in our case without starting values. I am not sure about iterations or bootstrapping, I can look into that.

AHuffmyer commented 3 years ago

I also tried to set the starting values according to the data itself (as in a script from N Silbiger) using this code: start=list(Am=(max(Pc)-min(Pc)),AQY=0.05,Rd=-min(Pc)) - and it still wont run.

ferdi-p commented 3 years ago

I can check in a bit.

I think the idea of using those initial values that depend on the data is good. We could even think about using for AQY something like the slope during the first say 5 values, maybe that's complicated though... Maybe even with increased tolerance or iterations any reasonable (positive) guess could do.

For that, I just looked at the help file for nls

https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/nls.control

It says there is an option tol - maybe using a small value for that could help. I don't know what's small at the moment, so we'd need to try that I guess...

AHuffmyer commented 3 years ago

Ok I gave that a shot and I am able to code for the starting values to come from the data - but I am now getting a "singular gradient matrix" error. The code has been pushed so you are welcome to take a look!

ferdi-p commented 3 years ago

Ok, I'm trying to wrap my head around it. Already learned how to pull the git xD Now I'm trying your code but I'm getting different errors. Can you again upload the version where everything worked with fixed ini values? (Or can I already find this somewhere in the history?)

AHuffmyer commented 3 years ago

Yes! I just pushed a revised version of the script. You will find a code chunk ("this one works") with fixed initial values and one that doesn't with alternative initial values based on the data itself. :)

ferdi-p commented 3 years ago

Ok, I think I fixed it. To not mess with your files, I added a new script with the changes (pi_curves_nls_f_2.Rmd). I didn't manage to set the right directory for input and output, so I changed that to my local filesystem. You'd need to change that back to the original directories... Also I controlled, the output with the fixed inis and the inis from the data looked the same to me? (I didn't check all so I could be missing something)

AHuffmyer commented 3 years ago

This works!! I'll keep both the fixed initials and flexible initial scripts in the .rmd so we can reference them. THANK YOU! I'll keep this issue open until we reach a final version we are happy with. Next I'll try to run this script for each additional timepoint.

AHuffmyer commented 3 years ago

Updated PDF of knitted document here: pi_curves_nls.pdf

ferdi-p commented 3 years ago

Perfect! Let me know when there's something else to do...

AHuffmyer commented 3 years ago

PI curves with nls are running successfully for TP1 and 2! PDF outputs showing curves for each species and parameter values are available in the timepoint output folders. However, I am now stuck again at TP3 in the pi_curve_rates.rmd (which calculates the values we need to put into the nls). See previous issue and problem on line #66 (mutate interval error) in the script here: https://github.com/urol-e5/timeseries/blob/master/timepoint_3/scripts/pi_curves_rates.Rmd. The script is the exact same as previous timepoints that work. Does anyone have any insights on this issue?

ferdi-p commented 3 years ago

Just looking at it. Line 66 seems to run fine here. You already fixed it - right?

AHuffmyer commented 3 years ago

Interesting! We are getting different results depending on R version - we have found some typos in the file names so I am working on fixing those errors today and will close the issue if it works!