urol-e5 / timeseries

Data generated from e5 time series sampling in Moorea

Error in pi_curve_rates.rmd script for TP4 in lines 122-127 #34

Closed. AHuffmyer closed this issue 3 years ago.

AHuffmyer commented 3 years ago

We have written working nls PI curve scripts for TP1-4. However, TP4 throws a new error when generating local regressions from the thinned data (TP1-3 do not have this problem). The error occurs in pi_curve_rates.Rmd, in the timepoint 4 section, lines 122-127.

AHuffmyer commented 3 years ago

@hputnam and @daniellembecker, do you have any insights on this error? I cannot locate where the problem is.

AHuffmyer commented 3 years ago

Error: Problem with mutate() input regs.
✖ invalid 'length.out' argument

I confirmed that all colony names in the data frames match the names in the metadata.
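One way to make that name check systematic is a set comparison between the colony IDs in the data and those in the metadata, which also catches invisible mismatches such as trailing spaces. This is a minimal Python sketch (the actual pipeline is in R); the function name `unmatched_colonies` and the example IDs are hypothetical.

```python
def unmatched_colonies(data_ids, metadata_ids):
    """Return colony IDs that appear in one source but not the other."""
    data_set = set(data_ids)
    meta_set = set(metadata_ids)
    return {
        "missing_from_metadata": sorted(data_set - meta_set),
        "missing_from_data": sorted(meta_set - data_set),
    }


# Made-up IDs: a single trailing space is enough to break a match.
result = unmatched_colonies(["colony-A", "colony-B "], ["colony-A", "colony-B"])
print(result)
```

Empty lists in both entries mean the names line up exactly.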

AHuffmyer commented 3 years ago

I tried removing NAs before the mutate() call, but that did not work. I also filtered the data to add colonies one at a time to see at what point it fails. With only the first 5 colonies, the function runs without the mutate() error, but only 3 of the 5 colonies appear in the generated data set. The script with these steps is pushed to Git; I'm still having trouble finding the problem.

AHuffmyer commented 3 years ago

Filtering colony by colony, I found many colonies that either (1) generate no data from the mutate() call or (2) trigger the mutate() error. The problems appear to be widespread across multiple colonies, so we now need to identify what they are.
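The colony-by-colony filtering can be automated: run the fitting step on each colony in isolation, catch any error, and record colonies that return nothing. This is a minimal Python sketch (the actual pipeline is in R); `diagnose_by_colony`, `toy_fit`, the `colony_id`/`o2` fields, and the toy data are hypothetical stand-ins for the LoLinR/mutate() step and do not reproduce its real behavior.

```python
def diagnose_by_colony(rows, fit):
    """Run `fit` on each colony's rows separately and sort colonies by outcome."""
    by_colony = {}
    for row in rows:
        by_colony.setdefault(row["colony_id"], []).append(row)

    ok, empty, failed = [], [], {}
    for colony, colony_rows in sorted(by_colony.items()):
        try:
            result = fit(colony_rows)
        except Exception as exc:  # record the error and keep going
            failed[colony] = str(exc)
            continue
        if result:
            ok.append(colony)
        else:
            empty.append(colony)
    return ok, empty, failed


def toy_fit(colony_rows, min_points=4):
    """Hypothetical stand-in for the per-colony regression step."""
    if len(colony_rows) < 2:
        raise ValueError("too few points")
    if len(colony_rows) < min_points:
        return []  # runs, but produces no output rows
    return [sum(r["o2"] for r in colony_rows) / len(colony_rows)]


rows = (
    [{"colony_id": "colony-A", "o2": 1.0}] * 5
    + [{"colony_id": "colony-B", "o2": 1.0}] * 3
    + [{"colony_id": "colony-C", "o2": 1.0}]
)
ok, empty, failed = diagnose_by_colony(rows, toy_fit)
print(ok, empty, failed)
```

Because every colony is tried independently, one failing colony no longer hides the status of the others.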

AHuffmyer commented 3 years ago

The following colonies do not run successfully:

ACR-220
ACR-225
ACR-229
ACR-343
ACR-368
BK-3
BK-5
POC-239
POC-248
POC-254
POC-358
POC-359
POC-366
POC-369
POR-221
POR-242
POR-245
POR-338
POR-340

I tried filtering these colonies out before the main LoLinR call, but a mutate() error still comes up. Below are examples of the data for two colonies that do not run (POR-340 and ACR-220) and two that do (ACR-139 and POR-83). The colonies that fail have low sample sizes at one or more light levels. I also checked the file formats; the column headers are the same for "good" and "bad" colonies. As a next step, I will try adjusting the data thinning.

[Four screenshots dated 2021-05-07 showing example data for the colonies named above]

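Since the failing colonies have few points at one or more light levels, counting points per colony and light level after thinning makes them easy to spot. This is a minimal Python sketch, assuming the thinned records carry hypothetical `colony_id` and `light` fields; the `min_points` cutoff is an assumed threshold, not a documented LoLinR requirement.

```python
from collections import Counter


def low_count_groups(records, light_levels, min_points):
    """Flag (colony, light level) groups with fewer than `min_points` rows.

    Light levels with no rows at all are flagged with a count of 0.
    """
    counts = Counter((r["colony_id"], r["light"]) for r in records)
    colonies = sorted({r["colony_id"] for r in records})
    flagged = []
    for colony in colonies:
        for light in light_levels:
            n = counts[(colony, light)]  # Counter returns 0 for missing keys
            if n < min_points:
                flagged.append((colony, light, n))
    return flagged


# Toy thinned data: colony-B is short on points at light level 50.
records = (
    [{"colony_id": "colony-A", "light": light} for light in (0, 50) for _ in range(6)]
    + [{"colony_id": "colony-B", "light": 0} for _ in range(6)]
    + [{"colony_id": "colony-B", "light": 50} for _ in range(2)]
)
flagged = low_count_groups(records, light_levels=[0, 50], min_points=4)
print(flagged)
```

Passing the expected light levels explicitly matters: a level that thinning emptied completely would otherwise never show up in the counts.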
AHuffmyer commented 3 years ago

Solved this issue by reducing the data thinning to retain more data points! We still need to QC these values against the other timepoints, since this data set gave us trouble.
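Why reducing the thinning helps comes down to simple arithmetic: keeping every k-th of n rows leaves ceil(n / k) rows, so halving the thinning parameter (e.g. from 20 to 10) roughly doubles the points available to each local regression. This is a minimal Python sketch, assuming thinning keeps rows 0, k, 2k, ... as a simple stand-in; the 120-reading light step is hypothetical, not taken from the TP4 data.

```python
import math


def points_after_thinning(n_raw, by):
    """Rows left when keeping every `by`-th of `n_raw` rows (rows 0, by, 2*by, ...)."""
    return math.ceil(n_raw / by)


# Hypothetical light step with 120 raw readings.
for by in (20, 10):
    print(f"by={by}: {points_after_thinning(120, by)} points")
```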

AHuffmyer commented 3 years ago

Am and Rd values are definitely in line with previous timepoints. AQY values are higher (0.05, compared with ~0.02).

hputnam commented 3 years ago

@daniellembecker @dconetta @AHuffmyer Why do those samples have fewer data points?

daniellembecker commented 3 years ago

I can't see anything in those data sheets that looks different from the others. I compared the run time in minutes between individuals that ran without an error and all the individuals that had an error. Every individual that had an error was still run for > 10 minutes per light level, with a ~2-minute interval before the start of the next level. It seems they just needed a more precise thinning parameter: once the signal leveled out after 10 minutes, it may have needed more stringent thinning parameters or a different thinning interval for the data to thin successfully. @AHuffmyer I saw you went from a thinning parameter of 20 to 10. We could try stepping down by 1 from 20 to find the best thinning parameter, so it doesn't differ too much from the current setting. (I will have to read up on the thin_par function in R a bit more; I don't know if this is a valid suggestion!)
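The step-down-by-1 idea can be sketched as a parameter sweep: starting at 20, decrease the thinning parameter until every colony/light-level group keeps at least a minimum number of points, and take the first (largest) value that works. This is a minimal Python sketch, assuming thinning keeps every k-th row; `largest_workable_thinning`, the `min_points` cutoff, and the group sizes are all hypothetical, and a real check would rerun the R pipeline at each step rather than rely on counts alone.

```python
import math


def largest_workable_thinning(group_sizes, min_points, start=20, stop=1):
    """Step the thinning parameter down from `start` by 1 and return the first
    (largest) value that leaves every group with at least `min_points` rows.

    `group_sizes` maps a (colony, light level) key to its raw row count.
    Returns None if no value down to `stop` is sufficient.
    """
    for by in range(start, stop - 1, -1):
        if all(math.ceil(n / by) >= min_points for n in group_sizes.values()):
            return by
    return None


# Hypothetical raw row counts per (colony, light level).
group_sizes = {("colony-A", 0): 600, ("colony-B", 50): 150}
print(largest_workable_thinning(group_sizes, min_points=10))
```

Picking the largest workable value keeps the setting as close as possible to the current parameter of 20 while avoiding groups that are too sparse to fit.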