joefutrelle / pyifcb

IFCB data system, generation 2
MIT License

ml_analyzed error for bad last line of adc files #57

Open hsosik opened 4 years ago

hsosik commented 4 years ago

The MATLAB and Python versions of the ml_analyzed calculation produce an erroneous result in some cases where no inhibit time is available from the hdr file AND the last line of the adc file is bad. Previously we had a special case for ml < 0 (from the last adc line), but some cases instead produce ml much greater than 5 ml (not realistic). I think the better criterion may be to compare the 2nd and 23rd entries on the last line to see if they differ by more than a few tens of milliseconds (the normal difference). In the MATLAB script I just committed, I've replaced

    if ml_analyzed(count) <= 0

with

    if abs(adc.Var23(end)-adc.Var2(end)) > 0.1

I haven't fully tested this, but it works for bin D20180829T144312_IFCB125, which previously gave ml_analyzed = 32 ml, and now gives 3.7348 ml.
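As an editorial illustration, the check described above can be sketched in Python. This is a minimal sketch, not the pyifcb implementation: the function names are hypothetical, times are assumed to be in seconds, and the flow rate of 0.25 ml per minute used in the example is an assumed value that does not appear in this thread. The volume formula follows the `flowrate * looktime / 60` expression used in the MATLAB script.

```python
def last_line_is_bad(adc_time, run_time, threshold=0.1):
    """Flag the last adc line when columns 2 and 23 disagree by more than threshold.

    adc_time is column 2 (Var2) and run_time is column 23 (Var23) of the last
    line, both assumed to be in seconds. The 0.1 default is the value proposed
    in the comment above.
    """
    return abs(run_time - adc_time) > threshold


def ml_analyzed(run_time, inhibit_time, flow_rate_ml_per_min):
    """Volume analyzed in ml: flow rate times look time, with seconds converted to minutes."""
    look_time = run_time - inhibit_time
    return flow_rate_ml_per_min * look_time / 60


# Made-up numbers for illustration only (not taken from any real bin).
print(last_line_is_bad(adc_time=896.40, run_time=896.43))   # normal difference
print(last_line_is_bad(adc_time=896.40, run_time=7700.0))   # implausible run time
print(ml_analyzed(run_time=900.0, inhibit_time=60.0, flow_rate_ml_per_min=0.25))
```

Keeping the threshold as a parameter makes it easy to retune the criterion without touching the comparison itself.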

joefutrelle commented 4 years ago

For my reference, here is the relevant part of the Python code and the corresponding part of the MATLAB code

The equivalents of those columns in the v2 schema are ADC_TIME (Var2) and RUN_TIME (Var23).
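As an editorial illustration of that mapping: MATLAB names table columns Var1, Var2, ... by 1-based position, so the corresponding 0-based Python indices are one lower. The sketch below uses only the standard library and is not the pyifcb API. The helper `last_row_times` is hypothetical, and the name INHIBIT_TIME for column 24 is an assumption (the thread only refers to it as Var24).

```python
import csv
import io

# 0-based positions of the columns MATLAB calls Var2, Var23 and Var24.
ADC_TIME = 1       # Var2
RUN_TIME = 22      # Var23
INHIBIT_TIME = 23  # Var24 (name assumed for this sketch)


def last_row_times(adc_text):
    """Return (adc_time, run_time, inhibit_time) from the last non-empty CSV row."""
    rows = [row for row in csv.reader(io.StringIO(adc_text)) if row]
    last = rows[-1]
    return float(last[ADC_TIME]), float(last[RUN_TIME]), float(last[INHIBIT_TIME])


# Synthetic 24-column rows in which each value equals its 0-based column index.
synthetic = "\n".join(",".join(str(i) for i in range(24)) for _ in range(3))
print(last_row_times(synthetic))
```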

joefutrelle commented 4 years ago

This is implemented and generates the same result for that bin. I can go ahead and push it now, or if you want to do further testing I can wait for that.

hsosik commented 4 years ago

I did some more testing and found a need to update the criterion for detecting a bad last line in the adc file. The MATLAB script is now like this:

    if abs(adc.Var23(end)-adc.Var2(end)) > 0.3

https://github.com/hsosik/ifcb-analysis/blob/master/IFCB_tools/IFCB_volume_analyzed_fromADC.m

joefutrelle commented 4 years ago

I modified my solution accordingly and created a PR.

https://github.com/joefutrelle/pyifcb/pull/59

joefutrelle commented 4 years ago

@hsosik are we ready to go on this? If so I will merge the PR and deploy.

hsosik commented 4 years ago

In the MATLAB implementation, I am now handling another (rare) special case where multiple lines at the end of the adc file are bad (zero run and inhibit times). [A previous commit of mine to ifcb-analysis had an incorrect / incomplete implementation of this solution; I have now committed what I think is a working version of IFCB_volume_analyzed_fromADC.m.]

Can you add this case to the Python implementation? In MATLAB, I'm doing this:

    % minor case: files with 0 runtime and inhibit time in numerous rows at file end
    if ml_analyzed(count) <= 0
        runtime = adc.Var2(end-1); % next best info after runtime
        ii = find(adc.Var23);
        modeinhibittime = mode(diff(adc.Var24(ii)));
        inhibittime = adc.Var24(ii(end)) + (size(adc,1)-length(ii))*modeinhibittime;
        looktime = runtime - inhibittime;
        ml_analyzed(count) = flowrate.*looktime/60;
    end

hsosik commented 4 years ago

By the way, here is a bin to test the new special case: \sosiknas1\IFCB_data\NESLTER_transect\data\2019\D20190205\D20190205T122609_IFCB127.adc

joefutrelle commented 4 years ago

What is the correct result for that bin? (so I can check my output)

joefutrelle commented 4 years ago

Can you briefly describe the algorithm? It looks quite different from the other cases in that it's computing some statistics over the whole file.

hsosik commented 4 years ago

> What is the correct result for that bin? (so I can check my output)

"Correct" is a strong word for this case... but my answer is 1.5722 milliliters for that bin.

hsosik commented 4 years ago

> Can you briefly describe the algorithm? It looks quite different from the other cases in that it's computing some statistics over the whole file.

Yes, that's correct. It's like the previous case in getting run time from the end of column 2 of adc, but the inhibittime column has a lot of bad (zero) values in multiple rows at the bottom of the file. So, I'm estimating total inhibittime by taking the last good value (from col 24) and adding an estimate for the rows after that. I assume each row with missing inhibittime has a value equal to the mode of the measured inhibit times in col 24 (i.e., the "typical" dead time to handle a trigger).
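As an editorial illustration, the estimate described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the code that went into pyifcb: the function names are hypothetical, the columns are passed in as plain lists, rows are treated as good when their run time (column 23) is nonzero, and the flow rate is left as a parameter because its value is not given in this thread.

```python
from statistics import mode


def estimate_inhibit_time(run_times, inhibit_times):
    """Last good inhibit time plus a mode-based guess for each bad row."""
    # Rows with a nonzero run time (column 23) are treated as good.
    good = [i for i, rt in enumerate(run_times) if rt != 0]
    good_inhibit = [inhibit_times[i] for i in good]
    # Typical per-trigger dead time: the most common step between good rows.
    steps = [b - a for a, b in zip(good_inhibit, good_inhibit[1:])]
    typical_step = mode(steps)
    n_bad = len(run_times) - len(good)
    return good_inhibit[-1] + n_bad * typical_step


def ml_analyzed_fallback(adc_times, run_times, inhibit_times, flow_rate_ml_per_min):
    """Volume in ml, using the second-to-last column 2 value as the run time."""
    run_time = adc_times[-2]
    inhibit_time = estimate_inhibit_time(run_times, inhibit_times)
    look_time = run_time - inhibit_time
    return flow_rate_ml_per_min * look_time / 60


# Made-up columns for illustration only: the last two rows are bad (zeros).
adc_times = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]   # column 2
run_times = [10.0, 20.0, 30.0, 40.0, 0.0, 0.0]     # column 23
inhibit_times = [0.5, 1.0, 1.5, 2.25, 0.0, 0.0]    # column 24
print(estimate_inhibit_time(run_times, inhibit_times))  # 2.25 + 2 * 0.5 = 3.25
print(ml_analyzed_fallback(adc_times, run_times, inhibit_times, 0.25))
```

Two caveats about this sketch: `statistics.mode` raises `StatisticsError` when fewer than two good rows exist, and on ties it returns the first most-common value encountered, whereas MATLAB's `mode` returns the smallest, so results could differ on tied data.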

joefutrelle commented 4 years ago

implemented in #61

hsosik commented 2 years ago

Is the python implementation of this already pushed and included with the solution for #70?