twVolc / PyCamPermanent

Permanent PiCam (SO2) installation project software
GNU General Public License v3.0

Consider missing data for RTP #166

Open twVolc opened 1 week ago

twVolc commented 1 week ago

As discussed, one scenario we should consider during the RTP procedure is missing a chunk of data. For instance, the camera might, for whatever reason, stop acquiring/uploading data properly for a period of time and then start again.

If more time has passed than the length of time used for a calibration (typically about 45 minutes, defined by the 'Remove DOAS points after' parameter), all of the DOAS points will need to be removed, and the software will need to slowly rebuild a calibration line, only going back to calibrating once it again has data spanning a sufficient time period. Hopefully this makes sense? I think the software should already be set up to remove all DOAS data points that exceed a time threshold, but whether this works correctly after a long jump in time would need checking. It probably isn't set up to deal with the fact that in this case you would need to slowly build a calibration line from scratch again (i.e. it can't calibrate using just the first point that arrives after a large gap in data; it needs that time to rebuild a calibration line). I guess this would work in pretty much exactly the same way as starting a new day.
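To make the scenario above concrete, here is a minimal sketch of the pruning and "wait until we have enough data" logic. It is not taken from the PyCamPermanent code base: the function names, the `(time, value)` tuple layout and the `MIN_SPAN` threshold are all hypothetical, and the 45 minutes simply mirrors the 'Remove DOAS points after' example.

```python
from datetime import datetime, timedelta

# Assumed values: 45 minutes mirrors the 'Remove DOAS points after' example;
# MIN_SPAN is a hypothetical threshold for "a sufficient time period".
REMOVE_DOAS_AFTER = timedelta(minutes=45)
MIN_SPAN = timedelta(minutes=30)


def prune_old_points(points, now, max_age=REMOVE_DOAS_AFTER):
    """Return only the (time, value) points no older than max_age at time now."""
    return [(t, v) for t, v in points if now - t <= max_age]


def spans_enough_time(points, min_span=MIN_SPAN):
    """True once the buffered points cover at least min_span."""
    if len(points) < 2:
        return False
    times = [t for t, _ in points]
    return max(times) - min(times) >= min_span


# Points every 10 minutes from 10:00 to 10:40, then nothing until 12:00.
start = datetime(2024, 1, 1, 10, 0)
before_gap = [(start + timedelta(minutes=10 * i), float(i)) for i in range(5)]
resume = datetime(2024, 1, 1, 12, 0)

# After the gap every old point is out of date, so only the new point remains
# and the buffer is not yet ready to calibrate.
after_gap = prune_old_points(before_gap, now=resume)
after_gap.append((resume, 0.0))
ready_at_resume = spans_enough_time(after_gap)
```

In this sketch the first point after the gap never triggers a calibration on its own; calibration would only resume once new points have accumulated over `MIN_SPAN`.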

ubdbra001 commented 1 week ago

Is this the same as resetting after a day change? Or slightly different?

twVolc commented 1 week ago

It is essentially the same in the case I describe above, but there may be more subtle variations where it's more complicated. E.g. if we have a gap of 30 minutes, and our calibration is set to remove data points after 45 minutes, we would still have 15 minutes of data from before the gap. Would this then be used to form a calibration line for the new data? It probably shouldn't be, as there wouldn't be many data points to form the regression.

You can then imagine all the other permutations between having continuous data and having large data gaps. I'm not entirely clear in my own head how best to deal with this, so I'm probably just thinking out loud here and overcomplicating things. I wonder if we just need another parameter - minimum number of data points - to decide whether we can form a calibration or not. E.g. we might set this to 200, so if we don't have that many data points within the time-frame defined for calibration (e.g. 45 minutes), we don't calibrate and instead wait until we do have enough data.
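A minimum-points check like this could be sketched as follows. The function name and arguments are hypothetical, not an existing PyCamPermanent API; the 200-point threshold and 45-minute window are just the example values from above.

```python
from datetime import datetime, timedelta


def enough_points_to_calibrate(point_times, now, window, min_points):
    """Count points inside the calibration window and compare with min_points.

    Returns (ok, n_recent) so the caller can also log how many points it had.
    """
    n_recent = sum(1 for t in point_times if timedelta(0) <= now - t <= window)
    return n_recent >= min_points, n_recent


now = datetime(2024, 1, 1, 12, 0)
window = timedelta(minutes=45)

# 150 points, one every 10 seconds, ending at `now`: all inside the window,
# but fewer than the 200 required, so we would wait rather than calibrate.
times = [now - timedelta(seconds=10 * i) for i in range(150)]
ok, n_recent = enough_points_to_calibrate(times, now, window, min_points=200)
```

Because the check counts points rather than elapsed time, it would cover the 30-minute-gap case as well as a full reset, at the cost of one more parameter for the user to tune.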

I think in reality this is probably over-complicating things and might cause its own issues, so an alternative might be simply to save the number of data points involved in the calibration fit as part of the calibration file for each time-step. A user can then inspect this if they're worried about specific periods of data. Beyond this, we could simply set the software to attempt to calibrate as long as there isn't a gap in data that exceeds the time set by 'Remove DOAS points after'. If the gap exceeds that length of time, then simply reset the calibration as if it's starting a new day.
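A rough sketch of this simpler alternative is below. The `add_point` helper, the record dictionary and its keys are hypothetical and not part of the existing software; the idea is only to show a gap-triggered reset together with a per-time-step point count that could be written alongside the calibration.

```python
from datetime import datetime, timedelta


def add_point(buffer, new_point, max_age):
    """Add a (time, value) point; reset the buffer if the gap exceeds max_age.

    Returns the updated buffer and a small record for the calibration output.
    """
    new_time = new_point[0]
    # A gap is measured from the most recent buffered point to the new one.
    reset = bool(buffer) and (new_time - buffer[-1][0]) > max_age
    if reset:
        buffer = []  # treat as a fresh start, like a new day
    else:
        buffer = [(t, v) for t, v in buffer if new_time - t <= max_age]
    buffer = buffer + [new_point]
    record = {"time": new_time, "n_points": len(buffer), "reset": reset}
    return buffer, record


max_age = timedelta(minutes=45)  # assumed 'Remove DOAS points after' value
day = datetime(2024, 1, 1)
buffer = [(day.replace(hour=10, minute=0), 1.0),
          (day.replace(hour=10, minute=15), 2.0)]

# A 30-minute gap is tolerated; a 75-minute gap triggers a reset.
buffer, rec_small_gap = add_point(buffer, (day.replace(hour=10, minute=45), 3.0), max_age)
buffer, rec_big_gap = add_point(buffer, (day.replace(hour=12, minute=0), 4.0), max_age)
```

Here the reset leaves the buffer in the same state that age-based pruning alone would, since every older point is already past `max_age`; what it adds is an explicit `reset` flag and the `n_points` count for a user to inspect later.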

I think there's more discussion to be had on this issue to be honest.