twVolc / PyCamPermanent

Permanent PiCam (SO2) installation project software
GNU General Public License v3.0
1 stars 2 forks source link

iFit worker and image processing out of sync, means that DOAS data points are not found #170

Open tpering opened 2 months ago

tpering commented 2 months ago

As below:

image

twVolc commented 1 month ago

I think this is actually related to a lag in processing data due to the lack of some DOAS spectra.

I've tried RTP over an extended period where some spectra aren't acquired (because of the integration/coadd time of spectra being longer than the 5 second acquisition repeat), in other words, som spectra are dropped. It seems that the processing code hangs for quite a while whilst it waits for a DOAS data point to arrive that is within a certain time of an image pair. This hangtime is quite long, which means that if any spectra aren't acquired the processing code can quite quickly fall massively behind the data transfer speed. Watching in realt-time, the code hangs before this line:

No DOAS data point within 2s of image time: 14:14:45. Image is not added to DOAS calibration

It will quickly speed through the next data points where spectra are present, then hang again before the next respective DOAS missing line.

I'm attaching a long log of RTP where you can see how quickly RTP gets behind, then when I stop image transfer the RTP lines slowly start catching up. This issue can presumably be replicated by deleting some spectra in a folder and trying RTP on it. RTP_log.txt

twVolc commented 1 month ago

Lines 2439-2448 in so2_camera_processor.py

           try:
                timeout = datetime.datetime.now() + datetime.timedelta(seconds = 30)
                # Keep retrying to get the cd for current time until timeout
                while (datetime.datetime.now() < timeout):
                    # Get CD for current time
                    cd = self.doas_worker.results.get(img_time)
                    if cd is not None: break
                    time.sleep(0.5)
                else:
                    raise KeyError(f"spectra for {img_time} not found")

I think the issue is in this block of code. This is waiting for a spectrum but if that spectrum doesn't ever arrive 30 seconds is quite a long time to wait (in some situations, e.g. merapi, we might drop spectra quite frequently, so delaying processing for 30 seconds each time this happens is too much).

I think there's two (not mutually exclusive) solutions to this.

  1. Make the timeout value configurable - in some telemetry setups 30 seconds might be reasonable whereas at other places where telemetry is fast this might be OTT.
  2. Add a check each time cd is None within the while-loop, to see if there are any later DOAS data points in doas_worker.results. i.e. extract the last line of that dataframe and if it is later than img_time we should safely assume that no data point for img_time is going to arrive - I think we're safe to assume that data transfer is chronological as I believe the FTP worker sorts data by filename before transfering. If doas_worker.results has later data, we can break from this while-loop and just continue with processing under the understanding that we won't have a DOAS data point for this image pair (i.e. can raise the KeyError exception in the same way as the else clause does.

I think point 2 would be more urgent and better solve the problem, point 1 is probably more of a nice-to-have.

ubdbra001 commented 1 month ago

30 seconds was just something I picked out of a hat, so happy to lower it if that'll help. I suppose my question for 2 is can we assume chronological order for dropbox-type transfers too? If yes, then I think this is a nice and easy fix for the issue.

twVolc commented 1 month ago

I suppose my question for 2 is can we assume chronological order for dropbox-type transfers too? If yes, then I think this is a nice and easy fix for the issue.

I'm not 100% on the answer to that, but I guess if things come in completely out of sync RTP will become incredibly messy and we'll probably struggle to get it working properly beyond just processing everything at the end of the day once all data is downloaded. So I think we have to assume some level consistency in timing of downloaded files, and certianly for direct FTP connections this is the case as I'm pretty sure the file list is sorted before pulling the next file (I haven't double-checked this but I think this is the case), and if it isn't it's definitely something that should be implemented.

twVolc commented 1 month ago

30 seconds was just something I picked out of a hat, so happy to lower it if that'll help

I think the ideal solution for this is make it user-defined, in the GUI/config. For dropbox downloads at the moment there can be quite a lag, so 30 does seem reasonable, but for Merapi with their direct connection it just means the software is very quickly getting out of sync with the file transfer. I think I've changed it to 10 seconds for them.

twVolc commented 1 month ago

2. Add a check each time cd is None within the while-loop, to see if there are any later DOAS data points in doas_worker.results. i.e. extract the last line of that dataframe and if it is later than img_time we should safely assume that no data point for img_time is going to arrive - I think we're safe to assume that data transfer is chronological as I believe the FTP worker sorts data by filename before transfering. If doas_worker.results has later data, we can break from this while-loop and just continue with processing under the understanding that we won't have a DOAS data point for this image pair (i.e. can raise the KeyError exception in the same way as the else clause does.

Just a note @ubdbra001 - we definitely need to implement this part of the fix. When post-processing data this timeout section of code significantly slows processing down if there are any missing spectra. Another option would be to force timeout=0 when post-processing, which would fit in with making timeout configurable beyond just a local variable. But I still feel we should assume files arrive in order and implement the check for later spectra before the timeout section of code.

twVolc commented 1 month ago

206 fixes point 2 discussed here. But as discussed in that PR, it might be nice to also make timeout a user-defined option in the future. I'll remove the priority label as I think the current fix is nice and should do a good job in most scenarios, but keep the issue open.

twVolc commented 1 month ago

One scenario where setting the timeout to 0 for post-processing actually is quite important - if the last image pair of a sequence doesn't have a corresponding DOAS data point I'm guessing that currently the code will hang for 30 seconds as there won't be a later DOAS point to quickly exit the loop.