Open tpering opened 2 months ago
I think this is actually related to a lag in processing data due to the lack of some DOAS spectra.
I've tried RTP over an extended period where some spectra aren't acquired (because of the integration/coadd time of spectra being longer than the 5 second acquisition repeat), in other words, som spectra are dropped. It seems that the processing code hangs for quite a while whilst it waits for a DOAS data point to arrive that is within a certain time of an image pair. This hangtime is quite long, which means that if any spectra aren't acquired the processing code can quite quickly fall massively behind the data transfer speed. Watching in realt-time, the code hangs before this line:
No DOAS data point within 2s of image time: 14:14:45. Image is not added to DOAS calibration
It will quickly speed through the next data points where spectra are present, then hang again before the next respective DOAS missing line.
I'm attaching a long log of RTP where you can see how quickly RTP gets behind, then when I stop image transfer the RTP lines slowly start catching up. This issue can presumably be replicated by deleting some spectra in a folder and trying RTP on it. RTP_log.txt
Lines 2439-2448
in so2_camera_processor.py
try:
timeout = datetime.datetime.now() + datetime.timedelta(seconds = 30)
# Keep retrying to get the cd for current time until timeout
while (datetime.datetime.now() < timeout):
# Get CD for current time
cd = self.doas_worker.results.get(img_time)
if cd is not None: break
time.sleep(0.5)
else:
raise KeyError(f"spectra for {img_time} not found")
I think the issue is in this block of code. This is waiting for a spectrum but if that spectrum doesn't ever arrive 30 seconds is quite a long time to wait (in some situations, e.g. merapi, we might drop spectra quite frequently, so delaying processing for 30 seconds each time this happens is too much).
I think there's two (not mutually exclusive) solutions to this.
cd
is None
within the while-loop, to see if there are any later DOAS data points in doas_worker.results
. i.e. extract the last line of that dataframe and if it is later than img_time
we should safely assume that no data point for img_time
is going to arrive - I think we're safe to assume that data transfer is chronological as I believe the FTP worker sorts data by filename before transfering. If doas_worker.results
has later data, we can break from this while-loop and just continue with processing under the understanding that we won't have a DOAS data point for this image pair (i.e. can raise the KeyError exception in the same way as the else clause does.I think point 2 would be more urgent and better solve the problem, point 1 is probably more of a nice-to-have.
30 seconds was just something I picked out of a hat, so happy to lower it if that'll help. I suppose my question for 2 is can we assume chronological order for dropbox-type transfers too? If yes, then I think this is a nice and easy fix for the issue.
I suppose my question for 2 is can we assume chronological order for dropbox-type transfers too? If yes, then I think this is a nice and easy fix for the issue.
I'm not 100% on the answer to that, but I guess if things come in completely out of sync RTP will become incredibly messy and we'll probably struggle to get it working properly beyond just processing everything at the end of the day once all data is downloaded. So I think we have to assume some level consistency in timing of downloaded files, and certianly for direct FTP connections this is the case as I'm pretty sure the file list is sorted before pulling the next file (I haven't double-checked this but I think this is the case), and if it isn't it's definitely something that should be implemented.
30 seconds was just something I picked out of a hat, so happy to lower it if that'll help
I think the ideal solution for this is make it user-defined, in the GUI/config. For dropbox downloads at the moment there can be quite a lag, so 30 does seem reasonable, but for Merapi with their direct connection it just means the software is very quickly getting out of sync with the file transfer. I think I've changed it to 10 seconds for them.
2. Add a check each time
cd
isNone
within the while-loop, to see if there are any later DOAS data points indoas_worker.results
. i.e. extract the last line of that dataframe and if it is later thanimg_time
we should safely assume that no data point forimg_time
is going to arrive - I think we're safe to assume that data transfer is chronological as I believe the FTP worker sorts data by filename before transfering. Ifdoas_worker.results
has later data, we can break from this while-loop and just continue with processing under the understanding that we won't have a DOAS data point for this image pair (i.e. can raise the KeyError exception in the same way as the else clause does.
Just a note @ubdbra001 - we definitely need to implement this part of the fix. When post-processing data this timeout section of code significantly slows processing down if there are any missing spectra. Another option would be to force timeout=0
when post-processing, which would fit in with making timeout
configurable beyond just a local variable. But I still feel we should assume files arrive in order and implement the check for later spectra before the timeout section of code.
One scenario where setting the timeout to 0 for post-processing actually is quite important - if the last image pair of a sequence doesn't have a corresponding DOAS data point I'm guessing that currently the code will hang for 30 seconds as there won't be a later DOAS point to quickly exit the loop.
As below: