nasa / opera-sds-pcm

Observational Products for End-Users from Remote Sensing Analysis (OPERA)
Apache License 2.0

[Bug]: AssertionError: Number of download batches is greater than K. This should not be possible! #862

Open philipjyoon opened 4 months ago

philipjyoon commented 4 months ago

Checked for duplicates

Yes - I've already checked

Describe the bug

Running the query below raises the following assertion error: `AssertionError: Number of download batches is greater than K. This should not be possible!`

For frame 12646 the download batch length is 3 while K is 2. This should indeed not be possible, because we process one frame at a time and we get as many batches (basically one per acquisition time cycle) as the K requested.
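To make the failing invariant concrete, here is a minimal sketch. It is not the actual opera-sds-pcm code: the function names, the `(frame_id, acquisition_time)` tuple layout, the 12-day cycle, and the acquisition timestamps are all hypothetical and chosen only for illustration. It groups one frame's granules into one batch per acquisition cycle and compares the batch count to K.

```python
from collections import defaultdict
from datetime import datetime

# Assumption: a 12-day repeat cycle (the thread mentions 6 or 12 days).
CYCLE_DAYS = 12


def group_into_batches(granules, epoch, cycle_days=CYCLE_DAYS):
    """Group (frame_id, acquisition_time) pairs into batches.

    Each batch is keyed by (frame_id, cycle_index), i.e. one batch per
    frame per acquisition cycle. Hypothetical helper, not project code.
    """
    batches = defaultdict(list)
    for frame_id, acq_time in granules:
        cycle_index = (acq_time - epoch).days // cycle_days
        batches[(frame_id, cycle_index)].append(acq_time)
    return dict(batches)


def count_batches_per_frame(batches):
    """Count how many batches each frame ended up with."""
    counts = defaultdict(int)
    for frame_id, _cycle_index in batches:
        counts[frame_id] += 1
    return dict(counts)


# Hypothetical acquisitions for frame 12646, 12 days apart, all inside
# a one-month query window.
epoch = datetime(2023, 12, 15)
granules = [
    (12646, datetime(2023, 12, 15, 8, 18)),
    (12646, datetime(2023, 12, 27, 8, 18)),
    (12646, datetime(2024, 1, 8, 8, 18)),
]

k = 2
counts = count_batches_per_frame(group_into_batches(granules, epoch))
exceeds_k = counts[12646] > k
print(counts[12646], exceeds_k)  # prints: 3 True
```

Under these assumptions, a query window that covers three acquisition cycles yields three batches for the frame, which is more than K=2 and would trip an assertion of the kind reported here.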

What did you expect?

Work without error

Reproducible steps

python daac_data_subscriber.py query -c OPERA_L2_CSLC-S1_V1 --start-date=2023-12-15T08:17:50Z --chunk-size=2 --k=2 --m=1 --job-queue=opera-job_worker-cslc_data_download --processing-mode=forward --end-date=2024-01-15T08:35:59Z

Environment

`develop` branch circa June 4, 2024

philipjyoon commented 4 months ago

Actually, we should never run a forward query over such a large date range. The nominal date range is just one hour, and there's no use case for anything larger.

For a date range this large, we should be using reprocessing mode. However, we still get the same error there, so this bug is still applicable.

philipjyoon commented 4 months ago

This happens only when you run a date-based query (forward or reprocessing) with a date range larger than one acquisition cycle, which is 6 or 12 days. In forward processing, it would never make sense to run a query over 6 days. In reprocessing, I don't see any good reason to do this either. We would generally break one very large query down into multiple smaller queries, each covering a few hours.
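The triggering condition can be expressed as a simple check on the query window. This is a sketch only: `spans_more_than_one_cycle` is a hypothetical helper, not part of `daac_data_subscriber.py`, and the 6-day default is an assumption taken from the shorter of the two cycle lengths mentioned above.

```python
from datetime import datetime, timedelta

# Timestamp format used by --start-date / --end-date in the reproduction command.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def spans_more_than_one_cycle(start_date, end_date, cycle_days=6):
    """Return True if the query window is longer than one acquisition cycle.

    Hypothetical helper; cycle_days=6 is an assumed default.
    """
    start = datetime.strptime(start_date, TIME_FORMAT)
    end = datetime.strptime(end_date, TIME_FORMAT)
    return (end - start) > timedelta(days=cycle_days)


# The window from the reproduction command covers about 31 days.
long_window = spans_more_than_one_cycle(
    "2023-12-15T08:17:50Z", "2024-01-15T08:35:59Z"
)
# A nominal one-hour forward query.
nominal_window = spans_more_than_one_cycle(
    "2023-12-15T08:00:00Z", "2023-12-15T09:00:00Z"
)
print(long_window, nominal_window)  # prints: True False
```

A check like this could only warn about oversized windows. It would not catch the case where a short window happens to contain data from several cycles.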

Will have to ask the team if we actually have a use-case for running a date-based query for over 6 days.

philipjyoon commented 4 months ago

Thinking more about this... it's highly improbable, but it is possible for a one-hour forward query to contain multiple acquisition cycles' worth of data for the same frame. Production of the earlier data could have been delayed by exactly 6-12 days (within 1 hr precision). So we do need to handle this case.
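One possible way to handle this case, shown purely as a sketch and not as the fix adopted in the project, is to split a frame's batches into consecutive groups of at most K instead of asserting. The `chunk_batches` name and the string batch ids are hypothetical.

```python
def chunk_batches(batch_ids, k):
    """Split batch ids into consecutive groups of at most k.

    Ids are sorted first so each group covers adjacent acquisition
    cycles. Hypothetical helper, not project code.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    ordered = sorted(batch_ids)
    return [ordered[i:i + k] for i in range(0, len(ordered), k)]


# Three batches for one frame with K=2, as in the reported failure.
groups = chunk_batches(["cycle_2", "cycle_0", "cycle_1"], k=2)
print(groups)  # prints: [['cycle_0', 'cycle_1'], ['cycle_2']]
```

Whether the trailing partial group should be submitted immediately or held until K batches are available depends on the intended K semantics, which this sketch does not decide.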