[Bug]: DISP-S1 reprocessing with a date range sometimes fails to retrieve k-granules, and also processes an extra frame that it shouldn't

philipjyoon commented 2 months ago

Checked for duplicates

Yes - I've already checked

Describe the bug

When running a CSLC query (for DISP-S1 processing) in reprocessing mode, specifying a date range (as opposed to a native_id) sometimes does not retrieve the k-granules correctly. It submits download jobs with just one batch_id even when k is larger than 1. This seems to coincide with the query job submitting a download for an adjacent frame, which it shouldn't be submitting at all, although that download surprisingly has the correct k granules. The two symptoms are probably related.

The issue was first found using the command labeled "Originally" under Reproducible steps below. I was then able to come up with a simpler test case, labeled "Simplified".

However, not every date range reprocessing run, with or without the optional frame-id, exhibits this bug. The following two commands work correctly:

python ~/mozart/ops/opera-pcm/data_subscriber/daac_data_subscriber.py query -c OPERA_L2_CSLC-S1_V1 --chunk-size=1 --k=4 --m=1 --job-queue=opera-job_worker-cslc_data_download --processing-mode=reprocessing --start-date=2024-06-05T02:00:00Z --end-date=2024-06-05T02:01:11Z
python ~/mozart/ops/opera-pcm/data_subscriber/daac_data_subscriber.py query -c OPERA_L2_CSLC-S1_V1 --chunk-size=1 --k=4 --m=1 --job-queue=opera-job_worker-cslc_data_download --processing-mode=reprocessing --start-date=2024-06-05T02:00:00Z --end-date=2024-06-05T02:01:11Z --frame-id=34481

What may be happening is this: when one random granule is picked out of a date range query to be expanded to a full frame for reprocessing, the bug surfaces if that granule happens to be one of the 6 bursts (out of 27 total) that belong to more than one frame. There is probably logic somewhere in the code that, correctly, ensures a single native_id reprocessing request produces only one frame download job. This can be easily tested by running reprocessing with the burst native_id that was chosen to represent the date range reprocessing case.
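To illustrate the hypothesis above, here is a minimal sketch of how a burst shared by two frames could expand into two download jobs unless it is filtered by the requested frame. Everything in it is an assumption made for illustration: the burst IDs, the frame numbers, the `BURST_TO_FRAMES` table, and the `frames_to_submit` helper are hypothetical and are not taken from the opera-sds-pcm code.

```python
# Hypothetical burst-to-frame table (illustrative values only).
# Most bursts map to one frame; an overlap burst maps to two adjacent frames.
BURST_TO_FRAMES = {
    "T001-000001-IW1": [100],
    "T001-000009-IW1": [100, 101],  # overlap burst shared by frames 100 and 101
    "T001-000010-IW1": [101],
}


def frames_to_submit(burst_id, requested_frame=None):
    """Return the frames a reprocessing request would expand to.

    Without a requested frame, an overlap burst expands to every frame
    it belongs to. With a requested frame, only that frame is kept.
    """
    frames = BURST_TO_FRAMES.get(burst_id, [])
    if requested_frame is not None:
        frames = [f for f in frames if f == requested_frame]
    return frames


# A burst in a single frame yields one download job.
single = frames_to_submit("T001-000001-IW1")

# An overlap burst yields two jobs, one of them for the adjacent frame.
overlap = frames_to_submit("T001-000009-IW1")

# Filtering by the requested frame removes the adjacent-frame job.
filtered = frames_to_submit("T001-000009-IW1", requested_frame=100)
```

If the real code behaves like the unfiltered case when the representative granule is an overlap burst, that would be consistent with the extra adjacent-frame download job described above.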

What did you expect?

n/a

Reproducible steps

Originally:
python ~/mozart/ops/opera-pcm/data_subscriber/daac_data_subscriber.py query -c OPERA_L2_CSLC-S1_V1 --chunk-size=1 --k=4 --m=1  --job-queue=opera-job_worker-cslc_data_download --processing-mode=reprocessing --start-date=2024-06-05T02:00:00Z --end-date=2024-06-05T02:30:00Z

Simplified:
python ~/mozart/ops/opera-pcm/data_subscriber/daac_data_subscriber.py query -c OPERA_L2_CSLC-S1_V1 --chunk-size=1 --k=4 --m=1  --job-queue=opera-job_worker-cslc_data_download --processing-mode=reprocessing --start-date=2024-06-05T02:00:00Z --end-date=2024-06-05T02:30:11Z --frame-id=34996

Environment

- Version of this software [e.g. vX.Y.Z]
- Operating System: [e.g. MacOSX with Docker Desktop vX.Y]
...

philipjyoon commented 2 months ago

Both of the following native-id reprocessing commands work correctly. But I still think some confluence of these native-ids with date range reprocessing is causing this bug:

python ~/mozart/ops/opera-pcm/data_subscriber/daac_data_subscriber.py query -c OPERA_L2_CSLC-S1_V1 --chunk-size=1 --k=4 --m=1 --job-queue=opera-job_worker-cslc_data_download --processing-mode=reprocessing --native-id=OPERA_L2_CSLC-S1_T131-279961-IW1_20240224T163057Z_20240605T012555Z_S1A_VV_v1.1

python ~/mozart/ops/opera-pcm/data_subscriber/daac_data_subscriber.py query -c OPERA_L2_CSLC-S1_V1 --chunk-size=1 --k=4 --m=1 --job-queue=opera-job_worker-cslc_data_download --processing-mode=reprocessing --native-id=OPERA_L2_CSLC-S1_T131-279969-IW1_20240224T163119Z_20240605T012546Z_S1A_VV_v1.1

philipjyoon commented 1 month ago

This bug was fixed as part of addressing feature #1001

nasa / opera-sds-pcm