nasa / opera-sds-pcm

Observational Products for End-Users from Remote Sensing Analysis (OPERA)
Apache License 2.0

[Bug]: Nondeterminism in RTC Download via Data Subscriber #877

Open niarenaw opened 2 weeks ago

niarenaw commented 2 weeks ago

Checked for duplicates

Yes - I've already checked

Describe the bug

In addressing https://github.com/nasa/opera-sds/issues/48, some unexpected behavior was observed when using the data subscriber script. Several thousand RTC products were generated and then delivered to ASF UAT. The goal was to generate DSWx-S1 products for the tile/date pairs listed in the ticket above. The following commands were run (in the listed order):

FIRST COMMAND: Trigger all RTC downloads given delivery time range

python3 ~/mozart/ops/opera-pcm/data_subscriber/daac_data_subscriber.py query \
        -c OPERA_L2_RTC-S1_V1 \
        --job-queue=opera-job_worker-rtc_data_download \
        --chunk-size 1 \
        --release-version=$RELEASE \
        --endpoint=UAT \
        --coverage-target=1 \
        --start-date=$START \
        --end-date=$END \
        --processing-mode=historical

SECOND COMMAND: Trigger all RTC downloads given delivery time range + tile

python3 ~/mozart/ops/opera-pcm/data_subscriber/daac_data_subscriber.py query \
        -c OPERA_L2_RTC-S1_V1 \
        --job-queue=opera-job_worker-rtc_data_download \
        --chunk-size 1 \
        --release-version=$RELEASE \
        --endpoint=UAT \
        --coverage-target=1 \
        --start-date=$START \
        --end-date=$END \
        --include-regions=$TILE \
        --processing-mode=historical

THIRD COMMAND: Trigger all RTC downloads given temporal time range + tile

python3 ~/mozart/ops/opera-pcm/data_subscriber/daac_data_subscriber.py query \
        -c OPERA_L2_RTC-S1_V1 \
        --job-queue=opera-job_worker-rtc_data_download \
        --chunk-size 1 \
        --release-version=$RELEASE \
        --endpoint=UAT \
        --coverage-target=1 \
        --start-date=$START \
        --end-date=$END \
        --include-regions=$TILE \
        --use-temporal \
        --processing-mode=historical

Despite running the above for many tile/date combos, certain products that we expected to make were never generated. One such pair is (17RNH, 2022-11-10). I would also expect each command to cause all submitted jobs to dedupe, since each triggers a subset of the previous one, but new jobs were always kicked off. After all jobs were run, the only way to trigger certain missing products was to run the data subscriber in a loop with --native-id=RTC for each RTC that was derived from the SLC that covered the MGRS tile on the missing date.
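
For anyone trying to reproduce the workaround, the per-RTC loop might look roughly like the sketch below. It only builds and prints the commands; nothing is executed. The helper name `build_native_id_commands` and the placeholder IDs are hypothetical, and combining `--native-id` with the other flags from the commands above is an assumption rather than something confirmed in this ticket.

```python
import os
import shlex

# Script path as used in the commands above, with ~ expanded for the current user.
SUBSCRIBER = os.path.expanduser(
    "~/mozart/ops/opera-pcm/data_subscriber/daac_data_subscriber.py"
)


def build_native_id_commands(native_ids, release, endpoint="UAT"):
    """Return one argv list per RTC native ID (hypothetical helper, no execution)."""
    commands = []
    for native_id in native_ids:
        commands.append([
            "python3", SUBSCRIBER, "query",
            "-c", "OPERA_L2_RTC-S1_V1",
            "--job-queue=opera-job_worker-rtc_data_download",
            "--chunk-size", "1",
            f"--release-version={release}",
            f"--endpoint={endpoint}",
            "--coverage-target=1",
            # Assumption: --native-id can be combined with the flags above.
            f"--native-id={native_id}",
            "--processing-mode=historical",
        ])
    return commands


# Placeholder IDs: substitute the RTCs derived from the SLC covering the tile.
rtc_native_ids = ["<RTC_NATIVE_ID_1>", "<RTC_NATIVE_ID_2>"]

for argv in build_native_id_commands(rtc_native_ids, release="3.0.0-er.3.0"):
    # Print a shell-quoted command line for review before running it by hand.
    print(shlex.join(argv))
```

Printing the commands first makes it easy to review them before submitting any jobs; each argv list could then be passed to `subprocess.run` once it looks right.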

What did you expect?

  1. Would expect the first command to trigger all RTC downloads with any coverage, since --coverage-target=1 is set
  2. Would expect all jobs from the second command to dedupe (or at least trigger missing products)
  3. Would expect all jobs from the third command to dedupe (or at least trigger missing products)

Reproducible steps

1. Generate and deliver a decent-sized dataset of RTC products
2. Run the commands above
3. Confirm certain products were not delivered
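
One way to carry out step 3 is to compare the expected tile/date pairs against the pairs that were actually produced. The sketch below is illustrative only: the function name `missing_pairs` is hypothetical, and apart from (17RNH, 2022-11-10) the sample pairs are made up. How the delivered list is obtained is left open.

```python
def missing_pairs(expected, delivered):
    """Return the (tile, date) pairs that were expected but never delivered."""
    return sorted(set(expected) - set(delivered))


# Expected pairs would come from the ticket; 17RNG is a made-up example tile.
expected = [("17RNH", "2022-11-10"), ("17RNG", "2022-11-10")]

# Delivered pairs would come from whatever product listing is available.
delivered = [("17RNG", "2022-11-10")]

print(missing_pairs(expected, delivered))
# prints [('17RNH', '2022-11-10')]
```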

Environment

- Processing was done on PST using release 3.0.0-er.3.0

hhlee445 commented 2 weeks ago

@niarenaw can you also provide specific time range and geojson file?

niarenaw commented 2 weeks ago