
[Processing Request]: Winter blackout dates test #58

Open gracebato opened 1 month ago

gracebato commented 1 month ago

Venue

PST

Product

DISP-S1

SAS Version

No response

SDS Version

PCM version 3.1.0-rc.6.0

Input Data

Process the following frames with the blackout-winter-dates database using PCM version 3.1.0-rc.6.0:

Use a similar config to https://github.com/nasa/opera-sds/issues/55#issuecomment-2400451423, i.e.:

 "data_start_date": "2016-07-01T00:00:00",
 "data_end_date": "2021-02-01T00:00:00",
 "k": 15,
 "m": 6,

Share Results

Additional Notes

No response

scottstanie commented 1 month ago

Here is the regenerated database, where I've removed the time periods for each frame that Grace identified as snowy. opera-disp-s1-consistent-burst-ids-2024-10-11-2016-07-01_to_2024-09-04-unnested.json
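For reference, the removal can be reproduced with a small script. A minimal sketch, assuming the unnested layout maps each frame ID directly to a flat list of sensing-time strings; the frame and window below are illustrative only, not Grace's actual blackout list:

import json
from datetime import datetime

# Illustrative only: frame ID -> (start, end) of the snowy window to drop.
BLACKOUT = {"835": (datetime(2021, 2, 1), datetime(2021, 3, 1))}

path = "opera-disp-s1-consistent-burst-ids-2024-10-11-2016-07-01_to_2024-09-04-unnested.json"
with open(path) as f:
    db = json.load(f)

for frame, (start, end) in BLACKOUT.items():
    # Keep only sensing times outside the blackout window.
    db[frame] = [t for t in db[frame]
                 if not (start <= datetime.fromisoformat(t) < end)]

with open("blackout-filtered.json", "w") as f:
    json.dump(db, f, indent=2)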

Spot-checking frame 835:

$ jq '."835"' < opera-disp-s1-consistent-burst-ids-2024-10-11-2016-07-01_to_2024-09-04-unnested.json | grep 2021
    "2021-01-01T23:07:19",
    "2021-01-13T23:07:19",
    "2021-01-25T23:07:18",
    "2021-03-02T23:07:17",
    "2021-03-14T23:07:18",
    "2021-03-26T23:07:18",

Looks like it's correctly skipping over Feb. 2021.
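A quick way to extend the spot check beyond grepping one year: compute the widest gap between consecutive sensing times for a frame and confirm it spans the blackout window. A sketch under the same unnested-layout assumption as above:

import json
from datetime import datetime

path = "opera-disp-s1-consistent-burst-ids-2024-10-11-2016-07-01_to_2024-09-04-unnested.json"
with open(path) as f:
    db = json.load(f)

times = sorted(datetime.fromisoformat(t) for t in db["835"])
gap, start, end = max((b - a, a, b) for a, b in zip(times, times[1:]))
print(f"widest gap: {gap.days} days ({start:%Y-%m-%d} -> {end:%Y-%m-%d})")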

scottstanie commented 1 month ago

@philipjyoon Here is the version with {"metadata": {...}, "data": {...the 'unnested' JSON version}}: opera-disp-s1-consistent-burst-ids-2024-10-11-2016-07-01_to_2024-09-04.json
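With both variants in play, a consumer may want a loader that accepts either. A small sketch (the detection heuristic is an assumption, not PCM's actual code):

import json

def load_frame_map(path: str) -> dict:
    """Return the frame -> sensing-times map from either file variant."""
    with open(path) as f:
        obj = json.load(f)
    # The nested variant wraps the unnested map under a "data" key.
    return obj["data"] if "metadata" in obj and "data" in obj else obj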

philipjyoon commented 1 month ago

Will be using this batch_proc:

{
 "enabled": true,
 "label": "PST_Request_58",
 "processing_mode": "historical",
 "include_regions": "",
 "exclude_regions": "",
 "temporal": true,
 "data_start_date": "2016-07-01T00:00:00",
 "data_end_date": "2021-02-01T00:00:00",
 "k": 15,
 "m": 6,
 "frames": [8622, 33065, 36542, 42779],
 "wait_between_acq_cycles_mins": 5,
 "job_type": "cslc_query_hist",
 "provider_name": "ASF",
 "job_queue": "opera-job_worker-cslc_data_query_hist",
 "download_job_queue": "opera-job_worker-cslc_data_download_hist",
 "chunk_size": 1
 }

philipjyoon commented 1 month ago

Processing has started.

philipjyoon commented 1 month ago

Thanks @scottstanie, both file formats are good on our side too.

philipjyoon commented 1 month ago

Unfortunately, the unnested database file contained sensing dates for HH polarization for frame 8622, which caused the query job to fail. It looks like this:

[screenshot of the failed query job]

We can find the offending sensing datetime by running the following:

(mozart) hysdsops@opera-pst-mozart-fwd:~/pst_requests/request_58$ python ~/mozart/ops/opera-pcm/tools/disp_s1_burst_db_tool.py validate 8622
...
Acquisition cycle 972 of sensing time 2019-03-15 22:50:55 is good 
Acquisition cycle 984 of sensing time 2019-03-27 22:50:55 is good 
Acquisition cycle 996 of sensing time 2019-04-08 22:50:55 is good 
Acquisition cycle 1008 is missing 27 bursts:  {'T033-068977-IW3', 'T033-068974-IW1', 'T033-068974-IW2', 'T033-068971-IW3', 'T033-068976-IW2', 'T033-068977-IW1', 'T033-068976-IW1', 'T033-068977-IW2', 'T033-068969-IW2', 'T033-068975-IW1', 'T033-068975-IW2', 'T033-068975-IW3', 'T033-068969-IW3', 'T033-068973-IW3', 'T033-068970-IW3', 'T033-068974-IW3', 'T033-068972-IW2', 'T033-068972-IW3', 'T033-068973-IW1', 'T033-068971-IW1', 'T033-068970-IW1', 'T033-068971-IW2', 'T033-068969-IW1', 'T033-068973-IW2', 'T033-068972-IW1', 'T033-068970-IW2', 'T033-068976-IW3'}
Granules for acquisition cycle 1008 found: []
Acquisition cycle 1020 of sensing time 2019-05-02 22:50:56 is good 
Acquisition cycle 1032 of sensing time 2019-05-14 22:50:57 is good 
Acquisition cycle 1044 of sensing time 2019-05-26 22:50:58 is good 
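Roughly, a validation like this presumably compares the frame's expected burst-ID set against what CMR actually returns per sensing time. A hedged sketch of that check; query_cmr_bursts is a hypothetical stand-in, not the tool's real internals:

def validate_frame(sensing_times, expected_bursts, query_cmr_bursts):
    """Flag sensing times whose CMR granules don't cover every expected burst."""
    for t in sensing_times:
        found = query_cmr_bursts(t)   # set of burst IDs present in CMR at time t
        missing = expected_bursts - found
        if missing:
            print(f"sensing time {t} is missing {len(missing)} bursts: {missing}")
        else:
            print(f"sensing time {t} is good")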

We can see that indeed for frame 8622, the granule for one of its burst IDs (T033-068977-IW3) consists of HH polarization. [screenshot]

So there is a mismatch between what the database file says and what's in the CMR. We should be ignoring all HH data, but in this case the database file does not. Therefore, we need to fix the database file by hand and upload it to S3 so the query jobs can continue. There must be a bug in the code that generated the database file; once we fix that, we shouldn't see this issue again.

To do this, we take out the offending sensing datetime, which is 2019-04-20, and then use this new file to overwrite the file that's in S3. [screenshot]
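The hand edit itself is small. A hypothetical sketch, again assuming the unnested frame -> sensing-times layout; the S3 step is noted in a comment since the exact key is not shown here:

import json

path = "opera-disp-s1-consistent-burst-ids-2024-10-11-2016-07-01_to_2024-09-04-unnested.json"
with open(path) as f:
    db = json.load(f)

# Drop every sensing time on the offending date for frame 8622.
db["8622"] = [t for t in db["8622"] if not t.startswith("2019-04-20")]

with open(path, "w") as f:
    json.dump(db, f, indent=2)
# Then overwrite the copy in S3 under the same key (e.g. with `aws s3 cp`).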

Ideally we would rename this file to something else, but we're still refining our process and are sensitive to processing time right now. Keeping the same name means we don't have to redeploy the settings.yaml file, which takes ~20 minutes and, more importantly, would force all existing jobs to restart and waste their progress.


Before and after this change to the historical database file (note the last sensing datetime):

Before:

python ~/mozart/ops/opera-pcm/tools/disp_s1_burst_db_tool.py frame 8622 --k=15
...
K-cycle 4 ['2018-11-03T22:50:58',...'2019-05-02T22:50:56']

After:

python ~/mozart/ops/opera-pcm/tools/disp_s1_burst_db_tool.py frame 8622 --k=15
...
K-cycle 4 ['2018-11-03T22:50:58', '...', '2019-05-14T22:50:57']
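The shift in K-cycle 4's last date makes sense if k-cycles are simply consecutive batches of k sensing times, which is what the output above suggests: deleting the bad 2019-04-20 entry pulls the next date, 2019-05-14, into cycle 4. A sketch of that grouping (an assumption about the tool's definition, consistent with the before/after output):

def k_cycles(sensing_times: list[str], k: int) -> list[list[str]]:
    """Group a frame's sorted sensing times into consecutive batches of k."""
    return [sensing_times[i:i + k] for i in range(0, len(sensing_times), k)]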

Finally, we have to submit a new daac_data_subscriber command, with a new end date, to replace the failed job. After this, the historical processor will do the right thing going forward.

python data_subscriber/daac_data_subscriber.py query --collection-shortname=OPERA_L2_CSLC-S1_V1 --endpoint=OPS --start-date=2018-11-02T04:00:32Z --end-date=2019-05-15T05:00:35Z --release-version=3.1.0-rc.6.0 --job-queue=opera-job_worker-cslc_data_download_hist --chunk-size=1 --k=15 --m=6 --use-temporal --max-revision=1000 --processing-mode=historical --frame-id=8622 --transfer-protocol=auto

philipjyoon commented 1 month ago

I forgot to perform one last step, which is to restart run_disp_s1_historical_processing.py. This application loads the database once at the beginning of execution and then just uses it, so restarting it forces it to re-load the newly modified file.
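In other words, the processor follows a load-once pattern, something like this hypothetical shape:

import json
import time

def process_pending_frames(db: dict) -> None:
    """Hypothetical stand-in for one pass of historical processing."""

def main(db_path: str) -> None:
    with open(db_path) as f:
        db = json.load(f)              # read exactly once, at startup
    while True:
        process_pending_frames(db)     # later edits to the file are never seen
        time.sleep(300)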

philipjyoon commented 1 month ago

@gracebato requested an extended run just for frame 42779. Because we've already partially run this frame, we need to create a new batch_proc with just that frame and, more importantly, carry over the frame_state so that we don't submit previously completed jobs again (see the sketch after the config).

{
 "enabled": true,
 "label": "PST_Request_58_42279",
 "processing_mode": "historical",
 "include_regions": "",
 "exclude_regions": "",
 "temporal": true,
 "data_start_date": "2016-07-01T00:00:00",
 "data_end_date": "2024-01-01T00:00:00",
 "k": 15,
 "m": 6,
 "frames": [42779],
 "frame_states": {"42779": 15},
 "wait_between_acq_cycles_mins": 5,
 "job_type": "cslc_query_hist",
 "provider_name": "ASF",
 "job_queue": "opera-job_worker-cslc_data_query_hist",
 "download_job_queue": "opera-job_worker-cslc_data_download_hist",
 "chunk_size": 1
 }
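How the carried-over frame_state might gate resubmission, as referenced above: a minimal sketch, assuming the stored value counts the units of work already completed for the frame (the exact semantics are an assumption, not confirmed PCM behavior).

def pending_cycles(frame_id: str, total_cycles: int,
                   frame_states: dict[str, int]) -> range:
    """Return the 1-based indices of work units still to be submitted."""
    done = frame_states.get(frame_id, 0)   # e.g. {"42779": 15} -> 15 done
    return range(done + 1, total_cycles + 1)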