nasa / opera-sds

Apache License 2.0

[Processing Request]: High Priority CalVal sites Request #55

Open gracebato opened 1 month ago

gracebato commented 1 month ago

Venue

PST

Product

DISP-S1

SAS Version

No response

SDS Version

No response

Input Data

High Priority Frames:

Share Results

Additional Notes

F11116 and F08882 were already processed in: https://github.com/nasa/opera-sds/issues/53

gracebato commented 1 month ago

Parameters should be similar to https://github.com/nasa/opera-sds/issues/53, e.g.

Date range: 20160701 - 20240905
k=15
m=5
philipjyoon commented 1 month ago

@gracebato just to confirm: You'd like these products delivered to ASF UAT, correct? #53 did not request that.

Also, does it make a difference if we process these on our INT venue instead of PST? The difference is that if we process on PST, we will keep the products in PST S3 forever, whereas INT S3 is wiped clean on every deployment. So the question is: do these products need to be archived in PST S3 forever, or is delivering to ASF UAT sufficient?

EDIT: After speaking with @LucaCinquini we decided that we will process on the PST venue and deliver to ASF UAT.

philipjyoon commented 1 month ago

We want to make sure that all frames process at least 4 years' worth of data, so I'll process 2016-2021, covering 5 years. If any frame still doesn't have 4 years' worth of data, which is possible, I can extend the time range of those specific frames until that point.

This is a bit of manual work, but not too bad. I can predetermine which frames would not have 4 years' worth of data in the first 5 calendar years using the historical database.
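
For the pre-check, something along these lines could work. This is only a rough sketch: it assumes each frame's historical sensing datetimes can already be pulled into a plain dict, the database access itself is not shown, and the names here are hypothetical.

from datetime import datetime, timedelta

WINDOW_START = datetime(2016, 7, 1)
WINDOW_END = datetime(2021, 2, 1)
REQUIRED_SPAN = timedelta(days=4 * 365)

def frames_needing_extension(sensing_dates_by_frame):
    """Return frame IDs whose sensing dates inside the 2016-2021 window span < 4 years."""
    short_frames = []
    for frame_id, dates in sensing_dates_by_frame.items():
        in_window = sorted(d for d in dates if WINDOW_START <= d <= WINDOW_END)
        if not in_window or (in_window[-1] - in_window[0]) < REQUIRED_SPAN:
            short_frames.append(frame_id)
    return short_frames

# Example with made-up sensing dates:
# frames_needing_extension({33039: [datetime(2017, 5, 1), datetime(2020, 9, 1)]})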

gracebato commented 1 month ago

Hi @philipjyoon, all DISP-S1 products produced going forward go to UAT. So request https://github.com/nasa/opera-sds/issues/53 would also go to UAT. Thanks.

philipjyoon commented 1 month ago

This request will be executed in 2 variations with one dependency. @gracebato, please correct me if this understanding is incorrect:

  1. Variation 1: Using OPERA PCM version 3.1.0-rc.6.0 (the latest version as of today), process these frames for at least 4 years. The product version is still v0.6.
  2. Dependency: After OPERA PCM version 3.1.0-rc.7.0 is released next week, first run the 3 frames in request #53 using product version v0.7 for 2016-2024.
  3. Variation 2: Once the above dependency is satisfied, run these frames using the same software and the same product version for the entire historical period 2016-2024.
philipjyoon commented 1 month ago

Will use the following batch_proc for Variation 1:

{
 "enabled": true,
 "label": "PST_Request_55",
 "processing_mode": "historical",
 "include_regions": "",
 "exclude_regions": "",
 "temporal": true,
 "data_start_date": "2016-07-01T00:00:00",
 "data_end_date": "2021-02-01T00:00:00",
 "k": 15,
 "m": 6,
 "frames": [8622, 9156, 12640, 18903, 28486, 33039, 33065, 36542, 42779],
 "wait_between_acq_cycles_mins": 10,
 "job_type": "cslc_query_hist",
 "provider_name": "ASF",
 "job_queue": "opera-job_worker-cslc_data_query_hist",
 "download_job_queue": "opera-job_worker-cslc_data_download_hist",
 "chunk_size": 1
 }
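
As a quick sanity check before enabling a batch_proc like the one above, a small hypothetical helper (not part of OPERA PCM, names made up) could confirm the fields look sane:

import json
from datetime import datetime

REQUIRED_KEYS = {
    "enabled", "label", "processing_mode", "temporal",
    "data_start_date", "data_end_date", "k", "m", "frames",
    "job_type", "provider_name", "job_queue", "download_job_queue",
}

def check_batch_proc(path):
    """Load a batch_proc JSON and make sure the basic fields look sane."""
    with open(path) as f:
        cfg = json.load(f)
    missing = REQUIRED_KEYS - cfg.keys()
    assert not missing, f"missing keys: {missing}"
    start = datetime.fromisoformat(cfg["data_start_date"])
    end = datetime.fromisoformat(cfg["data_end_date"])
    assert start < end, "data_start_date must precede data_end_date"
    assert cfg["k"] > 0 and cfg["m"] > 0, "k and m must be positive"
    assert cfg["frames"], "frames list must not be empty"
    return cfg

# check_batch_proc("pst_request_55_batch_proc.json")
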
philipjyoon commented 1 month ago

~65% complete as of now.

f28486 has finished. f33039 is taking by far the longest... it's currently only 35% complete, processing data from around 2018 right now.

philipjyoon commented 1 month ago

80% complete. The rate is about 1% per hour.

philipjyoon commented 1 month ago

86% complete. There was some sort of JPL-wide network issue between last night and this morning. It seems to have just been resolved, and we've resumed processing.

frame_completion_percentages    ['33039: 62%', '9156: 82%', '8622: 94%', '28486: 100%', '36542: 90%', '18903: 88%', '33065: 89%', '12640: 99%', '42779: 87%']
last_processed_datetimes        {'33039': '2019-08-07T04:30:10', '9156': '2020-08-01T02:07:32', '8622': '2020-11-04T22:51:11', '28486': '2021-01-21T00:36:31', '36542': '2020-10-07T01:59:19', '18903': '2020-09-26T13:51:31', '33065': '2020-10-12T04:39:58', '12640': '2021-01-04T23:28:50', '42779': '2020-09-26T16:13:06'}
progress_percentage             86%
philipjyoon commented 1 month ago

Logging into the SCIFLO verdi machines and manually killing the CloudWatch agent service, which uses up one whole CPU core. This frees up the core for the actual DISP-S1 processing. The next OPERA PCM release has a fix for this CloudWatch agent inefficiency.

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a stop
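
To avoid doing this host by host, a quick loop along these lines could push the same stop command to every SCIFLO verdi machine. This is only a sketch: the host list is hypothetical and passwordless SSH access is assumed.

import subprocess

# Hypothetical host list; replace with the actual SCIFLO verdi machines.
VERDI_HOSTS = ["sciflo-verdi-01.example.jpl.nasa.gov", "sciflo-verdi-02.example.jpl.nasa.gov"]
STOP_CMD = "sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a stop"

for host in VERDI_HOSTS:
    print(f"Stopping CloudWatch agent on {host}")
    subprocess.run(["ssh", host, STOP_CMD], check=False)  # keep going even if one host fails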

philipjyoon commented 3 weeks ago

Processing is complete. Here is the listing of all products:

request_55_product_paths.txt

philipjyoon commented 3 weeks ago

Due to an operator error while working around a database file issue (which is now fixed going forward), we need to reprocess the last 4 runs of frame 8622, starting with the query jobs that generated the following batch ids: f8622_a1032 f8622_a1020 f8622_a996 f8622_a984 f8622_a972 f8622_a960 f8622_a948 f8622_a936 f8622_a924 f8622_a912 f8622_a900 f8622_a888 f8622_a864 f8622_a852 f8622_a840

To do this, we will need to perform the following actions:

  1. Delete all Compressed CSLC records from the GRQ ES index grq_1_l2_cslc_s1_compressed that were generated by the SCIFLO runs we are going to re-run.
  2. Create a DISP-S1 historical-processing batch_proc with frame 8622. We also need to add the field "frame_states": {"8622": 60}; data_start_date can stay the original 2016 start date. This way the processing will cover sensing dates 61 to 75 in the frame 8622 series.
  3. Purge all query, download, SCIFLO, CNM-S, and CNM-R jobs that correspond to those runs so that they don't get deduped.
  4. Ask ASF to delete the old files.
  5. Start the batch_proc and monitor... make sure the first product reference date is correct.
philipjyoon commented 3 weeks ago
{
 "enabled": true,
 "label": "PST_Request_55_partial_8622",
 "processing_mode": "historical",
 "include_regions": "",
 "exclude_regions": "",
 "temporal": true,
 "data_start_date": "2016-07-01T00:00:00",
 "data_end_date": "2021-02-01T00:00:00",
 "k": 15,
 "m": 6,
 "frames": [8622],
 "frame_states": {"8622": 60},
 "wait_between_acq_cycles_mins": 5,
 "job_type": "cslc_query_hist",
 "provider_name": "ASF",
 "job_queue": "opera-job_worker-cslc_data_query_hist",
 "download_job_queue": "opera-job_worker-cslc_data_download_hist",
 "chunk_size": 1
 }
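
As a rough sanity check on the frame_states bookkeeping in step 2 above, assuming the value records how many sensing dates of the frame have already been processed and each run consumes k sensing dates:

frame_state = 60   # from "frame_states": {"8622": 60}
k = 15             # from the batch_proc above

first_redone = frame_state + 1
last_redone = frame_state + k
print(f"Frame 8622 reprocessing covers sensing dates {first_redone} to {last_redone}")  # 61 to 75
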
philipjyoon commented 3 weeks ago

Reprocessing of the last 4 runs of frame 8622 has started.

philipjyoon commented 3 weeks ago

Had to stop and restart because I hadn't deleted the compressed CSLCs from those 4 incorrect runs.

We can use Tosca to delete unwanted Compressed CSLC records. In this case, we want to delete all C-CSLC products that have the reference date 20181103T000000Z.

  1. Create a filter in Tosca (screenshot attached).
  2. Use that filter to run the on-demand "Purge datasets" job (screenshot attached).
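
For reference, the same purge could in principle be done programmatically against the GRQ ES index named earlier in this issue instead of through the Tosca UI. This is only a hedged sketch using elasticsearch-py 7.x-style calls; the endpoint and the metadata field holding the reference date are assumptions, and it counts matches before any delete.

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # assumed GRQ ES endpoint
index = "grq_1_l2_cslc_s1_compressed"         # index named earlier in this issue

# Assumed field name for the Compressed CSLC reference date.
query = {"query": {"term": {"metadata.reference_date.keyword": "20181103T000000Z"}}}

print("would delete:", es.count(index=index, body=query)["count"])
# Uncomment once the count looks right:
# print("deleted:", es.delete_by_query(index=index, body=query)["deleted"])
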
philipjyoon commented 3 weeks ago

Reprocessing of the last 4 runs of frame 8622 was successful. Below is the corrected listing of all products from this run.

request_55_product_paths_fixed_8622.txt

arothjpl commented 2 days ago

Processing started on 10-28. Still processing as of 10-30.