Closed nikki-t closed 1 week ago
Since it looks like it might be too difficult to enable forward stream in our UAT environment, it will be helpful to test the track ingest operations on a "batch" of granules. Track ingest is meant to work on batches of granules, which it detects by querying CMR for a range of revision_dates on an hourly, daily, or weekly basis. So what number of granules would stress test track ingest without causing timeouts (as it is a Lambda function)? What is a good test of these types of operations?
I don’t know that it is necessary to test all collections, but it might be good to test Rivers and Lakes. So maybe we can do a larger test on prior lakes and a smaller test on reaches?
How often do granules come in? If we wanted to run weekly, how many could a CMR query potentially retrieve?
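One way to answer this is to count granules per weekly revision_date window. The sketch below is illustrative only: the helper names are hypothetical, the endpoint is assumed to be the CMR UAT granule search, and it relies on the `revision_date[]` range parameter and the `CMR-Hits` response header. Only `count_granules` touches the network.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: CMR UAT granule search.
CMR_GRANULES_URL = "https://cmr.uat.earthdata.nasa.gov/search/granules.json"


def weekly_windows(start, end):
    """Split [start, end) into consecutive windows of at most 7 days."""
    windows = []
    cursor = start
    while cursor < end:
        stop = min(cursor + timedelta(days=7), end)
        windows.append((cursor, stop))
        cursor = stop
    return windows


def build_count_url(short_name, window):
    """Build a CMR query that returns no items, only the hit count."""
    begin, stop = window
    params = {
        "short_name": short_name,
        "revision_date[]": f"{begin:%Y-%m-%dT%H:%M:%SZ},{stop:%Y-%m-%dT%H:%M:%SZ}",
        "page_size": 0,
    }
    return f"{CMR_GRANULES_URL}?{urlencode(params)}"


def count_granules(short_name, window):
    """Network call: read the total number of matches from the CMR-Hits header."""
    with urlopen(build_count_url(short_name, window)) as response:
        return int(response.headers["CMR-Hits"])


if __name__ == "__main__":
    for window in weekly_windows(datetime(2024, 9, 1), datetime(2024, 10, 1)):
        print(window[0].date(), window[1].date(), build_count_url("SWOT_L2_HR_LakeSP_prior_2.0", window))
```

Swapping the `print` for `count_granules(...)` would give the per-week totals for a collection.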
Number of reaches per week in Sep. 2024, taken by querying revision_date:
Number of prior lakes per week in Sep. 2024, taken by querying revision_date:
Currently loaded into UAT:
So maybe we can test what would happen if we ran on 2 weeks of lake data, where we first submit 1 week and then submit an additional week after the first has completed? This way we can use what is already loaded, and then we can regroup and decide if we want to try running on a week's worth of reaches.
The only thing that is not tested is the continual running and querying by revision date.
Additionally, we may want to load in a few with overlapping CRIDs to test that functionality.
@torimcd and @ymchenjpl - what do you think?
Execute Track Ingest Lambda function manually on prior lake data via the console with the following event JSON:
{
"collection_shortname": "SWOT_L2_HR_LakeSP_prior_2.0",
"hydrocron_table": "hydrocron-swot-prior-lake-table",
"hydrocron_track_table": "hydrocron-swot-prior-lake-track-ingest-table",
"temporal": "",
"query_start": "2024-08-17T00:00:00",
"query_end": "2024-08-25T23:59:59",
"reprocessed_crid": "PGC0"
}
a. Confirm the Track Ingest Lambda retrieves 444 granules and ingests granules that were not successfully ingested into Hydrocron.
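As an alternative to the console, the same event could be sent from a script. This is a minimal sketch, not the project's tooling: `build_track_ingest_event` and `invoke_track_ingest` are hypothetical helpers, the function name passed to `invoke_track_ingest` is something the caller must supply, and boto3 is imported lazily so the event builder works without it.

```python
import json
from datetime import datetime


def build_track_ingest_event(collection_shortname, hydrocron_table,
                             hydrocron_track_table, query_start, query_end,
                             reprocessed_crid="PGC0", temporal=""):
    """Assemble the event JSON shown above, checking the query window first."""
    start = datetime.fromisoformat(query_start)
    end = datetime.fromisoformat(query_end)
    if start >= end:
        raise ValueError("query_start must be earlier than query_end")
    return {
        "collection_shortname": collection_shortname,
        "hydrocron_table": hydrocron_table,
        "hydrocron_track_table": hydrocron_track_table,
        "temporal": temporal,
        "query_start": query_start,
        "query_end": query_end,
        "reprocessed_crid": reprocessed_crid,
    }


def invoke_track_ingest(event, function_name):
    """Synchronously invoke the Lambda; function_name is deployment specific."""
    import boto3  # third-party dependency, only needed for the real call

    client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps(event).encode("utf-8"),
    )
    return json.loads(response["Payload"].read() or b"null")


if __name__ == "__main__":
    event = build_track_ingest_event(
        "SWOT_L2_HR_LakeSP_prior_2.0",
        "hydrocron-swot-prior-lake-table",
        "hydrocron-swot-prior-lake-track-ingest-table",
        "2024-08-17T00:00:00",
        "2024-08-25T23:59:59",
    )
    print(json.dumps(event, indent=2))
```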
Execute Track Ingest Lambda function manually on reach data via the console with the following event JSON:
{
"collection_shortname": "SWOT_L2_HR_RiverSP_reach_2.0",
"hydrocron_table": "hydrocron-swot-reach-table",
"hydrocron_track_table": "hydrocron-swot-reach-track-ingest-table",
"temporal": "",
"query_start": "2023-07-27T00:00:00",
"query_end": "2024-10-30T23:59:59",
"reprocessed_crid": "PGC0"
}
a. Confirm the Track Ingest Lambda retrieves 415 granules and ingests granules that were not successfully ingested into Hydrocron.
b. Confirm that the following PGC0 granules were prioritized by the track ingest operations and, if needed, that the granules are ingested into Hydrocron:
SWOT_L2_HR_RiverSP_Reach_009_584_SI_20240124T233242_20240124T233247_PGC0_01.zip
SWOT_L2_HR_RiverSP_Reach_009_584_SI_20240124T233242_20240124T233247_PIC0_01.zip
SWOT_L2_HR_RiverSP_Reach_009_566_AS_20240124T081454_20240124T081505_PGC0_01.zip
SWOT_L2_HR_RiverSP_Reach_009_566_AS_20240124T081454_20240124T081505_PIC0_01.zip
SWOT_L2_HR_RiverSP_Reach_009_564_SI_20240124T062419_20240124T062430_PGC0_01.zip
SWOT_L2_HR_RiverSP_Reach_009_564_SI_20240124T062419_20240124T062430_PIC0_01.zip
SWOT_L2_HR_RiverSP_Reach_009_536_SI_20240123T062338_20240123T062349_PGC0_01.zip
SWOT_L2_HR_RiverSP_Reach_009_536_SI_20240123T062338_20240123T062349_PIC0_01.zip
SWOT_L2_HR_RiverSP_Reach_009_500_SI_20240121T233106_20240121T233110_PGC0_01.zip
SWOT_L2_HR_RiverSP_Reach_009_500_SI_20240121T233106_20240121T233110_PIC0_01.zip
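To make the expected outcome of check (b) concrete, here is a rough sketch of CRID prioritization over granule file names. It is not the track ingest implementation: the regular expression and `prioritize_by_crid` are assumptions based only on the file names listed above, and the rule "prefer the reprocessed CRID when both exist" is my reading of the test step.

```python
import re

# Assumed file name layout, inferred from the granule names listed above.
GRANULE_PATTERN = re.compile(
    r"^(?P<product>SWOT_L2_HR_\w+?)_(?P<cycle>\d{3})_(?P<pass_id>\d{3})"
    r"_(?P<continent>[A-Z]{2})_(?P<start>\d{8}T\d{6})_(?P<end>\d{8}T\d{6})"
    r"_(?P<crid>[A-Z0-9]{4})_(?P<counter>\d{2})\.zip$"
)
KEY_FIELDS = ("product", "cycle", "pass_id", "continent", "start", "end")


def prioritize_by_crid(filenames, reprocessed_crid="PGC0"):
    """Keep one granule per cycle/pass/continent/time, preferring reprocessed_crid."""
    chosen = {}
    for name in filenames:
        match = GRANULE_PATTERN.match(name)
        if match is None:
            continue  # skip names that do not follow the assumed layout
        parts = match.groupdict()
        key = tuple(parts[field] for field in KEY_FIELDS)
        if key not in chosen or parts["crid"] == reprocessed_crid:
            chosen[key] = name
    return list(chosen.values())


if __name__ == "__main__":
    names = [
        "SWOT_L2_HR_RiverSP_Reach_009_584_SI_20240124T233242_20240124T233247_PIC0_01.zip",
        "SWOT_L2_HR_RiverSP_Reach_009_584_SI_20240124T233242_20240124T233247_PGC0_01.zip",
        "SWOT_L2_HR_RiverSP_Reach_009_566_AS_20240124T081454_20240124T081505_PIC0_01.zip",
    ]
    for name in prioritize_by_crid(names):
        print(name)
```

With both CRIDs present for pass 584 the PGC0 granule is kept, while pass 566 falls back to the only granule available.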
There are cases where two granules have the exact same cycle, pass, continent, timestamp, and CRID but a different product counter. This causes the load granule lambda to overwrite the granule data with whichever was passed to it last.
Example Granules:
Proposed solution - Modify track ingest operations so that it searches Hydrocron SWOT tables for product counters.
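A rough sketch of what a product counter comparison could look like is below. It works on file names only and does not query the Hydrocron tables; `select_highest_counter` is a hypothetical helper, the sample names in the demo are made up for illustration, and treating the highest counter as the one to keep is an assumption.

```python
from pathlib import Path


def split_counter(filename):
    """Split a granule name into (everything before the counter, counter as int)."""
    base, counter = Path(filename).stem.rsplit("_", 1)
    return base, int(counter)


def select_highest_counter(filenames):
    """For names that differ only by product counter, keep the highest counter."""
    best = {}
    for name in filenames:
        base, counter = split_counter(name)
        if base not in best or counter > best[base][0]:
            best[base] = (counter, name)
    return {base: name for base, (_, name) in best.items()}


if __name__ == "__main__":
    # Hypothetical file names: same cycle, pass, continent, timestamp, and CRID.
    names = [
        "SWOT_L2_HR_RiverSP_Reach_001_002_NA_20240101T000000_20240101T000010_PGC0_02.zip",
        "SWOT_L2_HR_RiverSP_Reach_001_002_NA_20240101T000000_20240101T000010_PGC0_01.zip",
    ]
    print(select_highest_counter(names))
```

Because the result no longer depends on arrival order, the last granule passed to the load granule lambda would not silently win.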
There were cases where the load granule lambda could not load all of the features present in the shapefile and would time out.
Proposed solution - Modify load granule lambda to increase timeout to 15 minutes and memory to 4096 MB.
Track ingest gathers granules that have been inserted by the load granule Lambda and attempts to reconcile anything with a to_ingest status. This may cause a timeout for larger database queries when we first run the track ingest operations in OPS, as it will attempt to reconcile everything in the track ingest database.
Proposed solution - Enable batching so that track ingest retrieves 500 items at a time from the track ingest table, and with each run it will work through the to_ingest statuses. Increase track ingest lambda timeout to 15 minutes.
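The batching idea can be illustrated with an in-memory stand-in for the track ingest table. This is only a sketch of the control flow: `run_track_ingest_once` is a hypothetical function, the list of dicts replaces the real table, and the "ingested" status is a placeholder for whatever reconciliation actually does.

```python
from itertools import islice


def run_track_ingest_once(track_table, batch_size=500):
    """Reconcile at most batch_size items that still have a to_ingest status."""
    pending = (item for item in track_table if item["status"] == "to_ingest")
    batch = list(islice(pending, batch_size))
    for item in batch:
        item["status"] = "ingested"  # placeholder for the real reconciliation
    return len(batch)


if __name__ == "__main__":
    table = [{"granuleUR": f"granule-{i}", "status": "to_ingest"} for i in range(1200)]
    run = 0
    while True:
        handled = run_track_ingest_once(table)
        if handled == 0:
            break
        run += 1
        print(f"run {run}: reconciled {handled} items")
```

Each run does a bounded amount of work, so a large backlog on the first OPS run is spread across several invocations instead of risking a single timeout.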
There are cases where the track ingest record does not have a checksum, which occurs when data is loaded manually.
Proposed solution - Leave as is and plan to populate checksum if needed.
Closing as UAT testing is complete for release 1.5.0.
UAT test (1.5.0) Test operations to mirror expected operations in OPS