podaac / hydrocron

API for retrieving time series of SWOT data
https://podaac.github.io/hydrocron/
Apache License 2.0

Test full track ingest functionality in UAT #239

Closed: nikki-t closed this issue 1 week ago

nikki-t commented 1 month ago

UAT test (1.5.0): Test operations to mirror the expected operations in OPS.

  1. Wipe out Hydrocron tables in services UAT environment.
  2. Follow release process for UAT environment: ReleaseProcess
  3. Enable track ingest EventBridge schedules.
    a. Modify schedule cron expressions:
      • reach: 0 23 ? *
      • node: 5 23 ? *
      • prior lake: 10 23 ? *
  4. Enable the sending of CNM messages from SWOT OPS to the services UAT account.
  5. Monitor track ingest operations via CloudWatch Logs.
  6. The Track Ingest Lambda executes hourly (see the sketch after this list) to:
    • Locate granules that are in CMR but not in Hydrocron.
    • Locate granules that have been marked as "to_ingest" in the track ingest tables and count their features to determine whether they have been fully ingested.
    • Send CNM messages to the CNM Lambda for any granules that have not been ingested or have only been partially ingested.
    • Update the track ingest tables with new granule statuses.
    • CloudWatch logs: /aws/lambda/svc-hydrocron-sit-track-ingest-lambda
  7. CNM Lambda function executes upon receipt of a new CNM message:
    • Invokes the Load Granule Lambda with the required input data: granule_path, table_name, track_table, checksum, revisionDate, load_benchmarking_data
    • CloudWatch logs: /aws/lambda/svc-hydrocron-uat-cnm-lambda
  8. The Granule Lambda function executes (triggered from Track Ingest or CNM Lambda) to:
    • Add granule to track ingest table with expected feature count and a status of "to_ingest".
    • Read in granule shapefile data and add it to the appropriate Hydrocron SWOT table.
    • CloudWatch logs: /aws/lambda/svc-hydrocron-uat-load_granule-lambda
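For reference, here is a minimal sketch of the hourly reconciliation loop described in step 6. The function names, record fields, and statuses are illustrative assumptions, not the actual hydrocron implementation.

    # Illustrative sketch of the hourly track ingest reconciliation described in
    # step 6. Function names and record fields are assumptions for illustration.
    from dataclasses import dataclass


    @dataclass
    class TrackRecord:
        granule_ur: str
        status: str              # e.g. "to_ingest" or "ingested"
        expected_features: int
        actual_features: int


    def reconcile(cmr_granule_urs, track_records, send_cnm, update_status):
        """Compare CMR against the track ingest table and re-send CNM as needed."""
        tracked = {record.granule_ur for record in track_records}

        # Granules in CMR but not yet tracked: send a CNM message so they get ingested.
        for granule_ur in cmr_granule_urs:
            if granule_ur not in tracked:
                send_cnm(granule_ur)

        # Tracked granules still marked "to_ingest": check their feature counts.
        for record in track_records:
            if record.status != "to_ingest":
                continue
            if record.actual_features >= record.expected_features:
                update_status(record.granule_ur, "ingested")
            else:
                send_cnm(record.granule_ur)  # missing or partially ingested, retry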
nikki-t commented 4 weeks ago

Since it looks like it might be too difficult to enable the forward stream in our UAT environment, it will be helpful to test the track ingest operations on a "batch" of granules. Track ingest is meant to work on batches of granules, which it detects by querying CMR for a range of revision_dates on an hourly, daily, or weekly basis. So what number of granules would stress test track ingest without causing timeouts (since it is a Lambda function)? What is a good test of these types of operations?

I don't know that it is necessary to test all collections, but it might be good to test Rivers and Lakes. So maybe we can do a larger test on prior lakes and a smaller test on reaches?

How often do granules come in? If we wanted to run weekly, how many could a CMR query potentially retrieve?
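One quick way to size this is to ask CMR directly: the public granule search accepts a revision_date range and reports the total match count in the CMR-Hits response header. A minimal sketch, where the short name and date window are just example values:

    # Rough sizing check against the public CMR granule search: count how many
    # granules had a revision_date in a given window. Short name and dates are
    # example values only.
    import requests

    CMR_GRANULE_SEARCH = "https://cmr.earthdata.nasa.gov/search/granules.json"

    params = {
        "short_name": "SWOT_L2_HR_LakeSP_prior_2.0",
        "revision_date[]": "2024-09-01T00:00:00Z,2024-09-08T00:00:00Z",
        "page_size": 1,  # only the hit count is needed, not the records
    }

    response = requests.get(CMR_GRANULE_SEARCH, params=params, timeout=30)
    response.raise_for_status()
    print("granules revised in window:", response.headers["CMR-Hits"])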

Number of reaches per week in Sep. 2024 taken by querying revision_date:

Number of prior lakes per week in Sep. 2024 taken by querying revision_date:

Currently loaded into UAT:

So maybe we can try to test what would happen if we ran on 2 weeks of lake data, where we first submit 1 week and then submit an additional week after the first has completed? This way we can use what is already loaded, and then we can regroup and decide if we want to try running on a week's worth of reaches.

The only thing that is not tested is the continual running and querying by revision date.

Additionally, we may want to load in a few granules with overlapping CRIDs to test that functionality.

@torimcd and @ymchenjpl - what do you think?

nikki-t commented 3 weeks ago

UAT test steps

  1. Execute the Track Ingest Lambda function manually on prior lake data via the console, using the following event JSON (a programmatic invocation sketch follows these steps)

    {
      "collection_shortname": "SWOT_L2_HR_LakeSP_prior_2.0",
      "hydrocron_table": "hydrocron-swot-prior-lake-table",
      "hydrocron_track_table": "hydrocron-swot-prior-lake-track-ingest-table",
      "temporal": "",
      "query_start": "2024-08-17T00:00:00",
      "query_end": "2024-08-25T23:59:59",
      "reprocessed_crid": "PGC0"
    }

    a. Confirm the Track Ingest Lambda retrieves 444 granules and ingests granules that were not successfully ingested into Hydrocron.

  2. Execute the Track Ingest Lambda function manually on reach data via the console, using the following event JSON

    {
      "collection_shortname": "SWOT_L2_HR_RiverSP_reach_2.0",
      "hydrocron_table": "hydrocron-swot-reach-table",
      "hydrocron_track_table": "hydrocron-swot-reach-track-ingest-table",
      "temporal": "",
      "query_start": "2023-07-27T00:00:00",
      "query_end": "2024-10-30T23:59:59",
      "reprocessed_crid": "PGC0"
    }

    a. Confirm the Track Ingest Lambda retrieves 415 granules and ingests granules that were not successfully ingested into Hydrocron.
    b. Confirm that the following PGC0 granules were prioritized by the track ingest operations and, if needed, that they are ingested into Hydrocron:

    SWOT_L2_HR_RiverSP_Reach_009_584_SI_20240124T233242_20240124T233247_PGC0_01.zip
    SWOT_L2_HR_RiverSP_Reach_009_584_SI_20240124T233242_20240124T233247_PIC0_01.zip
    SWOT_L2_HR_RiverSP_Reach_009_566_AS_20240124T081454_20240124T081505_PGC0_01.zip
    SWOT_L2_HR_RiverSP_Reach_009_566_AS_20240124T081454_20240124T081505_PIC0_01.zip
    SWOT_L2_HR_RiverSP_Reach_009_564_SI_20240124T062419_20240124T062430_PGC0_01.zip
    SWOT_L2_HR_RiverSP_Reach_009_564_SI_20240124T062419_20240124T062430_PIC0_01.zip
    SWOT_L2_HR_RiverSP_Reach_009_536_SI_20240123T062338_20240123T062349_PGC0_01.zip
    SWOT_L2_HR_RiverSP_Reach_009_536_SI_20240123T062338_20240123T062349_PIC0_01.zip
    SWOT_L2_HR_RiverSP_Reach_009_500_SI_20240121T233106_20240121T233110_PGC0_01.zip
    SWOT_L2_HR_RiverSP_Reach_009_500_SI_20240121T233106_20240121T233110_PIC0_01.zip
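For completeness, the same manual runs can also be triggered programmatically. A minimal boto3 sketch using the prior lake event JSON from step 1; the UAT function name is assumed and should be replaced with the actual one:

    # Sketch of invoking the Track Ingest Lambda with the prior lake event JSON
    # from step 1 instead of using the console. The function name is an
    # assumption; substitute the actual UAT function name.
    import json

    import boto3

    event = {
        "collection_shortname": "SWOT_L2_HR_LakeSP_prior_2.0",
        "hydrocron_table": "hydrocron-swot-prior-lake-table",
        "hydrocron_track_table": "hydrocron-swot-prior-lake-track-ingest-table",
        "temporal": "",
        "query_start": "2024-08-17T00:00:00",
        "query_end": "2024-08-25T23:59:59",
        "reprocessed_crid": "PGC0",
    }

    client = boto3.client("lambda")
    response = client.invoke(
        FunctionName="svc-hydrocron-uat-track-ingest-lambda",  # assumed name
        InvocationType="Event",  # asynchronous, like a scheduled run
        Payload=json.dumps(event),
    )
    print(response["StatusCode"])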
nikki-t commented 2 weeks ago

UAT test results

Result 1)

There are cases where there are two granules with the exact same cycle, pass, continent, timestamp, and CRID but a different product counter. This causes the load granule Lambda to overwrite the granule data with whichever granule was passed to it last.

Example Granules:

Proposed solution - Modify track ingest operations so that they search the Hydrocron SWOT tables for product counters.
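For illustration, the product counter is the trailing numeric field of the granule file name (the _01 in ..._PGC0_01.zip, based on the reach granule names listed in the test steps above). A sketch of comparing counters before overwriting, with the name layout assumed from those examples; this is not the hydrocron code:

    # Illustrative sketch of distinguishing granules that differ only by product
    # counter. The filename layout is assumed from the example granule names in
    # this issue (..._<CRID>_<product counter>.zip).
    import re

    GRANULE_PATTERN = re.compile(r"_(?P<crid>P[A-Z]C\d)_(?P<counter>\d+)\.zip$")


    def product_counter(granule_name: str) -> int:
        match = GRANULE_PATTERN.search(granule_name)
        if match is None:
            raise ValueError(f"unexpected granule name: {granule_name}")
        return int(match.group("counter"))


    def should_overwrite(incoming: str, already_loaded: str) -> bool:
        """Only overwrite when the incoming granule has a newer product counter."""
        return product_counter(incoming) > product_counter(already_loaded)


    # Example with two hypothetical names that differ only in the counter:
    print(should_overwrite(
        "SWOT_L2_HR_RiverSP_Reach_009_584_SI_20240124T233242_20240124T233247_PGC0_02.zip",
        "SWOT_L2_HR_RiverSP_Reach_009_584_SI_20240124T233242_20240124T233247_PGC0_01.zip",
    ))  # True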

Result 2)

There were cases where the load granule lambda could not load all of the features present in the shapefile and would time out.

Proposed solution - Modify the load granule Lambda to increase its timeout to 15 minutes and its memory to 4096 MB.
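If applied outside the project's infrastructure code, the proposed settings change is roughly the following one-off adjustment; the function name is inferred from the log group above and is otherwise an assumption:

    # Rough sketch of the proposed settings change (15 minute timeout, 4096 MB),
    # applied ad hoc with boto3. In practice the change would be made in the
    # project's infrastructure code instead.
    import boto3

    client = boto3.client("lambda")
    client.update_function_configuration(
        FunctionName="svc-hydrocron-uat-load_granule-lambda",  # assumed name
        Timeout=900,       # seconds (15 minutes, the Lambda maximum)
        MemorySize=4096,   # MB
    )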

Result 3)

Track ingest gathers granules that have been inserted by the load granule Lambda and attempts to reconcile anything with a "to_ingest" status. This may cause a timeout for larger database queries when we first run the track ingest operations in OPS, as it will attempt to reconcile everything in the track ingest database.

Proposed solution - Enable batching so that track ingest retrieves 500 items at a time from the track ingest table and works through the "to_ingest" statuses with each run. Increase the track ingest Lambda timeout to 15 minutes.
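A minimal sketch of the proposed batching, assuming the track ingest table is DynamoDB with a status attribute; the table and attribute names are taken from this thread or assumed, and this is not the actual implementation:

    # Sketch of paging through the track ingest table 500 items at a time,
    # keeping only records still marked "to_ingest". Table and attribute names
    # are assumptions based on this issue.
    import boto3
    from boto3.dynamodb.conditions import Attr

    table = boto3.resource("dynamodb").Table("hydrocron-swot-reach-track-ingest-table")


    def to_ingest_batches(batch_size: int = 500):
        """Yield batches of at most `batch_size` records still marked to_ingest."""
        kwargs = {
            "FilterExpression": Attr("status").eq("to_ingest"),
            "Limit": batch_size,
        }
        while True:
            page = table.scan(**kwargs)
            if page["Items"]:
                yield page["Items"]
            last_key = page.get("LastEvaluatedKey")
            if last_key is None:
                break
            kwargs["ExclusiveStartKey"] = last_key


    # Each Lambda run could process one batch and stop well before the timeout.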

Result 4)

There are cases where the track ingest record does not have a checksum, which occurs when data is loaded manually.

Proposed solution - Leave as is and plan to populate checksum if needed.

Modifications

  1. Modify track ingest operations to query in batches of 500 for granules with a "to_ingest" status and increase timeout to 15 minutes.
  2. Modify track ingest operations so that they account for the product counter.
  3. Modify load granule lambda settings to increase timeout and memory.
nikki-t commented 1 week ago

Closing as UAT testing is complete for release 1.5.0.