podaac / hydrocron

API for retrieving time series of SWOT data
https://podaac.github.io/hydrocron/
Apache License 2.0

Test full track ingest functionality in UAT #239

Open nikki-t opened 1 month ago

nikki-t commented 1 month ago

UAT test (1.5.0)

Test operations to mirror expected operations in OPS

  1. Wipe out Hydrocron tables in services UAT environment.
  2. Follow release process for UAT environment: ReleaseProcess
  3. Enable track ingest EventBridge schedules.
    a. Modify schedule cron expressions:
    • reach: 0 23 ? *
    • node: 5 23 ? *
    • prior lake: 10 23 ? *
  4. Enable the sending of CNM messages from SWOT OPS to the services UAT account.
  5. Monitor track ingest operations via CloudWatch Logs.
  6. Track Ingest Lambda executes hourly to:
    • Locate granules that are in CMR but not in Hydrocron.
    • Locate granules that have been marked as "to_ingest" in the track ingest tables and count the features to determine if they have been fully ingested.
    • Send CNM messages to the CNM Lambda for any granules that have not been or have been partially ingested.
    • Update the track ingest tables with new granule statuses.
    • CloudWatch logs: /aws/lambda/svc-hydrocron-sit-track-ingest-lambda
  7. CNM Lambda function executes upon receipt of a new CNM message:
    • Executes the Load Data Granule with the required input data: granule_path, table_name, track_table, checksum, revisionDate, load_benchmarking_data
    • CloudWatch logs: /aws/lambda/svc-hydrocron-uat-cnm-lambda
  8. The Granule Lambda function executes (triggered from Track Ingest or CNM Lambda) to:
    • Add granule to track ingest table with expected feature count and a status of "to_ingest".
    • Read in granule shapefile data and add it to the appropriate Hydrocron SWOT table.
    • CloudWatch logs: /aws/lambda/svc-hydrocron-uat-load_granule-lambda
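
The decision logic described in step 6 can be sketched roughly as follows. This is a minimal illustration, not Hydrocron's actual implementation: `TrackedGranule`, `plan_track_ingest`, and the "ingested" status value are hypothetical names introduced here, and the real Lambda reads these inputs from CMR and DynamoDB rather than from in-memory lists.

```python
from dataclasses import dataclass


@dataclass
class TrackedGranule:
    """One row of a track ingest table (hypothetical shape)."""
    name: str
    expected_features: int
    actual_features: int
    status: str = "to_ingest"


def plan_track_ingest(cmr_granules, hydrocron_granules, tracked):
    """Return (granule names to resend as CNM messages, updated rows)."""
    # Granules that are in CMR but not in Hydrocron.
    to_send = sorted(set(cmr_granules) - set(hydrocron_granules))

    updated = []
    for row in tracked:
        if row.status != "to_ingest":
            updated.append(row)
        elif row.actual_features >= row.expected_features:
            # Feature counts match, so treat the granule as fully ingested.
            # "ingested" is an assumed status name.
            updated.append(TrackedGranule(
                row.name, row.expected_features, row.actual_features, "ingested"))
        else:
            # Partially ingested: queue it again and keep it as "to_ingest".
            if row.name not in to_send:
                to_send.append(row.name)
            updated.append(row)
    return to_send, updated
```

With three granules in CMR, two already in Hydrocron, and one of those only partially loaded, the sketch returns the missing granule and the partial one for resending and marks the complete one as ingested.
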
nikki-t commented 1 week ago

Since it looks like it might be too difficult to enable forward stream in our UAT environment, it will be helpful to test the track ingest operations on a "batch" of granules. Track ingest is meant to work on batches of granules, which it detects by querying CMR for a range of revision_dates on an hourly, daily, or weekly basis. So what number of granules would stress test track ingest without causing timeouts (as it is a Lambda function)? What is a good test of these types of operations?

I don’t know that it is necessary to test all collections, but it might be good to test Rivers and Lakes. So maybe we can do a larger test on prior lakes and a smaller test on reaches?

How often do granules come in? If we wanted to run weekly, how many could a CMR query potentially retrieve?
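
One way to estimate this is to build the search parameters for a one-week `revision_date` window and check how many granules CMR reports for it. The sketch below only constructs the parameters; `revision_date_params` is a hypothetical helper, and the window end date in the usage note is illustrative.

```python
from datetime import datetime, timedelta

CMR_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def revision_date_params(short_name, window_end, days=7, page_size=2000):
    """Build CMR granule search parameters for a trailing revision_date window."""
    window_start = window_end - timedelta(days=days)
    # CMR expects a date range as "start,end".
    window = (f"{window_start.strftime(CMR_TIME_FORMAT)},"
              f"{window_end.strftime(CMR_TIME_FORMAT)}")
    return {
        "short_name": short_name,
        "revision_date": window,
        "page_size": page_size,
    }
```

For example, `revision_date_params("SWOT_L2_HR_LakeSP_prior_2.0", datetime(2024, 9, 8))` covers 2024-09-01 through 2024-09-08. Passing these parameters to the CMR granule search endpoint and reading the `CMR-Hits` response header should give the total count for the window without paging through every result.
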

Number of reaches per week in Sep. 2024 taken by querying revision_date:

Number of prior lakes per week in Sep. 2024 taken by querying revision_date:

Currently loaded into UAT:

So maybe we can try to test what would happen if we ran on 2 weeks of lake data where we first submit 1 week and then submit an additional week after the first has completed? This way we can use what is already loaded and then we can regroup and decide if we want to try running on a week's worth of reaches.
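
As a rough sketch of that plan, the two submissions could be generated as consecutive one-week events. The field names follow the Track Ingest event JSON used in the test steps later in this thread; `weekly_events` is a hypothetical helper and the start date in the usage note is only an example.

```python
from datetime import datetime, timedelta

EVENT_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def weekly_events(first_day, weeks, base_event):
    """Return one Track Ingest event per consecutive week, in order."""
    events = []
    for i in range(weeks):
        start = first_day + timedelta(weeks=i)
        # End one second before the next week starts so windows do not overlap.
        end = start + timedelta(weeks=1) - timedelta(seconds=1)
        event = dict(base_event)
        event["query_start"] = start.strftime(EVENT_TIME_FORMAT)
        event["query_end"] = end.strftime(EVENT_TIME_FORMAT)
        events.append(event)
    return events
```

Calling `weekly_events(datetime(2024, 9, 1), 2, base_event)` yields one event for 2024-09-01 through 2024-09-07 and a second for 2024-09-08 through 2024-09-14. The second would be submitted only after the first run has completed.
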

The only thing that is not tested is the continual running and querying by revision date.

Additionally, we may want to load in a few granules with overlapping CRIDs to test that functionality.

@torimcd and @ymchenjpl - what do you think?

nikki-t commented 5 days ago

UAT test steps

  1. Execute the Track Ingest Lambda function manually on prior lake data via the console with the following event JSON:

    {
      "collection_shortname": "SWOT_L2_HR_LakeSP_prior_2.0",
      "hydrocron_table": "hydrocron-swot-prior-lake-table",
      "hydrocron_track_table": "hydrocron-swot-prior-lake-track-ingest-table",
      "temporal": "",
      "query_start": "2024-08-17T00:00:00",
      "query_end": "2024-08-25T23:59:59",
      "reprocessed_crid": "PGC0"
    }

    a. Confirm the Track Ingest Lambda retrieves 444 granules and ingests granules that were not successfully ingested into Hydrocron.

  2. Execute the Track Ingest Lambda function manually on reach data via the console with the following event JSON:

    {
      "collection_shortname": "SWOT_L2_HR_RiverSP_prior_2.0",
      "hydrocron_table": "hydrocron-swot-reach-table",
      "hydrocron_track_table": "hydrocron-swot-reach-track-ingest-table",
      "temporal": "",
      "query_start": "2023-07-27T00:00:00",
      "query_end": "2024-10-30T23:59:59",
      "reprocessed_crid": "PGC0"
    }

    a. Confirm the Track Ingest Lambda retrieves 415 granules and ingests granules that were not successfully ingested into Hydrocron.

    b. Confirm that the PGC0 granules in the following list were prioritized by the track ingest operations and, if needed, ingested into Hydrocron:

    SWOT_L2_HR_RiverSP_Reach_009_584_SI_20240124T233242_20240124T233247_PGC0_01.zip
    SWOT_L2_HR_RiverSP_Reach_009_584_SI_20240124T233242_20240124T233247_PIC0_01.zip
    SWOT_L2_HR_RiverSP_Reach_009_566_AS_20240124T081454_20240124T081505_PGC0_01.zip
    SWOT_L2_HR_RiverSP_Reach_009_566_AS_20240124T081454_20240124T081505_PIC0_01.zip
    SWOT_L2_HR_RiverSP_Reach_009_564_SI_20240124T062419_20240124T062430_PGC0_01.zip
    SWOT_L2_HR_RiverSP_Reach_009_564_SI_20240124T062419_20240124T062430_PIC0_01.zip
    SWOT_L2_HR_RiverSP_Reach_009_536_SI_20240123T062338_20240123T062349_PGC0_01.zip
    SWOT_L2_HR_RiverSP_Reach_009_536_SI_20240123T062338_20240123T062349_PIC0_01.zip
    SWOT_L2_HR_RiverSP_Reach_009_500_SI_20240121T233106_20240121T233110_PGC0_01.zip
    SWOT_L2_HR_RiverSP_Reach_009_500_SI_20240121T233106_20240121T233110_PIC0_01.zip
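
To check step 2b by hand, a small script can pair up granules that differ only by CRID and report which one should win. This is a sketch under the assumption that prioritization means "prefer the `reprocessed_crid` version when both exist"; `prefer_reprocessed` is a hypothetical helper and Hydrocron's own logic may differ.

```python
def prefer_reprocessed(granules, reprocessed_crid="PGC0"):
    """Keep one granule per cycle/pass/time range, preferring reprocessed_crid."""
    chosen = {}
    for name in granules:
        # Names end in "_<CRID>_<counter>.zip"; everything before that
        # identifies the same observation.
        prefix, crid, _counter = name.rsplit("_", 2)
        if prefix not in chosen or crid == reprocessed_crid:
            chosen[prefix] = name
    return sorted(chosen.values())
```

Run against the ten file names above, this should return the five `PGC0` granules, which can then be compared with what the track ingest table and the reach table contain.
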