nasa / opera-sds-pcm

Observational Products for End-Users from Remote Sensing Analysis (OPERA)
Apache License 2.0

PGE Smoke Test On-Demand Job #885

Closed collinss-jpl closed 2 months ago

collinss-jpl commented 3 months ago

Purpose

Screenshot 2024-06-20 at 10 19 09 AM Screenshot 2024-06-20 at 10 23 37 AM

Hopefully this new job can help streamline the smoke test portion of an SDS deployment, as all expected inputs/outputs should already be available from PGE development. If there are any suggestions for additional features or improvements, please provide them in this PR.

Issues

N/A

Testing

philipjyoon commented 3 months ago

@collinss-jpl This looks great and is in the right direction. However, I think it's missing a piece of the PCM logic that the existing smoke test exercises: downloading of the input and ancillary files. This test seems to copy those over from an existing zip file. That is certainly valid if the goal is to test the PGE functionality only. But I believe our goal is to test the entire PCM, including its logic for retrieving necessary ancillaries, in exercising the PGE.

The smoke testing we've been running submits download jobs into PCM and lets it run as it normally would in determining and retrieving all dependent files. While 99% of all our smoke tests have been successful, there was one instance where it failed, and the failure reason was that the ancillary retrieval logic was incorrect. If we just ran the PGE with all dependencies coming in as zip files, we would not have found that issue.

collinss-jpl commented 3 months ago

I totally agree here. This job should not be considered a drop-in replacement for testing the download/ancillary staging jobs. Its main purpose is to decouple the output product comparison part of a smoke test from testing of a "full" PGE workflow (including product download). In other words, running this job should be sufficient to convince ourselves that PGE integration into PCM has not changed the content of the output products vs. running the PGE standalone.
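As a rough illustration of what "output product comparison" means here, the sketch below compares a directory of expected products against a directory of products produced by a job. It is not the comparison this job actually performs: the function names are hypothetical, and a byte-for-byte SHA-256 check is an assumption made for simplicity (a real PGE comparison may use product-specific tooling and tolerances).

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_product_dirs(expected_dir: Path, actual_dir: Path) -> list[str]:
    """Hypothetical helper: list differences between two product directories.

    An empty list means every expected file exists in actual_dir with
    identical bytes, and no extra files were produced.
    """
    expected = {p.relative_to(expected_dir): p
                for p in expected_dir.rglob("*") if p.is_file()}
    actual = {p.relative_to(actual_dir): p
              for p in actual_dir.rglob("*") if p.is_file()}

    problems = []
    for rel in sorted(expected.keys() - actual.keys()):
        problems.append(f"missing: {rel}")
    for rel in sorted(actual.keys() - expected.keys()):
        problems.append(f"unexpected: {rel}")
    for rel in sorted(expected.keys() & actual.keys()):
        if file_digest(expected[rel]) != file_digest(actual[rel]):
            problems.append(f"differs: {rel}")
    return problems
```

A smoke test in this style would pass only when `compare_product_dirs(...)` returns an empty list.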

hhlee445 commented 2 months ago

@collinss-jpl DISP-S1 smoke test failed

http://opera-dev-triage-fwd-hyunlee.s3-website-us-west-2.amazonaws.com/triaged_job-pge_smoke_test__pge_int_test_job-single_submission-20240709T153343.400329Z_task-d943a72a-02f7-49fb-a964-d64a335c0720

collinss-jpl commented 2 months ago

@hhlee445 I believe I have resolved the failure you encountered for the DISP-S1 smoke test. I have also added a choice of auto-scaling queues that can be used with the job: one that uses Intel-based instances and another that uses AMD-based instances:

Screenshot 2024-07-11 at 11 15 37 AM

This is needed since the smoke test will only pass if the correct chipset is used to generate the output products. Currently the correct mapping is:

DSWx-S1: Intel
DISP-S1: AMD

Lastly, I have also added DSWx-NI as an available PGE to smoke test. It should be run with the Intel-based ASG.
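For reference, the PGE-to-chipset pairing described in this comment can be written down as a small lookup. This is only a sketch: the dictionary, the function name, and the lowercase chipset labels are hypothetical and are not the job's actual queue names. Only the three pairings themselves come from this thread.

```python
# Hypothetical lookup table; only the PGE-to-chipset pairs come from this thread.
PGE_CHIPSET = {
    "DSWx-S1": "intel",
    "DISP-S1": "amd",
    "DSWx-NI": "intel",
}


def chipset_for(pge_name: str) -> str:
    """Return the chipset family the smoke test for a PGE should run on."""
    try:
        return PGE_CHIPSET[pge_name]
    except KeyError:
        # Fail loudly rather than guess, since the wrong chipset fails the comparison.
        raise ValueError(
            f"No smoke-test chipset recorded for PGE {pge_name!r}"
        ) from None
```

Raising on an unknown PGE, instead of defaulting to one chipset, avoids a silent mismatch that would surface later as a failed product comparison.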

hhlee445 commented 2 months ago

Awesome, the updated version completed the smoke tests for the DSWx-S1, DISP-S1, and DSWx-NI PGEs.