unity-sds / unity-cs

Unity Common Services
Apache License 2.0

Implement lambda in Venue account to periodically gather health status #367

Closed galenatjpl closed 4 months ago

galenatjpl commented 7 months ago

Implement a function (most likely a lambda) that fires periodically and gathers the health status of each of the services below. The statuses will be written to a JSON file, which will be uploaded to an S3 bucket:

[Screenshot 2024-04-09 at 8:41:00 PM: list of services]

Where are the Health Check Endpoints defined?

The set of health check endpoints will be defined by SSM parameters whose names start with /unity/healthCheck/, in the form:

  • /unity/healthCheck/<MARKETPLACE_ITEM>/<COMPONENT_NAME>

For shared services, shared-services is effectively the MARKETPLACE_ITEM, for example: /unity/healthCheck/shared-services/data-catalog
For venue services, an example would be: /unity/healthCheck/sps/airflowUi
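As a rough illustration of the naming convention, here is a minimal Python sketch that splits a parameter name into its marketplace item and component name. The names `HealthCheckParam` and `parse_health_check_name` are hypothetical, and taking the last two path segments is an assumption made so that any extra leading segments are ignored.

```python
from typing import NamedTuple, Optional

HEALTH_CHECK_PREFIX = "/unity/healthCheck/"


class HealthCheckParam(NamedTuple):
    """Hypothetical container for the two trailing path segments."""
    marketplace_item: str
    component_name: str


def parse_health_check_name(name: str) -> Optional[HealthCheckParam]:
    """Return the last two path segments of a health check parameter name.

    Returns None when the name is outside the /unity/healthCheck/ hierarchy
    or has fewer than two segments after the prefix.
    """
    if not name.startswith(HEALTH_CHECK_PREFIX):
        return None
    remainder = name[len(HEALTH_CHECK_PREFIX):]
    segments = [segment for segment in remainder.split("/") if segment]
    if len(segments) < 2:
        return None
    return HealthCheckParam(segments[-2], segments[-1])
```

With the two example names above, this would yield shared-services / data-catalog and sps / airflowUi respectively.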

Who Creates the SSM entries?

The Service Areas (not U-CS) are responsible for creating the SSM entries.

How does the querying occur?

A lambda function periodically fires off (nominally every 5 minutes -- probably leveraging AWS EventBridge) and:

  1. queries SSM for all params starting with /unity/healthCheck/
    • /unity/healthCheck/${PROJECT}/${VENUE}/<MARKETPLACE_ITEM>/<COMPONENT_NAME>
    • /unity/healthCheck/shared-services/<MARKETPLACE_ITEM>/<COMPONENT_NAME>
  2. gathers the health status of each of the URLs found in the /unity/healthCheck/... SSM values. For now, HTTP 200 represents HEALTHY, and anything else represents UNHEALTHY. Some of the URLs represented in the SSM values are endpoints in the shared services AWS account, and others are in the venue account.
  3. generates the JSON status file containing the status (HEALTHY or UNHEALTHY) of each service. Example JSON file:
    {
      "services": [
        {
          "service": "airflow",
          "landingPage": "https://unity.com/project/venue/processing/ui",
          "healthChecks": [
            {
              "status": "HEALTHY",
              "date": "2024-04-09T18:01:08Z"
            }
          ]
        },
        {
          "service": "jupyter",
          "landingPage": "https://unity.com/project/venue/ads/jupyter",
          "healthChecks": [
            {
              "status": "HEALTHY",
              "date": "2024-04-09T18:01:08Z"
            }
          ]
        },
        {
          "service": "otherService",
          "landingPage": "https://unity.com/project/venue/other_service",
          "healthChecks": [
            {
              "status": "UNHEALTHY",
              "date": "2024-04-09T18:01:08Z"
            }
          ]
        }
      ]
    }
  4. uploads the JSON file to the S3 bucket defined in https://github.com/unity-sds/unity-cs/issues/370
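The four steps above could be sketched roughly as follows. This is not the implemented lambda, only a minimal Python outline under stated assumptions: the SSM and S3 clients are passed in (for example `boto3.client("ssm")` and `boto3.client("s3")`), the function names are hypothetical, the service name is taken from the last path segment of the parameter name, and landingPage is omitted because the ticket does not say where it comes from. Connection failures are treated as UNHEALTHY, consistent with "anything other than HTTP 200".

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Callable, Dict, Optional

SSM_PATH = "/unity/healthCheck"


def list_health_check_params(ssm_client) -> Dict[str, str]:
    """Step 1: return {parameter name: URL} for every parameter under SSM_PATH."""
    found: Dict[str, str] = {}
    paginator = ssm_client.get_paginator("get_parameters_by_path")
    for page in paginator.paginate(Path=SSM_PATH, Recursive=True):
        for param in page["Parameters"]:
            found[param["Name"]] = param["Value"]
    return found


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, ...


def build_status_document(
    params: Dict[str, str],
    fetch: Callable[[str], int] = http_status,
    now: Optional[datetime] = None,
) -> dict:
    """Steps 2 and 3: probe each URL and assemble the status document."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")
    services = []
    for name, url in sorted(params.items()):
        status = "HEALTHY" if fetch(url) == 200 else "UNHEALTHY"
        services.append({
            # Assumption: the last path segment doubles as the service name.
            "service": name.rstrip("/").rsplit("/", 1)[-1],
            "healthChecks": [{"status": status, "date": stamp}],
        })
    return {"services": services}


def upload_status(s3_client, bucket: str, key: str, document: dict) -> None:
    """Step 4: write the document to S3 as JSON."""
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(document, indent=2).encode("utf-8"),
        ContentType="application/json",
    )
```

A lambda handler would call these in order; if EventBridge is used for the trigger, a schedule expression such as `rate(5 minutes)` would match the nominal interval. The bucket name and object key are left as parameters because they are defined in the other ticket.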

What if the healthCheck endpoint is secured? How will I work around that?

@mike-gangl mentions that there is a methodology for getting the username/password from SSM, then getting a token. See https://github.com/unity-sds/unity-data-services/blob/develop/cumulus_lambda_functions/lib/cognito_login/cognito_token_retriever.py for an example of how U-DS gets a token; that code handles the Cognito login. Something like https://github.com/unity-sds/unity-data-services/blob/develop/cumulus_lambda_functions/stage_in_out/dapa_client.py then uses that Cognito token to make calls. See also: https://github.com/unity-sds/sounder-sips-tutorial/blob/develop/jupyter-notebooks/tutorials/2_working_with_data.ipynb
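For orientation, a token-based call might look something like the Python sketch below. It is an assumption, not a copy of the linked U-DS code: it supposes the Cognito app client allows the USER_PASSWORD_AUTH flow, that the access token is sent as a Bearer header, and that the credentials and client ID have already been read from SSM. The function names are hypothetical, and the Cognito client is passed in (for example `boto3.client("cognito-idp")`).

```python
import urllib.request


def fetch_access_token(cognito_client, client_id: str, username: str, password: str) -> str:
    """Log in to Cognito with a username/password and return the access token.

    Assumes the app client identified by client_id permits USER_PASSWORD_AUTH.
    """
    response = cognito_client.initiate_auth(
        AuthFlow="USER_PASSWORD_AUTH",
        AuthParameters={"USERNAME": username, "PASSWORD": password},
        ClientId=client_id,
    )
    return response["AuthenticationResult"]["AccessToken"]


def authorized_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request for a secured health check endpoint.

    Assumption: the endpoint accepts the token as an HTTP Bearer credential.
    """
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

The resulting request object can be passed to `urllib.request.urlopen` in place of a bare URL, so the HTTP 200 check stays the same for secured and unsecured endpoints.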

See diagrams and other notes in https://github.com/unity-sds/unity-project-management/issues/101

Dependencies

Other epics or outside tickets required for this to work

rtapella commented 4 months ago

Updated JSON format: see https://github.com/unity-sds/unity-project-management/issues/101#issuecomment-2045802548

galenatjpl commented 4 months ago

@mike-gangl This ticket is implemented, and we are closing it to take credit for the work in 24.2. We can run everything manually, and it's fine. We will open another ticket to do the final testing in 24.3. @hargitayjpl and @jdrodjpl will get together to run the test and confirm things.