podaac / SOTO

State of the Ocean (SOTO)

Ops error with OPERA_L3_DSWX-HLS_V1 browse images #36

Open frankinspace opened 1 month ago

frankinspace commented 1 month ago

Starting with the release of bignbit 0.1.1 on 2024-07-31, GIBS delivery of browse images has been interrupted, resulting in no browse images being available through Worldview for this collection.

frankinspace commented 1 month ago

@ymchenjpl discovered that the old SNS response topic was removed during deployment of bignbit 0.1.1, causing no responses to be received from GIBS.

Took corrective action in ops:

  1. Manually recreated svc-pobit-podaac-ops-cumulus-gibs-response-topic in the OPS AWS Console. Confirmed that permissions are set to allow GIBS to publish to that topic.
  2. Subscribed svc-bignbit-podaac-ops-cumulus-gibs-response-queue to the manually created svc-pobit-podaac-ops-cumulus-gibs-response-topic.
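
For reference, the two console steps above could be scripted roughly as follows. This is only a sketch, not what was actually run in ops: the function names, the `publisher_account_id` parameter, and the policy statement ID are hypothetical, and the GIBS account ID is not given in this thread. `sns` is assumed to be a boto3 SNS client (`boto3.client("sns")`) passed in by the caller.

```python
import json


def gibs_publish_policy(topic_arn, publisher_account_id):
    """Build a topic policy that lets another AWS account publish to the topic.

    publisher_account_id is a placeholder; the real GIBS account ID is an
    assumption that must come from the GIBS ICD, not from this sketch.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowGibsPublish",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{publisher_account_id}:root"},
                "Action": "sns:Publish",
                "Resource": topic_arn,
            }
        ],
    }


def restore_response_topic(sns, topic_name, queue_arn, publisher_account_id):
    """Recreate the response topic, allow GIBS to publish, subscribe the queue."""
    # Step 1: recreate the topic and attach the cross-account publish policy.
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    sns.set_topic_attributes(
        TopicArn=topic_arn,
        AttributeName="Policy",
        AttributeValue=json.dumps(gibs_publish_policy(topic_arn, publisher_account_id)),
    )
    # Step 2: subscribe the existing response queue to the recreated topic.
    sub = sns.subscribe(
        TopicArn=topic_arn,
        Protocol="sqs",
        Endpoint=queue_arn,
        ReturnSubscriptionArn=True,
    )
    return topic_arn, sub["SubscriptionArn"]
```

Note that the queue's own access policy also has to allow the topic to send messages to it; that part is not covered here.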

This should restore bignbit operations now.

Then, next steps would be:

  1. Update the GIBS ICD to change the response topic from svc-pobit-podaac-ops-cumulus-gibs-response-topic to the new svc-bignbit-podaac-ops-cumulus-gibs-response-topic.
  2. Once GIBS updates to the new topic and we confirm we are still receiving the responses, remove the manually created svc-pobit-podaac-ops-cumulus-gibs-response-topic.
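
Before removing the manually created topic, one precondition that could be checked in code is that the response queue is actually subscribed to the new topic. A rough sketch, again assuming `sns` is a boto3 SNS client; the function names are hypothetical. It only checks the subscription, so it does not replace confirming that GIBS responses are still arriving.

```python
def queue_is_subscribed(sns, topic_arn, queue_arn):
    """Return True if queue_arn has an SQS subscription on topic_arn."""
    token = None
    while True:
        kwargs = {"TopicArn": topic_arn}
        if token:
            kwargs["NextToken"] = token
        page = sns.list_subscriptions_by_topic(**kwargs)
        for sub in page.get("Subscriptions", []):
            if sub.get("Protocol") == "sqs" and sub.get("Endpoint") == queue_arn:
                return True
        # Keep paging until SNS stops returning a continuation token.
        token = page.get("NextToken")
        if not token:
            return False


def remove_old_topic_if_safe(sns, old_topic_arn, new_topic_arn, queue_arn):
    """Delete the old topic only when the queue is wired to the new one."""
    if not queue_is_subscribed(sns, new_topic_arn, queue_arn):
        return False  # leave the old topic in place
    sns.delete_topic(TopicArn=old_topic_arn)
    return True
```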

frankinspace commented 1 month ago

GIBS ops has reported:

Last successful pull from our side was 7/31/24 at 06:52:26 (local time EDT). And nothing since

frankinspace commented 1 month ago

Still have not received any responses from GIBS after re-establishing the correct response topic, which indicates there is another problem going on.

May need to consider rolling back bignbit update.

frankinspace commented 1 month ago

Plan is to roll back to big v0.3.3 and pobit v0.4.1 in UAT and retry sending an OPERA granule to GIBS UAT. If that works, we can also roll back ops; if it doesn't, we will need further debugging.

voxparcxls commented 1 month ago

Rollback PR for big v0.3.3 & pobit v0.4.1: https://github.jpl.nasa.gov/podaac/cumulus-deploy-tf/pull/360

frankinspace commented 1 month ago

PO.DAAC has re-deployed UAT venue with the v0.3.3 BIG and v0.4.1 POBIT components as a dry-run for fixing the OPS venue.

3 OPERA_L3_DSWX-HLS_V1 granules were sent to GIBS UAT.


GIBS confirmed they processed the following in UAT


And PO.DAAC confirmed responses were received for the 3 granules in UAT. Will gain consensus and apply the roll-back to ops.

frankinspace commented 1 month ago

Roll back was applied in OPS. Confirmed success of OPERA_L3_DSWx-HLS_T01FBE_20240727T215911Z_20240803T144035Z_S2A_30_v1.0 in OPS with GIBS. The count of responses returned from GIBS increased from the 0 it was showing previously.

viviant100 commented 1 month ago

Deployed to ops on 8/6/24.

torimcd commented 1 week ago

Issue was in GITC configuration; a fix is in place in UAT. Testing with bignbit 0.1.1 in the 24.3 IP sprint via https://github.com/podaac/bignbit/issues/4