Closed dan-tang-ssd closed 2 months ago
I logged in to the server remotely and manually ran the command to get exchange rate data for one day. It works properly in both staging env and live env.
Discussed this issue in the Engineering team catch-up. Dave advised increasing the delay before a failed job is added back to the queue, e.g. from 5 minutes to 1 hour, so that the job has a higher chance of succeeding without human intervention.
Checked that we can specify the queue policy when creating a queue on the Forge site.
For AEC, the queue is used solely for getting exchange rate data. We can configure it to retry a failed job after 1 hour, retry up to 8 times, and then give up. The job first runs at 6:00 AM, so after 8 retries it will be 2:00 PM.
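The timing above can be checked with a small sketch. It assumes each retry starts exactly 1 hour after the previous attempt and ignores how long the job itself runs. The function name `retry_schedule` and the date are illustrative only and are not part of the Forge configuration.

```python
from datetime import datetime, timedelta

def retry_schedule(first_run, delay, retries):
    """Return the first run time followed by the time of each retry.

    Assumes a fixed delay between attempts and zero job run time.
    """
    return [first_run + i * delay for i in range(retries + 1)]

# The date is arbitrary; only the 6:00 AM start time comes from the comment above.
first_run = datetime(2024, 9, 2, 6, 0)
schedule = retry_schedule(first_run, timedelta(hours=1), retries=8)

for i, attempt_time in enumerate(schedule):
    label = "first run" if i == 0 else f"retry {i}"
    print(f"{label}: {attempt_time:%H:%M}")
```

Under these assumptions the last line printed is retry 8 at 14:00, so the job keeps trying for 8 hours after the first run.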
Hopefully the third-party server will be back up within 8 hours. Otherwise, we will receive a Sentry email alert for this.
I have removed the existing queue and created a new queue with the above policy in both staging env and live env. Let's see whether it works the next time the same issue happens.
Both staging env and live env had an error when getting the exchange rate for one day. It looks like the server that provides the exchange rate data was not accessible at that moment.
Follow-up actions:
Staging env error: https://stats4sd-53.sentry.io/issues/5786019463/?referrer=alert_email&alert_type=email&alert_timestamp=1725256900272&alert_rule_id=15309081&notification_uuid=3ba4fe91-ca36-49d0-b43d-b599fab0f110&environment=staging
Live env error: https://stats4sd-53.sentry.io/issues/5786037460/?alert_rule_id=15326462&alert_timestamp=1725257487923&alert_type=email&environment=production&notification_uuid=00725467-ff01-400b-8104-fb71bd09014d&project=4507611585642496&referrer=alert_email