mitodl / odl-video-service

building blocks for a basic video service for ODL
BSD 3-Clause "New" or "Revised" License

handle ThrottlingException from AWS ETS on OVS #1080

Open · sentry-io[bot] opened this issue 1 year ago

sentry-io[bot] commented 1 year ago

Is there some way we can avoid or handle these ThrottlingException errors so they don't end up failing the transcode job?


Sentry Issue: ODL-VIDEO-SERVICE-23G

```
ClientError: An error occurred (ThrottlingException) when calling the ReadJob operation (reached max retries: 4): Your application is submitting requests to Amazon Elastic Transcoder faster than the maximum request rate. If you are polling for job status, consider using the notifications feature to receive updates when your jobs change state. For more information, see http://docs.aws.amazon.com/elastictranscoder/latest/developerguide/notifications.html
  File "cloudsync/tasks.py", line 216, in update_video_statuses
    refresh_status(video)
  File "cloudsync/api.py", line 142, in refresh_status
    et_job = get_et_job(encode_job.id)
  File "ui/utils.py", line 186, in get_et_job
    job = et.read_job(Id=job_id)
```

AWS error when refreshing job status
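One possible way to handle this on the client side is a minimal sketch, assuming we keep polling but retry throttled calls with capped exponential backoff and full jitter instead of letting the first `ThrottlingException` fail the job. It duck-types on `exc.response["Error"]["Code"]`, which is where botocore's `ClientError` puts the error code, so the sketch runs without boto3 installed. `call_with_backoff`, `is_throttling_error`, and the delay parameters are hypothetical names, not part of the OVS codebase.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Error codes treated as retryable throttling. "ThrottlingException" is the code
# seen in the Sentry trace; adding any others here would be an assumption.
THROTTLING_CODES = {"ThrottlingException"}


def is_throttling_error(exc: BaseException) -> bool:
    """Return True if exc looks like a botocore ClientError for throttling.

    botocore's ClientError exposes the parsed error as
    exc.response["Error"]["Code"]; we duck-type on that shape.
    """
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    return response.get("Error", {}).get("Code") in THROTTLING_CODES


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying throttling errors with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as exc:
            # Re-raise non-throttling errors immediately, and the throttling
            # error from the final attempt so the caller still sees it.
            if not is_throttling_error(exc) or attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time in [0, min(max_delay, base_delay * 2**attempt)].
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


# Hypothetical usage inside ui/utils.get_et_job (names taken from the traceback):
# job = call_with_backoff(lambda: et.read_job(Id=job_id))
```

Backoff only spreads the retries out. The error message itself points to the sturdier fix: stop polling `ReadJob` and use Elastic Transcoder's notifications feature instead. botocore's client `Config` also accepts a `retries` option (`max_attempts`, `mode`) that raises the built-in retry limit behind the "reached max retries: 4" message.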
pdpinch commented 1 year ago

This happened again on 7/30.

Sentry reported 12 errors.

https://mit-office-of-digital-learning.sentry.io/issues/3315285433/?environment=production&project=194353&query=is%3Aunresolved&referrer=issue-stream&stream_index=6

I found 11 videos in the "Transcode failed internal error" state in the Django admin (that number will go down as I reschedule them for transcoding):

https://video.odl.mit.edu/admin/ui/video/?q=&status__exact=Transcode+failed+internal+error

For some reason only 4 showed up in Zendesk (maybe ZD thought they were spam):

https://github.com/mitodl/hq/issues/2044
https://github.com/mitodl/hq/issues/2045
https://github.com/mitodl/hq/issues/2046
https://github.com/mitodl/hq/issues/2047