We have seen git extraction fail for several reasons. For at least some of them, we should attempt automated recovery, and if that fails (or the extraction still does not succeed on the next run), alert a human.
One example is the following error:
OSError("failed to lock file '/mnt/efs/addons.mozilla.org/git-storage/97/0697/2720697/addon/.git/refs/heads/listed.lock' for writing: ")
This could potentially be recovered by:
AddonGitRepository(2720697).delete()
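The recover-then-alert flow for this lock error could be sketched roughly as follows. This is only an illustration, not olympia code: `extract_with_recovery`, `is_stale_lock_error` and the three callables are hypothetical names, and the sketch assumes that `delete_repository` would wrap something like `AddonGitRepository(addon_pk).delete()` and that deleting the repository is an acceptable recovery for a stale lock file.

```python
import logging
from typing import Callable

log = logging.getLogger(__name__)


def is_stale_lock_error(exc: Exception) -> bool:
    """Match errors such as "failed to lock file '...listed.lock' for writing"."""
    return isinstance(exc, OSError) and "failed to lock file" in str(exc)


def extract_with_recovery(
    addon_pk: int,
    extract: Callable[[int], None],            # hypothetical: runs the extraction
    delete_repository: Callable[[int], None],  # hypothetical: e.g. AddonGitRepository(pk).delete()
    alert_human: Callable[[int, Exception], None],  # hypothetical: notification hook
) -> bool:
    """Try the extraction, attempt one automated recovery, then alert a human."""
    try:
        extract(addon_pk)
        return True
    except Exception as exc:
        if not is_stale_lock_error(exc):
            # No known automated recovery for this failure.
            alert_human(addon_pk, exc)
            return False
        log.warning("Lock error for add-on %s, deleting repository: %s", addon_pk, exc)

    try:
        # Recovery step: drop the repository and extract again from scratch.
        delete_repository(addon_pk)
        extract(addon_pk)
        return True
    except Exception as exc:
        # Recovery did not help, so hand over to a human.
        alert_human(addon_pk, exc)
        return False
```

Passing the steps in as callables keeps the policy (one recovery attempt, then alert) separate from how extraction and alerting are actually implemented.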
Another example:
"Uncaught exception:
File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 468, in trace_task
I, R, state, retval = on_error(task_request, exc, uuid)
File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 379, in on_error
R = I.handle_error_state(
File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 178, in handle_error_state
return {
File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 225, in handle_failure
task.backend.mark_as_failure(
File "/usr/local/lib/python3.9/site-packages/celery/backends/base.py", line 220, in mark_as_failure
self._call_task_errbacks(request, exc, traceback)
File "/usr/local/lib/python3.9/site-packages/celery/backends/base.py", line 243, in _call_task_errbacks
errback(request, exc, traceback)
File "/usr/local/lib/python3.9/site-packages/celery/canvas.py", line 168, in __call__
return self.type(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/sentry_sdk/integrations/celery.py", line 200, in _inner
reraise(*exc_info)
File "/usr/local/lib/python3.9/site-packages/sentry_sdk/_compat.py", line 54, in reraise
raise value
File "/usr/local/lib/python3.9/site-packages/sentry_sdk/integrations/celery.py", line 195, in _inner
return f(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 735, in __protected_call__
return orig(self, *args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/celery/app/task.py", line 392, in __call__
return self.run(*args, **kwargs)
File "/data/olympia/src/olympia/amo/decorators.py", line 121, in wrapper
return f(*args, **kw)
File "/data/olympia/src/olympia/git/tasks.py", line 90, in on_extraction_error
remove_git_extraction_entry(addon_pk)
File "/usr/local/lib/python3.9/site-packages/celery/local.py", line 188, in __call__
return self._get_current_object()(*a, **kw)
File "/usr/local/lib/python3.9/site-packages/sentry_sdk/integrations/celery.py", line 200, in _inner
reraise(*exc_info)
File "/usr/local/lib/python3.9/site-packages/sentry_sdk/_compat.py", line 54, in reraise
raise value
File "/usr/local/lib/python3.9/site-packages/sentry_sdk/integrations/celery.py", line 195, in _inner
return f(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 735, in __protected_call__
return orig(self, *args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/celery/app/task.py", line 392, in __call__
return self.run(*args, **kwargs)
File "/data/olympia/src/olympia/amo/decorators.py", line 121, in wrapper
return f(*args, **kw)
File "/data/olympia/src/olympia/git/tasks.py", line 24, in remove_git_extraction_entry
GitExtractionEntry.objects.filter(addon_id=addon_pk, in_progress=True).delete()
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 746, in delete
deleted, _rows_count = collector.delete()
File "/usr/local/lib/python3.9/site-packages/django/db/models/deletion.py", line 400, in delete
with transaction.atomic(using=self.using, savepoint=False):
File "/usr/local/lib/python3.9/site-packages/django/db/transaction.py", line 207, in __enter__
connection.set_autocommit(False, force_begin_transaction_with_broken_autocommit=True)
File "/usr/local/lib/python3.9/site-packages/django/db/backends/base/base.py", line 415, in set_autocommit
self._set_autocommit(autocommit)
File "/usr/local/lib/python3.9/site-packages/django/db/backends/mysql/base.py", line 272, in _set_autocommit
self.connection.autocommit(autocommit)
File "/usr/local/lib/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/usr/local/lib/python3.9/site-packages/django/db/backends/mysql/base.py", line 272, in _set_autocommit
self.connection.autocommit(autocommit)
File "/usr/local/lib/python3.9/site-packages/MySQLdb/connections.py", line 239, in autocommit
_mysql.connection.autocommit(self, on)
<class 'django.db.utils.InterfaceError'>
InterfaceError(0, '')
"
Our understanding is that the extraction might have succeeded, but in any case the task took so long that the database server reached its threshold for open connections and closed ours, so the cleanup query in remove_git_extraction_entry failed with InterfaceError(0, '').
I am not sure what we could try doing here without human intervention. I am open to any suggestions.
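One possible mitigation, offered as a suggestion rather than a verified fix, is to discard the dead database connection and run the cleanup query once more. The sketch below is generic Python: `run_with_fresh_connection` and its parameters are hypothetical names. It assumes the query fails only because the connection was dropped, and that in Django `reset_connection` could be `django.db.connection.close` (Django opens a new connection on the next query) with `retry_on=(django.db.utils.InterfaceError,)`. It has not been tried against olympia.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_fresh_connection(
    query: Callable[[], T],
    reset_connection: Callable[[], None],  # hypothetical: e.g. django.db.connection.close
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Run `query`; on a connection-level error, reset the connection and retry once."""
    try:
        return query()
    except retry_on:
        # The server probably closed the connection while the task was running.
        reset_connection()
        # A second failure propagates to the caller, who can then alert a human.
        return query()
```

In the failing task, `query` would be the `GitExtractionEntry.objects.filter(...).delete()` call from the traceback, wrapped in a function or lambda.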
Other failures might not be recoverable as easily, or automatically at all, in which case we should alert a human.
The second example is tracked in Sentry at https://sentry.io/organizations/mozilla/issues/3190120152/?project=6310819&query=is%3Aunresolved and is likely caused by https://sentry.io/organizations/mozilla/issues/3190120144/?project=6310819