mitodl / edx-platform

The Open edX platform, the software that powers edX!
http://open.edx.org/
GNU Affero General Public License v3.0
6 stars 1 forks source link

Lilac: Error on file upload in Studio (intermittent) #267

Closed briangrossman closed 3 years ago

briangrossman commented 3 years ago

Sometimes when uploading a file (in Content => Files & Uploads), the user will receive an error message, but the file appears to upload successfully.

This error was reported on the Summer-2021 github: https://github.mit.edu/mitx/Summer-2021/issues/11

Details:

Got an error on upload:

e4307480-ba21-11eb-96c5-bdae7135ef38

But the file was uploaded:

06c28d80-ba22-11eb-8446-34a556e1fde9

Sar was able to reproduce it using this PDF: web-cache-poisoning.pdf and saw the following error message:

`[2021-06-10 18:33:40 +0000] [116845] [INFO] POST /assets/course-v1:MITx+mkd.upgrade_1+2021_Summer/
2021-06-10 18:33:40,862 ERROR 116845 [root] [user None] [ip None] signals.py:22 - Uncaught exception from None
Traceback (most recent call last):
File "/edx/app/edxapp/edx-platform/common/lib/xmodule/xmodule/contentstore/mongo.py", line 153, in find
with self.fs.get(content_id) as fp:
File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/gridfs/init.py", line 153, in get
gout._ensure_file()
File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/gridfs/grid_file.py", line 489, in _ensure_file
raise NoFile("no file in gridfs collection %r with _id %r" %
gridfs.errors.NoFile: no file in gridfs collection Collection(Database(MongoClient(host=['mongodb-master.service.consul:27017'], document_class=dict, tz_aware=True, connect=True, connecttimeoutms=2000, readpreference='nearest', replicaset='rs0', sockettimeoutms=3000, ssl=False), 'contentstore_next_residential_live'), 'fs.files') with _id 'asset-v1:MITx+mkd.upgrade_1+2021_Summer+type@asset+block@web-cache-poisoning.pdf'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/django/core/handlers/exception.py", line 34, in inner
response = get_response(request)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/django/core/handlers/base.py", line 115, in _get_response
response = self.process_exception_by_middleware(e, request)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/django/core/handlers/base.py", line 113, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/django/contrib/auth/decorators.py", line 21, in _wrapped_view
return view_func(request, *args, **kwargs)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/django/utils/decorators.py", line 142, in _wrapped_view
response = view_func(request, *args, **kwargs)
File "/edx/app/edxapp/edx-platform/cms/djangoapps/contentstore/views/assets.py", line 83, in assets_handler
return _update_asset(request, course_key, asset_key)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/django/views/decorators/http.py", line 40, in inner
return func(request, *args, **kwargs)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/django/contrib/auth/decorators.py", line 21, in _wrapped_view
return view_func(request, *args, **kwargs)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/django/utils/decorators.py", line 142, in _wrapped_view
response = view_func(request, *args, **kwargs)
File "/edx/app/edxapp/edx-platform/cms/djangoapps/contentstore/views/assets.py", line 538, in _update_asset
return _upload_asset(request, course_key)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/django/views/decorators/http.py", line 40, in inner
return func(request, *args, **kwargs)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/django/utils/decorators.py", line 142, in _wrapped_view
response = view_func(request, *args, **kwargs)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/django/contrib/auth/decorators.py", line 21, in _wrapped_view
return view_func(request, *args, **kwargs)
File "/edx/app/edxapp/edx-platform/cms/djangoapps/contentstore/views/assets.py", line 420, in _upload_asset
readback = contentstore().find(content.location)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/mongodb_proxy.py", line 55, in wrapper
return func(*args, **kwargs)
File "/edx/app/edxapp/edx-platform/common/lib/xmodule/xmodule/contentstore/mongo.py", line 173, in find
raise NotFoundError(content_id) # lint-amnesty, pylint: disable=raise-missing-from
xmodule.exceptions.NotFoundError: asset-v1:MITx+mkd.upgrade_1+2021_Summer+type@asset+block@web-cache-poisoning.pdf`
umarmughal824 commented 3 years ago

@pdpinch This error is not reproducible on my local machine. @shaidar Did you follow any specific steps to produce it?

Screenshot 2021-06-14 at 18 45 04
shaidar commented 3 years ago

@umarmughal824 On QA next I managed to run into it when uploading a pdf for a second time.

umarmughal824 commented 3 years ago

@umarmughal824 On QA next I managed to run into it when uploading a pdf for a second time.

@shaidar I have run it plenty of times

umarmughal824 commented 3 years ago

I tried today as well but it's not reproducible

HamzaIbnFarooq commented 3 years ago

I was able to regenerate the issue just for once at https://studio-mitx-qa-next.mitx.mit.edu/ but it never showed up again even after quite deliberate efforts. But just to understand the issue better I am writing my findings below:

Whenever a file is uploaded on the studio, a POST request is done which uploads the file and right after that fetches the file to verify if it is successfully uploaded. And this immediate file fetching sometimes messes up, generating the current issue.

The code that deals with the file uploading is not changed for a couple of years, but the mongo used for this purpose may have some impact.

I am not sure but a most probable issue with mongo can be the writing speed of files from mongo's master-node to mongo's servant-nodes. As the files are written to master-node and read from servant-nodes it may sometimes cause a lag/glitch.

@shaidar do you think the conspiracy theory above makes sense?

HamzaIbnFarooq commented 3 years ago

A simple solution can be just to add a retry at https://github.com/edx/edx-platform/blob/master/cms/djangoapps/contentstore/views/assets.py#L420 with a few seconds sleep.

pdpinch commented 3 years ago

Sounds like a race condition? I like Hamza's suggestion of doing a retry. However, I'd like to wait and see if this problem continues to occur in production first.

pdpinch commented 3 years ago

Closing this, as we haven't had any further reports.