BbrSofiane closed this issue 1 year ago.
Gave an update in BTR today. In short, this upgrade has broken edx.org production (but not staging!) 3 separate times. I will work on getting more details about how things broke. So, community involvement in the upgrade would be difficult, unless it breaks your prod env the same way.
edX engineers are variously focused on the Learning MFE, the Django upgrade, and the Node upgrade for Maple. There is no internal edX capacity to work on this upgrade before Maple.
A further update, in case someone in the community wants to check whether they run into the same issue in their production environment(s) and try to fix it:
```
Traceback (most recent call last):
  File "/edx/app/edxapp/edx-platform/lms/djangoapps/courseware/views/views.py", line 615, in get
    return super(CourseTabView, self).get(request, course=course, page_context=page_context, **kwargs)
  File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/web_fragments/views.py", line 23, in get
    fragment = self.render_to_fragment(request, **kwargs)
  File "/edx/app/edxapp/edx-platform/openedx/features/course_experience/views/course_home.py", line 78, in render_to_fragment
    return home_fragment_view.render_to_fragment(request, course_id=course_id, **kwargs)
  File "/edx/app/edxapp/edx-platform/openedx/features/course_experience/views/course_home.py", line 124, in render_to_fragment
    dates_fragment = CourseDatesFragmentView().render_to_fragment(request, course_id=course_id, **kwargs)
  File "/edx/app/edxapp/edx-platform/openedx/features/course_experience/views/course_dates.py", line 34, in render_to_fragment
    course_date_blocks = get_course_date_blocks(course, request.user, request, num_assignments=1)
  File "/edx/app/edxapp/edx-platform/lms/djangoapps/courseware/courses.py", line 466, in get_course_date_blocks
    blocks.extend(get_course_assignment_date_blocks(
  File "/edx/app/edxapp/edx-platform/lms/djangoapps/courseware/courses.py", line 504, in get_course_assignment_date_blocks
    for assignment in get_course_assignments(course.id, user, include_access=include_access):
  File "/edx/app/edxapp/edx-platform/openedx/core/lib/cache_utils.py", line 77, in decorator
    result = wrapped(*args, **kwargs)
  File "/edx/app/edxapp/edx-platform/lms/djangoapps/courseware/courses.py", line 550, in get_course_assignments
    assignment_released = not start or start < now
  File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/dateutil/tz/tz.py", line 222, in utcoffset
    if self._isdst(dt):
  File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/dateutil/tz/tz.py", line 287, in _isdst
    if not self._hasdst:
AttributeError: 'tzlocal' object has no attribute '_hasdst'
```
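The last frames of the traceback show where the missing attribute bites: `start < now` compares two timezone-aware datetimes whose `tzinfo` objects differ, and Python resolves such a comparison through each side's `utcoffset()`. The stdlib sketch below reproduces that pattern; `BrokenLocal` is a hypothetical stand-in for an unpickled `tzlocal`, not dateutil's class, and its offsets are arbitrary.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class BrokenLocal(tzinfo):
    """Hypothetical stand-in for a tzlocal instance that lacks `_hasdst`."""

    def utcoffset(self, dt):
        # Like dateutil's lookup in the traceback, this reads an attribute
        # that was never set on the instance, so it raises AttributeError.
        if self._hasdst:
            return timedelta(hours=-4)
        return timedelta(hours=-5)

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return "LOCAL"


def compare_start_to_now():
    start = datetime(2020, 9, 1, 12, 0, tzinfo=BrokenLocal())
    now = datetime.now(timezone.utc)
    try:
        # Different tzinfo objects, so utcoffset() is consulted on both sides.
        return start < now
    except AttributeError as exc:
        return f"AttributeError: {exc}"


print(compare_start_to_now())
```

Constructing the datetime succeeds; the failure only appears at comparison time, which matches the traceback pointing at `start < now`.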
Thanks for the update @sarina. If I understand correctly, the error stems from the fact that old `tzlocal` objects (without the `_hasdst` attribute) are being deserialized, which causes errors further down the line.
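A minimal sketch of how an object can end up without an attribute that its class's `__init__` always sets: unpickling rebuilds the instance from its saved `__dict__` without calling `__init__`. `LocalZone` is a hypothetical class, and deleting the attribute before pickling only simulates an instance created by an older version of a class.

```python
import pickle


class LocalZone:
    """Hypothetical stand-in for dateutil's tzlocal (not the real class)."""

    def __init__(self):
        self._std_offset = -5 * 3600
        self._dst_offset = -4 * 3600
        # The "newer" version of the class sets this in __init__.
        self._hasdst = self._dst_offset != self._std_offset


def roundtrip_old_instance():
    obj = LocalZone()
    # Simulate an object pickled by an older class version without the attribute.
    del obj._hasdst
    restored = pickle.loads(pickle.dumps(obj))
    # Unpickling restores the instance __dict__ without calling __init__,
    # so the attribute is still missing after loading.
    return hasattr(restored, "_hasdst")


print(roundtrip_old_instance())  # False
```

This is why upgrading the library alone does not help while old pickles remain in a cache: the new `__init__` never runs for them.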
My understanding is that the deserialization happens in `xmodule.modulestore.split_mongo.mongo_connection.CourseStructureCache.get`. I suggest that this method implement a quick'n'dirty fix, something along the lines of:
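```python
import pickle
from dateutil.tz import tzlocal

data = pickle.loads(pickled_data, encoding='latin-1')
# tzlocal objects pickled by an older dateutil lack _hasdst; add it so tz lookups don't fail
if isinstance(data, tzlocal) and not hasattr(data, '_hasdst'):
    data._hasdst = False
return data
```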
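As written, the check only fires when the unpickled value is itself a `tzlocal`. If the cached course structure instead holds `tzlocal` objects nested inside it (for example as the `tzinfo` of datetimes), a walk over the structure would be needed. The sketch below assumes that layout, which has not been confirmed; `patch_missing_attr`, `OldLocal`, and the `"blocks"`/`"fields"` keys are hypothetical. Note also that, given the `if not self._hasdst:` line in the traceback, forcing `_hasdst = False` presumably makes `_isdst()` report no DST at all, which may be wrong on hosts in a DST-observing zone.

```python
from datetime import datetime, timedelta, tzinfo


def patch_missing_attr(obj, cls, attr, default, _seen=None):
    """Set `attr` on every `cls` instance in a nested structure that lacks it.

    Walks dicts, lists, tuples, sets, and the tzinfo of datetimes.
    Returns the number of objects patched.
    """
    if _seen is None:
        _seen = set()
    if id(obj) in _seen:  # shared objects are visited only once
        return 0
    _seen.add(id(obj))

    patched = 0
    if isinstance(obj, cls):
        if not hasattr(obj, attr):
            setattr(obj, attr, default)
            patched += 1
    elif isinstance(obj, datetime) and obj.tzinfo is not None:
        patched += patch_missing_attr(obj.tzinfo, cls, attr, default, _seen)
    elif isinstance(obj, dict):
        for key, value in obj.items():
            patched += patch_missing_attr(key, cls, attr, default, _seen)
            patched += patch_missing_attr(value, cls, attr, default, _seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            patched += patch_missing_attr(item, cls, attr, default, _seen)
    return patched


class OldLocal(tzinfo):
    """Hypothetical tzinfo standing in for an old, unpickled tzlocal."""

    def utcoffset(self, dt):
        return timedelta(hours=-4) if self._hasdst else timedelta(hours=-5)


zone = OldLocal()
structure = {"blocks": [{"fields": {"start": datetime(2020, 9, 1, tzinfo=zone)}}]}
print(patch_missing_attr(structure, OldLocal, "_hasdst", False))  # 1
print(structure["blocks"][0]["fields"]["start"].utcoffset())       # -1 day, 19:00:00
```

In `CourseStructureCache.get`, the call would presumably be `patch_missing_attr(data, tzlocal, '_hasdst', False)` with `tzlocal` imported from `dateutil.tz`; this is untested against real cache contents.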
If I implemented such a change, would edX be willing to test it in production?
I'd have to defer to @jristau1984 or @ormsbee as representatives of T&L on this one.
Based on the track record of deploying changes related to this, I am not comfortable testing this only in production. However, as stated previously, our team does not currently have the bandwidth to take this on. I apologize for not being able to prioritize fixing the issue at this time.
What's the resolution strategy here? If we manage to reproduce and fix this issue in staging, will you approve the deployment to production?
The stack trace shows that the bug occurs in a code path gated by an experimental waffle flag:
```
  File "/edx/app/edxapp/edx-platform/lms/djangoapps/courseware/courses.py", line 466, in get_course_date_blocks
    blocks.extend(get_course_assignment_date_blocks(...
```
which corresponds to:
```python
if RELATIVE_DATES_FLAG.is_enabled(course.id):
    blocks.extend(get_course_assignment_date_blocks(
        course, user, request, num_return=num_assignments,
        include_access=include_access, include_past_dates=include_past_dates,
    ))
```
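To illustrate the staging-versus-production hypothesis, here is a simplified, hypothetical model of the gated branch: the same broken start date is harmless while the flag is off and fails only once the flag-gated comparison runs. `get_date_blocks`, `try_render`, and `BrokenLocal` are illustrative stand-ins, not edx-platform code.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class BrokenLocal(tzinfo):
    """Hypothetical tzinfo that fails like the unpickled tzlocal."""

    def utcoffset(self, dt):
        # _hasdst is never set, so this raises AttributeError
        return timedelta(hours=-4) if self._hasdst else timedelta(hours=-5)


def get_date_blocks(start, relative_dates_enabled):
    """Simplified model of get_course_date_blocks (hypothetical)."""
    blocks = ["course start"]
    if relative_dates_enabled:
        # Only this branch compares the stored start date with "now".
        now = datetime.now(timezone.utc)
        blocks.append("released" if start < now else "upcoming")
    return blocks


def try_render(start, relative_dates_enabled):
    try:
        return get_date_blocks(start, relative_dates_enabled)
    except AttributeError as exc:
        return f"AttributeError: {exc}"


start = datetime(2020, 9, 1, tzinfo=BrokenLocal())
print(try_render(start, relative_dates_enabled=False))  # ['course start']
print(try_render(start, relative_dates_enabled=True))
```

If the model holds, two environments sharing the same cached data could behave differently purely because of the flag's state.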
Is the RELATIVE_DATES_FLAG ("course_experience.relative_dates") enabled in production, but not in staging? Could we enable this flag in staging to reproduce the issue?
@jristau1984 did you see my comment and question above?
Some major Open edX contributors are affected by this bug, and they would like to see a resolution -- or at least a strategy to resolve this issue.
I'm closing this now considering that:
If someone is still affected, please comment.
In a problem (exercise), you can set a delay in Studio that a learner has to wait before submitting a new answer.
This worked fine in Ironwood, but one of our testers found a problem the day before we were due to migrate our production systems from Ironwood to Juniper.
We set a delay of 10 seconds between submissions. It worked fine in Ironwood and previous releases, but in Juniper the problem displayed an error saying we still had to wait 4 hours and 50 minutes (see the attached image; sorry, it's in French, but you can get the idea).
We have always used EDT as the timezone on our EC2 Ubuntu servers in AWS.
I added a few traces to the code and found what looked like a problem with subtracting two dates in `common/lib/xmodule/xmodule/capa_base.py`.
It magically worked when I set up a new server from the edX fork and forgot to switch the timezone from UTC to EDT. When I switched it to EDT, it failed; when I switched it back to UTC, it worked again. I then confirmed this on our test server, where changing the Ubuntu timezone from EDT to UTC also made it work.
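Since dateutil's `tzlocal` follows the host's system timezone, its offset is zero on a UTC host, which would hide any code that mixes local and UTC times. The sketch below shows that class of bug for a submission-wait check; it is not the actual `capa_base.py` logic, and the function names, the 10-second `WAIT`, and the fixed UTC-4 `EDT` offset are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

EDT = timezone(timedelta(hours=-4), "EDT")  # fixed offset, assumed for illustration
WAIT = timedelta(seconds=10)


def remaining_wait_buggy(last_submission_utc_naive, host_tz):
    """Hypothetical bug: subtracts a naive local 'now' from a naive UTC time."""
    now_local_naive = datetime.now(host_tz).replace(tzinfo=None)
    return last_submission_utc_naive + WAIT - now_local_naive


def remaining_wait_fixed(last_submission_utc):
    """Both sides are aware datetimes, so the host timezone does not matter."""
    now_utc = datetime.now(timezone.utc)
    return last_submission_utc + WAIT - now_utc


last = datetime.now(timezone.utc)
print(remaining_wait_buggy(last.replace(tzinfo=None), EDT))           # about 4 hours
print(remaining_wait_buggy(last.replace(tzinfo=None), timezone.utc))  # about 10 seconds
print(remaining_wait_fixed(last))                                     # about 10 seconds
```

On the EDT host the buggy version inflates the wait by the UTC offset, while on a UTC host both versions agree, which is consistent with switching the system timezone to UTC masking the problem.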
Conclusion: the fix for now was to set the system timezone in Ubuntu to UTC instead of EDT. As a result, I also had to change a few of our internal cron jobs that used to run at 23:30 EDT and now need to run at 03:30 UTC.
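The cron conversion can be double-checked with fixed offsets (assumed here: EDT = UTC-4, EST = UTC-5). If the servers previously followed a DST-observing zone such as US Eastern, a job pinned to 03:30 UTC lands at 23:30 local only while DST is in effect and at 22:30 local in winter. `local_time_to_utc` is a hypothetical helper.

```python
from datetime import date, datetime, timedelta, timezone

EDT = timezone(timedelta(hours=-4), "EDT")  # assumed fixed offsets
EST = timezone(timedelta(hours=-5), "EST")


def local_time_to_utc(hour, minute, tz, on_date):
    """Convert a wall-clock time in `tz` on `on_date` to UTC."""
    local = datetime(on_date.year, on_date.month, on_date.day, hour, minute, tzinfo=tz)
    return local.astimezone(timezone.utc)


print(local_time_to_utc(23, 30, EDT, date(2020, 9, 1)).strftime("%Y-%m-%d %H:%M %Z"))
# 2020-09-02 03:30 UTC
print(local_time_to_utc(23, 30, EST, date(2020, 12, 1)).strftime("%Y-%m-%d %H:%M %Z"))
# 2020-12-02 04:30 UTC
```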