openedx / wg-build-test-release

Open edX Build / Test / Release Working Group

Cannot set submission delay on non UTC server #9

Closed BbrSofiane closed 1 year ago

BbrSofiane commented 3 years ago

In an exercise, you can set a delay in Studio that a learner must wait before submitting a new answer.

This worked fine in Ironwood but one of our testers found a problem the day before we were going to migrate our production systems from Ironwood to Juniper.

We put a delay of 10 seconds between submissions. It worked fine in Ironwood and previous releases. But in Juniper it would display an error saying we still had to wait 4 hours and 50 minutes... (see the attached image; sorry, it's in French, but you can get the idea).

We have always used EDT as the timezone on our EC2 Ubuntu servers in AWS.

I added a few traces to the code and found what looks like a subtraction issue between two dates in common/lib/xmodule/xmodule/capa_base.py
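To illustrate the kind of subtraction issue described here, the following is a minimal sketch and not the actual capa_base.py code. It assumes the stored submission time is UTC wall-clock time while "now" is read as local wall-clock time on an EDT (UTC-4) server, with timezone information dropped from both. The function name `seconds_until_next_submission` and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server clock is set to EDT (UTC-4), as in the report above.
EDT = timezone(timedelta(hours=-4), "EDT")


def seconds_until_next_submission(last_submission, now, delay_seconds):
    """Return how long the learner still has to wait (never negative)."""
    elapsed = (now - last_submission).total_seconds()
    return max(0.0, delay_seconds - elapsed)


# The learner submitted at 12:00:00 UTC and retries 15 seconds later.
last_utc = datetime(2020, 7, 1, 12, 0, 0, tzinfo=timezone.utc)
now_utc = last_utc + timedelta(seconds=15)

# Both values are timezone-aware: the 10-second delay has already elapsed.
correct_wait = seconds_until_next_submission(last_utc, now_utc, 10)

# tzinfo is dropped: "now" becomes local (EDT) wall-clock time while the
# stored submission time stays UTC wall-clock time, so the result is skewed
# by the UTC offset.
last_naive_utc = last_utc.replace(tzinfo=None)
now_naive_local = now_utc.astimezone(EDT).replace(tzinfo=None)
buggy_wait = seconds_until_next_submission(last_naive_utc, now_naive_local, 10)

print(correct_wait)                        # 0.0
print(timedelta(seconds=buggy_wait))       # 3:59:55
```

Under these assumptions the skew is about four hours, the same order of magnitude as the reported message. It would also explain why the problem disappears when the server runs on UTC, since the offset is then zero.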

It magically worked when I reinstalled a new server with the edX fork and forgot to switch the timezone from UTC to EDT. I switched it back to EDT and it failed. I switched it back to UTC and it worked again. I then tried it again successfully on our test server by changing the Ubuntu timezone from EDT to UTC.

Conclusion: For now, the workaround is to set the system timezone in Ubuntu to UTC instead of EDT. As a result, I also had to change a few of our internal cron jobs that previously ran at 23:30 EDT and now need to run at 03:30 UTC.
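The cron schedule conversion mentioned above can be checked with a short sketch. It assumes a fixed UTC-4 offset for EDT; the helper name `edt_to_utc` and the sample date are hypothetical. A fixed offset ignores the seasonal switch to EST (UTC-5), so a real schedule may need a second adjustment in winter.

```python
from datetime import datetime, timedelta, timezone

# Assumption: EDT is modeled as a fixed UTC-4 offset (no DST transitions).
EDT = timezone(timedelta(hours=-4), "EDT")


def edt_to_utc(hour, minute):
    """Convert an EDT wall-clock time to (utc_hour, utc_minute, day_shift)."""
    local = datetime(2020, 7, 1, hour, minute, tzinfo=EDT)
    as_utc = local.astimezone(timezone.utc)
    # day_shift is +1 when the UTC time falls on the next calendar day.
    day_shift = (as_utc.date() - local.date()).days
    return as_utc.hour, as_utc.minute, day_shift


utc_hour, utc_minute, day_shift = edt_to_utc(23, 30)
print(f"{utc_hour:02d}:{utc_minute:02d} UTC, day shift: {day_shift:+d}")
# 03:30 UTC, day shift: +1
```

The day shift matters for jobs restricted to certain weekdays: a job that ran on Friday at 23:30 EDT now runs on Saturday at 03:30 UTC.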

BbrSofiane commented 3 years ago

https://github.com/edx/edx-platform/pull/25818

sarina commented 3 years ago

Gave an update in BTR today. In short, this upgrade has broken edx.org production (but not staging!) 3 separate times. I will work on getting more details about how things broke. So, community involvement in the upgrade would be difficult, unless it breaks your prod env the same way.

edX engineers are variously focused on the Learning MFE, django upgrade, & node upgrade for Maple. There is not internal edX capacity to work on this upgrade prior to Maple.

sarina commented 3 years ago

More updates, in case someone in the community wants to see if they run into the same issues in their prod env(s) and wants to try to fix this.

regisb commented 3 years ago

Thanks for the update @sarina. If I understand correctly, the error stems from the fact that old tzlocal objects (without _hasdst attribute) are being deserialized, which causes errors down the line.

My understanding is that the deserialization happens in xmodule.modulestore.split_mongo.mongo_connection.CourseStructureCache.get. I suggest that this method implements a quick'n dirty fix -- something along the lines of:

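import pickle
from dateutil import tz

data = pickle.loads(pickled_data, encoding='latin-1')
# Old pickled tzlocal objects lack the _hasdst attribute; give it a default.
if isinstance(data, tz.tzlocal) and not hasattr(data, '_hasdst'):
    data._hasdst = False
return data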

If I implemented such a change, would edX be willing to test it in production?
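Since the cached course structure is presumably a nested container rather than a single tzlocal object, a fix along these lines might need to walk the deserialized data. The sketch below shows that idea under stated assumptions: `LegacyTz` is a hypothetical stand-in for `dateutil.tz.tzlocal`, `patch_missing_hasdst` is a hypothetical helper, and the nested dict layout is invented for illustration. It is not tested against edx-platform.

```python
class LegacyTz:
    """Hypothetical stand-in for a timezone class that gained `_hasdst` later."""

    def __init__(self):
        self._std_offset = -5 * 3600
        self._hasdst = True


def patch_missing_hasdst(obj, cls=LegacyTz):
    """Recursively set `_hasdst = False` on instances of `cls` that lack it."""
    if isinstance(obj, cls):
        if not hasattr(obj, "_hasdst"):
            obj._hasdst = False
    elif isinstance(obj, dict):
        for value in obj.values():
            patch_missing_hasdst(value, cls)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            patch_missing_hasdst(item, cls)
    return obj


# pickle.loads restores instances without calling __init__, so an object
# pickled by an older library version comes back with only its old
# attributes. We mimic that here by bypassing __init__.
old = LegacyTz.__new__(LegacyTz)
old.__dict__.update({"_std_offset": -5 * 3600})  # no `_hasdst` in old state

new = LegacyTz()  # created by the current version, already has `_hasdst`

# Invented layout standing in for the deserialized course structure.
data = patch_missing_hasdst({"blocks": [{"start_tz": old}, {"start_tz": new}]})

print(data["blocks"][0]["start_tz"]._hasdst)  # False (patched)
print(data["blocks"][1]["start_tz"]._hasdst)  # True (left untouched)
```

Defaulting to `False` follows the suggestion above. Whether that default is right for every cached object is an open question that would need checking in staging.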

sarina commented 3 years ago

I'd have to defer to @jristau1984 or @ormsbee as representatives of T&L on this one

jristau1984 commented 3 years ago

I think based on the track record of deploying changes related to this, I am not comfortable testing this only in production. However, as stated previously, our team does not have the bandwidth to field this currently. I extend my apologies for not being able to prioritize fixing the issue at this time.

regisb commented 3 years ago

What's the resolution strategy here? If we manage to reproduce and fix this issue in staging, will you ok the deployment in prod?

The stacktrace shows that the bug occurs in a place that is gated by an experimental waffle flag:

File "/edx/app/edxapp/edx-platform/lms/djangoapps/courseware/courses.py", line 466, in get_course_date_blocks
    blocks.extend(get_course_assignment_date_blocks(...

if RELATIVE_DATES_FLAG.is_enabled(course.id):
    blocks.extend(get_course_assignment_date_blocks(
        course, user, request, num_return=num_assignments,
        include_access=include_access, include_past_dates=include_past_dates,
    ))

Is the RELATIVE_DATES_FLAG ("course_experience.relative_dates") enabled in production, but not in staging? Could we enable this flag in staging to reproduce the issue?

regisb commented 3 years ago

@jristau1984 did you see my comment and question above?

Is the RELATIVE_DATES_FLAG ("course_experience.relative_dates") enabled in production, but not in staging? Could we enable this flag in staging to reproduce the issue?

Some major Open edX contributors are affected by this bug, and they would like to see a resolution -- or at least a strategy to resolve this issue.

regisb commented 1 year ago

I'm closing this now considering that:

If someone is still affected, please comment.