mitodl / edx-platform

The Open edX platform, the software that powers edX!
http://open.edx.org/
GNU Affero General Public License v3.0
6 stars 1 forks source link

errors at /api/edx_proctoring/v1/ #286

Closed sentry-io[bot] closed 1 year ago

sentry-io[bot] commented 2 years ago

This error seems to be preventing some users from accessing courseware.

The Residential MITx system had proctoring enabled for some time in 2020, but we have turned it off now.

Sentry Issue: OPENEDX-RESIDENTIAL-47W

CourseOverview.DoesNotExist: 
(15 additional frame(s) were not displayed)
...
  File "edx_proctoring/api.py", line 1633, in update_attempt_status
    credit_state = credit_service.get_credit_state(
  File "openedx/core/djangoapps/credit/services.py", line 103, in get_credit_state
    course_overview = CourseOverview.get_from_id(course_key)
  File "openedx/core/lib/cache_utils.py", line 74, in decorator
    result = wrapped(*args, **kwargs)
  File "openedx/core/djangoapps/content/course_overviews/models.py", line 401, in get_from_id
    return course_overview or cls.load_from_module_store(course_id)
  File "openedx/core/djangoapps/content/course_overviews/models.py", line 343, in load_from_module_store
    raise cls.DoesNotExist()

Similar issue: https://sentry.io/organizations/mit-office-of-digital-learning/issues/2879351415/?project=1757731&query=%2Fapi%2Fedx_proctoring%2F&statsPeriod=14d

pdpinch commented 2 years ago

The steps to reproduce are quite complex, but these are some things we did that seem related:

  1. Turned on timed exams and proctoring
  2. Allowed course authors to add timed exams and proctored exams to their courses, and test them with their user, creating entries in edx_proctoring/proctoredexamstudentattempt. The bug seems to affect users and courses that only took timed exams, without proctoring.
  3. Turned off proctored exams, but left time examed turned on
  4. Deleted courses (using the sysadmin dashboard) that contained the timed exams
  5. When an affected user logs in, to any course, they get this error message above.

It hasn't been tested completely, but it seems like recreating a course with the same id as the error course will stop the error -- maybe even if it doesn't have a timed exam in it.

arslanashraf7 commented 2 years ago

@pdpinch FYI,

While looking through the stack trace and proctoring I saw a few things which might be related.

1) The error seems to complain about CourseOverview not being created. As per the logs when a CourseOverview is looked out and not found it should be created from the modulestore/mongo if present. But, in our case the CourseOverview object wasn't present and when the code went to create it from modulestore/mongo the course was also not present there as well. That's why it ended up in this error. The references for these are here and here.

I will further look in it w.r.t sysadmin maybe there was course reload or deletion in progress at the moment.

pdpinch commented 2 years ago

On Residential MITx, it is our practice to delete courses at the end of the semester, to keep overhead on the database low.

Is there an appropriate way to catch and handle the absence of the CourseOverview, instead of returning a 500 error?

arslanashraf7 commented 2 years ago

Is there an appropriate way to catch and handle the absence of the CourseOverview, instead of returning a 500 error?

I think that depends on what we want as a result of it, probably a reasonable message on the frontend e.g. Learning MFE which could be 422 or others that Frontend might tackle, or something like a better error response e.g. 400 or 404 just in the logging.

But I will take a look at our possibilities here.

pdpinch commented 2 years ago

Do you know what the Learning MFE is expecting from this API call? One of the things that is confusing us is we don’t know why, in this context, it needs to know the result of the proctored exam.

Of course, some of our confusion also comes from the fact that we’ve disabled proctored exams in this deployment. At least, I think we have disabled them. It’s possible that we’ve left some flags on and others off.

pdpinch commented 2 years ago

To be more specific about the context, in this event:

https://sentry.io/organizations/mit-office-of-digital-learning/issues/2879351292/events/c38df96e13af4a66abcb75113cab25f5/?project=1757731

The user is trying to load the course outline for course-v1:MITx+kalebtest1+2022_Spring

and the MFE is making a request to https://lms.mitx.mit.edu/api/edx_proctoring/v1/proctored_exam/attempt/course_id/course-v1:MITx+kalebtest1+2022_Spring

But the error is coming from a missing CourseOverview for a completely different course: CourseLocator('MITx','18.01_ASE', '2020_Fall', None, None)

If course-v1:MITx+kalebtest1+2022_Spring has no proctored exams, why is the MFE requesting https://lms.mitx.mit.edu/api/edx_proctoring/v1/proctored_exam/attempt/course_id/course-v1:MITx+kalebtest1+2022_Spring ?

And when it requests that API, why does the response depend on other course IDs?

I've been thinking about asking about this in the #proctoring channel or the #frontend-working-group channel in open edX slack, but it might be easier to look at the code first.

arslanashraf7 commented 2 years ago

@pdpinch, Thanks for adding the details in your last comment.

I looked around the logs and code and I believe I figured out what’s happening in our case. The answer might be long enough revolving around (edx-platform, edx-proctoring plugin, frontend-app-learning and frontend-lib-special-exams)

I'll try to answer your queries below:

If course-v1:MITx+kalebtest1+2022_Spring has no proctored exams, why is the MFE requesting https://lms.mitx.mit.edu/api/edx_proctoring/v1/proctored_exam/attempt/course_id/course-v1:MITx+kalebtest1+2022_Spring ?

Well, it looks like the MFE hits this API on every course irrespective of the fact that the course itself is proctored/timed or not.

Why does it do that? The MFE has a functionality to show a card for any incomplete exam attempt intimation on top of the course outline page irrespective of the course that a user opens. (It’s reasonable to believe that this behavior is intended). The code references for it can be seen in special-exam-lib and Learning MFE.

A screenshot of how it looks:

Screenshot 2022-01-21 at 5 18 48 PM

Answering second question

And when it requests that API, why does the response depend on other course IDs?

Backend (edx-proctoring plugin): The API from the Learning MFE ends up in this view which then fetches the active exams and attempt data for the user. What needs to be noticed here is that it doesn’t specify the course Id for which to get the active exams for, instead if looks for active exams in all the courses irrespective of the id of the course passed in the API. (https://github.com/openedx/edx-proctoring/blob/master/edx_proctoring/views.py#L230).

As a next step, the API goes through different flow steps checking the active exams, attempts and finally ends up updating the attempt data for the exam if needed(Usually when an exam attempt has expired). If it finds an expired exam it will try to update its attempt status to an appropriate one also keeping in mind if there are any updates needed in grades as well.

Backend/Platform side: As mentioned above, When the proctoring plugin decides to update the grades for an exam attempt(Based on this) it calls credit services from platform which tries to get the course info and fails if it couldn’t find it.

In our case, when the above API(/v1/proctored_exam/attempt/course_id/course-v1:MITx+kalebtest1+2022_Spring) gets hit, it finds some unfinished exams but not from the course that the user opened but instead it finds unfinished and expired exam attempts in course-v1:MITx+18.01_ASE+2020_Fall (I’m certain that this course has been deleted and there is no entry for the course either in CourseOverview or the modulestore/mongo).

As the attempts found are in the past(expired) it tries to update their attempt status and credits, and in doing that it tries to find the course course-v1:MITx+18.01_ASE+2020_Fall. As a result, it ends up in this sentry error an error for not being able to create the CourseOverview object. (Side Note: The CourseOverview object gets created automatically for all the courses that are present in the module store upon creation of the course and whenever the course is accessed but the CourseOverview has somehow been deleted).

I think the above details might explain, questions like why MFE hits this API, Why we end up looking for some other course object when the user has opened another course, why not all the users see this error.

Note: When a course is deleted its ExamAttempt entries are not deleted.

Some reproduction steps: 1- Created a course with timed exams 2- Enroll and open the course, open a timed problem and start it (A timer will start) 3- Now get back to some other course (You will see a ticker above regarding a timed problem in another course) 4- Let the timer expire without submitting your response. 5- Delete the course using edx-sysadmin. 6- Open any course you'll see this error propagating.

Some proposed resolutions: I'm not sure how big of an impact any change in the flow might make, but maybe we can do a few things:

1- When we delete a course we either delete the unfinished exam attempts or update their status before deleting a course. 2- When attempt/course_id API is hit with GET like in this error, we check for ongoing exam attempts related to the course that the user clicked? 3- Make tweaks in the error returned in this case?

I know this is a long response, but let me know if things are not clear.

pdpinch commented 2 years ago

Thank you Arslan. This is really helpful and makes sense.

It seems to me that option 2 is the smallest change. Can you look into that some more?

2- When attempt/course_id API is hit with GET like in this error, we check for ongoing exam attempts related to the course that the user clicked?

I don't see any reason why the API should be querying all exam attempts, rather than just the current course.

arslanashraf7 commented 2 years ago

@pdpinch FYI,

It seems to me that option 2 is the smallest change. Can you look into that some more?

Looks like the function that's called in the API (here) for getting active exams for a user supports course_id too.

The reference to the function can be seen in the proctoring get_active_exams_for_user.

The simplest change could be to just pass the course_id in it and this will look for active exams only for the course that the user opens.

In order to check the usage/impact of this change for the community, I looked at the PR this feat was added in . However, the ticket linked to the PR(https://github.com/openedx/edx-proctoring/pull/837) in which it was added points to an incorrect link and we might not be sure if this was discussed at that point. This is the link to the ticket.

pdpinch commented 2 years ago

Ok. I guess we have to account for cases where we need to fetch all active exams, so even if we filter the request to one course, we'll still eventually get an error.

What about catching the CourseOverview exception and logging it instead of raising a 500 error?

arslanashraf7 commented 2 years ago

@pdpinch regarding below:

What about catching the CourseOverview exception and logging it instead of raising a 500 error?

This looks like a reasonable approach and a pretty feasible one too requiring minimal changes. So, here's how it could be:

1) There are a couple of places like here and here in edx-proctoring where we might go for catching the exception but for this part, we'd have to raise the exception from the platform to all the way here which would require PRs in both edx-proctoring and edx-platform. (I didn't go for this approach)

2) Instead of Point#1, I thought of catching & logging the exception in the edx-platform itself inside the credit service that's called by the edx-proctoring to get course data. Catching it here makes sure that all the other services that try to get the course info can behave as they would like based on the exception thrown. However, in the case of credit service, we'll catch and log it.

Moving ahead, I've created a branch for the changes mentioned in point#2.

The visible impacts of the above change:

Some quick questions If the above looks good:

1) Are we looking to take it upstream of our own fork?

2) Taking it into master or maple (I suppose master, the current branch is for maple and I can update it to be on master).

arslanashraf7 commented 2 years ago

@pdpinch Can this be closed?

arslanashraf7 commented 1 year ago

Closing this without response, Closed via https://github.com/openedx/edx-platform/pull/29834. We can reopen if needed.