numbas / numbas-lti-provider

An LTI tool provider to run Numbas exams
https://docs.numbas.org.uk/lti/en/latest/

LTI not reporting scores to the VLE; caused by diff_suspend_data bottleneck? #163

Closed: hailsaturn closed this issue 3 months ago

hailsaturn commented 3 years ago

I’m having some issues with the LTI tool:

What I've found:

I know enough to figure out some things, but not enough to say I'm completely correct. Outlining my thought process below:

BEGIN "detective work"

I saw this issue: https://github.com/numbas/numbas-lti-provider/issues/133

Decided to dig around a bit and realised that some of the diff_suspend_data tasks are taking hours to complete. The vast majority of the attempts are being handled in a matter of seconds, but there are a few like this that seem to be taking a long time:

[Screenshots: task list showing a few diff_suspend_data tasks running for hours]

To further investigate, I added some extra logging details to the diff_suspend_data task in tasks.py (lines 5 and 10 below):

 #import logging at beginning of file
 @periodic_task(crontab(minute='*'))
 def diff_suspend_data():
     attempts = Attempt.objects.filter(diffed=False)
     logging.getLogger('huey').info("Number of attempts remaining: {}".format(len(attempts)))
     MAX_TIME = 10
     start = datetime.now()
     if attempts.exists():
         for a in attempts:
             logging.getLogger('huey').info("Attempt detail: {}".format(a))
             diff_scormelements(a)
             t2 = datetime.now()
             if (t2-start).total_seconds()>MAX_TIME:
                 break

This revealed the following (with student information redacted):

[Screenshot: huey worker logs showing attempt details]

In this case, it seems like Workers 1, 2 and 4 are trying to handle the same attempt at once.
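One generic way to stop several workers from picking up the same attempt (a hypothetical sketch, not what the provider actually does) is to have each worker atomically "claim" an attempt before diffing it, so no attempt is ever processed twice:

```python
import threading

# Hypothetical in-memory stand-in for the Attempt table.
pending = {1, 2, 3, 4, 5}       # attempt ids with diffed=False
claimed = set()                  # ids a worker has already taken
claim_lock = threading.Lock()    # plays the role of a DB row lock
processed = []                   # (worker_id, attempt_id) pairs

def claim_next():
    """Atomically take one unclaimed attempt, or None if none remain."""
    with claim_lock:
        remaining = pending - claimed
        if not remaining:
            return None
        attempt_id = min(remaining)
        claimed.add(attempt_id)
        return attempt_id

def worker(worker_id):
    while True:
        attempt_id = claim_next()
        if attempt_id is None:
            break
        processed.append((worker_id, attempt_id))  # "diff" the attempt

threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every attempt is diffed exactly once, no matter how many workers run.
print(sorted(a for _, a in processed))
```

In Django this claiming step would normally be done with a row lock or an atomic update rather than an in-process lock, since the workers are separate processes.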

END "detective work"

I ended up doing this just to try it out:

But it is still taking a long time to finish, and still not reporting to the VLE.

I am wondering the following:

christianp commented 2 years ago

Thanks for looking into this in so much detail! It's hard to debug the task-runner stuff at a scale where these problems show up.

Could you send me the JSON attempt data for an attempt that takes hours to diff? I've seen it take a few minutes, but hours is very bad indeed.

I've added priorities to the tasks that the server runs, so reporting scores should always take precedence over diffing suspend data. To turn that on, update to the latest version on the master branch (commit 7f177cc8be48d442b85daa7a32845af35e2bbb13), and add the following to numbasltiprovider/settings.py:

HUEY = {
    'huey_class': 'huey.PriorityRedisHuey',
}

If you try that, please let me know if it helps. I can't try it on our server because we're in the middle of our exam resit period and the priorities need a newer version of Redis than our version of Ubuntu provides.
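The intended effect of the prioritised queue can be illustrated with a plain-Python sketch (just the ordering semantics, not huey's implementation): a higher-priority task such as a score report is dequeued before lower-priority diff tasks, even when it is enqueued later.

```python
import heapq
import itertools

counter = itertools.count()  # tie-breaker: FIFO within one priority level
queue = []

def enqueue(task_name, priority=0):
    # heapq is a min-heap, so negate priority: larger priority pops first.
    heapq.heappush(queue, (-priority, next(counter), task_name))

def dequeue():
    return heapq.heappop(queue)[2]

# Diff tasks are queued first, but a later score report jumps the queue.
enqueue("diff_suspend_data(attempt=1)", priority=0)
enqueue("diff_suspend_data(attempt=2)", priority=0)
enqueue("report_outcome(attempt=7)", priority=10)

order = [dequeue() for _ in range(3)]
print(order)
# The report runs first, then the diffs in the order they were queued.
```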

hailsaturn commented 2 years ago

Conveniently, after I added the randomised selection of attempts to diff, it didn't take too long to finish everything, and the queue emptied just recently.

...only to find it was still not reporting scores to the VLE.

But I have figured out why! The lis_result_sourcedid field of each attempt was returning None, so the report_outcome function in report_outcome.py was doing nothing.

In the file /opt/numbas_lti_python/lib/python3.6/site-packages/django_auth_lti/middleware_patched.py, the function __call__ in the class MultiLTILaunchAuthMiddleware defines this variable:

lti_launch = {
    'launch_time': datetime.now().isoformat(),
    'context_id': request.POST.get('context_id', None),
    # etc...
    'user_image': request.POST.get('user_image', None),
}

Missing from this list on my end was one line:

'lis_result_sourcedid': request.POST.get('lis_result_sourcedid', None),

Once I added that line to middleware_patched.py, it began reporting to the LMS again. I'm not sure if this was an error in the upgrade process, or an error on my part, or something else - perhaps you would know more?
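The silent failure mode described above can be sketched in a few lines (hypothetical names and data shapes, assuming report_outcome simply skips attempts that have no sourcedid): when the middleware never stores lis_result_sourcedid, every attempt reports nothing, with no error raised.

```python
def report_outcome(attempt):
    """Hypothetical sketch: skip reporting when no sourcedid was stored."""
    sourcedid = attempt.get('lis_result_sourcedid')
    if sourcedid is None:
        return None  # nothing is sent to the VLE: a silent failure
    return {'sourcedid': sourcedid, 'score': attempt['score']}

# Middleware missing the field: the attempt stores None, so nothing reports.
broken = {'lis_result_sourcedid': None, 'score': 0.8}
fixed = {'lis_result_sourcedid': 'abc123', 'score': 0.8}

print(report_outcome(broken))  # None: the score never reaches the VLE
print(report_outcome(fixed))   # a report is produced
```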

So it does turn out that the failure to report is separate to the long diff times. Nonetheless, I'll still send you one of the JSON files - I want to make sure I remove any identifying information first, but once I've done that, I'll send it along.

christianp commented 3 months ago

I think this is fixed now that we're using the prioritised queue.