turnitin / moodle-plagiarism_turnitin

Turnitin Plagiarism plugin for Moodle
http://www.turnitin.com
GNU General Public License v3.0
45 stars 69 forks source link

Performance issues with scheduled tasks #515

Open danmarsden opened 4 years ago

danmarsden commented 4 years ago

The scheuled tasks appear to be extremely in-efficient.

In particular the "update_reports" task which is set to run every 5 min by default.

for one of our sites - it doesn't usually do less than 5000 db reads and 800 writes every time, which means that's it's doing 720,000 reads and 115,200 writes over a 24hr period.

We had a much bigger situation yesterday (guessing a bunch of assignments were submitted) where it was averaging around 25,000 reads, 7000 writes every 5min - so it hit almost 4 Million db reads and 1 million writes over the same 24 hr period.

Looking at this code here: https://github.com/turnitin/moodle-plagiarism_turnitin/blob/74f979a9fa7efd655d26e7e5749dde3a8fab7ce4/lib.php#L1921

I can see several places where db queries are called inside loops when you should really be able to get a lot more information in the initial db query.

thanks!

alex-rowe commented 3 years ago

After looking into performance issues with our scheduled tasks in cron we are also seeing ~7000 DB queries each time plagiarism_turnitin\task\update_reports is run.

Not as bad as customcert which was almost doing 27k queries every minutes, but still needs to be handled better.

danmarsden commented 3 years ago

it's fun when it runs "on the hour" too... because that just happens to be when people like to schedule exams/timed quizzes - right when you don't really want the db server doing heaps of other work...

alex-rowe commented 3 years ago

Here's the last 24 hours of DB queries for our Moodle's sorted by query count.

This is the top 7 queries. 2/3/5/7 all look similar on the graph so are most likely caused by this plugin (but some are used in Moodle elsewhere), or exacerbated by this plugin.

Every 5 minutes, 6681 minimum queries logged in the cron output for this task.

We only return 124 rows for the get_records_select too, so something funny going on with this task.

Untitled

Most of these queries also have a 40/60 split for lock time to query time. Reducing these will remove lots of unnecessary load on the DB but also remove a lot of locking queries.

rlorenzo commented 3 years ago

Seeing the same here. These are our last 3 runs:

42498 reads 9061 writes

42554 reads 9073 writes

39580 reads 8413 writes

I am changing our task runtime to be every 10 minutes instead of every 5 minutes. That function does run a lot of things in loops... See https://github.com/turnitin/moodle-plagiarism_turnitin/blob/master/lib.php#L1954

danmarsden commented 3 years ago

I'd suggest running it every 13 minutes - or a setting that doesn't end up with it running on the hour every hour. You really don't want this task kicking in at the exact moment 500 users (or more!) all enter a timed quiz.

alex-rowe commented 3 years ago

Taking a quick look into this.

The SQL here could be changed to this so we fetch more data in the initial query

https://github.com/turnitin/moodle-plagiarism_turnitin/blob/185971479dd96918ca5b37aa5572500131f48aeb/lib.php#L1961-L1968

SELECT ptf.*, m.name AS 'modulename', cm.instance AS 'moduleinstance'
FROM mdl_plagiarism_turnitin_files ptf
JOIN mdl_course_modules cm ON cm.id = ptf.cm
JOIN mdl_modules m ON m.id = cm.module
WHERE statuscode = 'success'
AND ( similarityscore IS NULL OR duedate_report_refresh = 1 )
AND ( orcapable = 1 OR orcapable IS NULL )

This means we then don't need to call this function get_coursemodule_from_id

https://github.com/turnitin/moodle-plagiarism_turnitin/blob/185971479dd96918ca5b37aa5572500131f48aeb/lib.php#L1974

Removing that function will then remove two of the top queries, one for getting the module name, and one to get the course module record

Doing a call to get the activity module in each part of the query also seems too much, especially if there are a lot of files/submissions that are part of the same assignment. This could be cached in an array or similar and checked as part of the loop.

https://github.com/turnitin/moodle-plagiarism_turnitin/blob/185971479dd96918ca5b37aa5572500131f48aeb/lib.php#L1977

Also should check if $tiisubmission->duedate_report_refresh <> 1 before reassigning it 1 here

https://github.com/turnitin/moodle-plagiarism_turnitin/blob/185971479dd96918ca5b37aa5572500131f48aeb/lib.php#L1980-L1982

Settings per module should also be cached. Seeing 120k odd queries for the plagiarism config lookups too.

https://github.com/turnitin/moodle-plagiarism_turnitin/blob/185971479dd96918ca5b37aa5572500131f48aeb/lib.php#L1985

Is this required here if we already have the config settings per module?

https://github.com/turnitin/moodle-plagiarism_turnitin/blob/185971479dd96918ca5b37aa5572500131f48aeb/lib.php#L2007

Calling get_coursemodule_from_id again here in a loop which is 2 more of the same queries per submission again

https://github.com/turnitin/moodle-plagiarism_turnitin/blob/185971479dd96918ca5b37aa5572500131f48aeb/lib.php#L2101

alex-rowe commented 3 years ago

I'd suggest running it every 13 minutes - or a setting that doesn't end up with it running on the hour every hour. You really don't want this task kicking in at the exact moment 500 users (or more!) all enter a timed quiz.

Running at /10 or /13 will still start it at 0 minutes.

To skip beginning and end of the hour, something like "7/11 " would work. That runs at ":07, :18, :29, :40, and :51", every 11 minutes starting at 7 past.

Edit: Moodle won't allow that cron format, so instead just comma separate the minutes you want it to run ("7,18,29,40,51") and use that instead

rlorenzo commented 3 years ago

@agrowe Unfortunately Moodle's task system flags 7/11 for the minute field as "Data submitted is invalid".

alex-rowe commented 3 years ago

@rlorenzo see my edit above, you can comma separate a list of minutes you want it to run. I used 7,18,29,40,51 as it's odd minutes and I don't want it near a the end or middle of an hour in it's current form.

jmcgettrick commented 3 years ago

Thanks for the insight everyone, I'll get this looked at.

alex-rowe commented 3 years ago

@jmcgettrick, Is this being targeted to be fixed/looked at in a specific upcoming release?

carl-hostrander commented 1 month ago

Thank you for reporting this. Because the latest version of the plagiarism plugin is supported for versions of Moodle 4.1 and higher, I am closing this ticket. However, if you find this issue is occurring with the latest version of the plugin in any of the supported Moodle versions, please create a new ticket and we will address it.