turnitin / moodle-plagiarism_turnitin

Turnitin Plagiarism plugin for Moodle
http://www.turnitin.com
45 stars 69 forks source link

plugin does not handle queued, deleted files correctly #556

Closed thepurpleblob closed 1 month ago

thepurpleblob commented 3 years ago

If a file is deleted before it is sent to turnitin it will be in plagiarism_turnitin_files but no matching entry in the 'files' table. The file will have a 'queued' status. These instances build up.

Once these start to exceed the value of PLAGIARISM_TURNITIN_CRON_SUBMISSIONS_LIMIT it starts to have a detrimental effect on Turnitin processing. A 'missing file' error is generated but no error is flagged in the table, it stays at 'queued'. When there are a lot of these missing queued files it can effectively block turnitin processing.

These should be handled better - I think they should probably result in an actual error condition. There's no point re-queuing a file that cannot be found.

For reference, this is the SQL query we used to find the files...

select *, from_unixtime(mdl_plagiarism_turnitin_files.lastmodified) from mdl_plagiarism_turnitin_files 
left join mdl_files on identifier = pathnamehash
where statuscode = 'queued'
and mdl_files.id is null
order by mdl_plagiarism_turnitin_files.lastmodified desc

The problem code is around line 2570 in lib.php

                $fs = get_file_storage();
                $file = $fs->get_file_by_hash($pathnamehash);

                if (!$file) {
                    plagiarism_turnitin_activitylog('File not found: '.$pathnamehash, 'PP_NO_FILE');
                    $result = true;
                    continue;
                } else {
                    try {
                        $file->get_content();
                    } catch (Exception $e) {
                        plagiarism_turnitin_activitylog('File content not found: '.$pathnamehash, 'PP_NO_FILE');
                        mtrace($e);
                        mtrace('File content not found. pathnamehash: '.$pathnamehash);
                        $result = true;
                        continue;
                    }
                }

The error condition is not updated in the plagiarism_turnitin_files

TomoTsuyuki commented 3 years ago

We have same issue. The schedule task picks up same records and new submissions are not processed because of this issue.

In our case, I can replicate the issue from following process.

  1. A student submit assignment with file.
  2. The student remove the submission.
  3. The schedule task says 'no file to submit' because there is a record in mdl_plagiarism_turnitin_files but not in mdl_files. If the student doesn't submit file again, the record is in the queue forever, because the status remains 'queued'.

I think we need to update the codes for this issue:

  1. Properly handle no file case and mark the record in mdl_plagiarism_turnitin_files as error. We need this as no file case potentially can happen.
  2. When a submission is removed the plugin should remove corresponding record from mdl_plagiarism_turnitin_files table. Same way as it inserts when a submission is created.
  3. Add sorting so older records go first.

The patch #582 fixes for 1. I'll add 2 and 3 later.

thepurpleblob commented 3 years ago

We've actually got a script we run every so often to update the database directly. It's not ideal.

TomoTsuyuki commented 3 years ago

I found 1 is fixed in develop branch as #568

I checked 2 for deleting event, but there is no event in the core, so we just set error if there is no file (fix 1). I added sort by 'lastmodified' for 3 in new pull request #584 for develop branch.

carl-hostrander commented 1 month ago

Thank you for reporting this. Because the latest version of the plagiarism plugin is supported for versions of Moodle 4.1 and higher, I am closing this ticket. However, if you find this issue is occurring with the latest version of the plugin in any of the supported Moodle versions, please create a new ticket and we will address it.