jcrodriguez-dis / moodle-mod_vpl

Virtual Programming Lab for Moodle (Module)
GNU General Public License v3.0

Java similarity detection reports `100|100|100` for completely different submissions #159

Open. FeldrinH opened this issue 8 months ago.

FeldrinH commented 8 months ago

We ran the built-in plagiarism detection (the Similarity tab in Moodle) on some homework assignments with Java submissions and found many pairs of submissions with a similarity score of 100|100|100. In previous years, a score that high indicated nearly identical files. This time, however, when we compared the files, we found that they were not similar at all (in many cases, at least half of the code lines were different). Is this expected behavior? It certainly looks like a bug, and it makes the plagiarism detection unusable for us, because the top results are overwhelmingly false positives.

PS: I haven't posted any examples because all of the ones we have are student submissions, which we can't share publicly.

jcrodriguez-dis commented 8 months ago

Dear @FeldrinH,

Thank you for reaching out and detailing the issues you've been encountering with the plagiarism detection feature within VPL.

Since version 4.1.0, VPL has used a new generic tokenizer engine. Note that version 4.2.1 included a fix for known issues in this tokenizer engine that could lead to the problem you've described.

With the current version, 4.2.2, we have been unable to reproduce the problem you describe. To help diagnose it, I recommend selecting two student code files where the problem appears. Before sharing them, please remove or obfuscate all private information: rename variables, remove comments that may contain personal data, and strip any other identifiers that could be linked to a specific individual. Having such code is essential for us to investigate this further.

Kind regards, @jcrodriguez-dis

FeldrinH commented 8 months ago

Thanks for the quick response. I suspect our VPL version may be older than 4.2.1, so this might simply be a case of our university not having updated VPL in the last couple of months. I will get back to you once I've confirmed which version of VPL our university's Moodle is running.

FeldrinH commented 8 months ago

We are currently running VPL 4.1.0, so I suspect the issue we are seeing has indeed been fixed in the latest version. I will follow up once our Moodle instance is running the latest version of VPL.

FeldrinH commented 8 months ago

It appears that the issue is still present in version 4.2.2. I would prefer to send the code examples to you personally rather than share them here publicly. Would the email listed at https://www.dis.ulpgc.es/profesorado/ficha.asp?id=51 be an appropriate place to send the examples?

jcrodriguez-dis commented 6 months ago

Dear @FeldrinH,

Thank you for your follow-up message and for verifying the version of VPL in use. I understand your concerns about sharing student submissions publicly, and I appreciate your discretion in this matter.

Yes, you can use the email address listed on my profile at https://www.dis.ulpgc.es/profesorado/ficha.asp?id=51 to send me the source files. When preparing the code examples, please ensure that all identifiable information has been sufficiently anonymized to maintain student confidentiality.

I look forward to assisting you further in resolving this issue.

Kind regards, Juan Carlos.