Closed by tsaglam 1 year ago
The technical report on JPlag contains different plagiarism classes, this might also be a starting point.
@SuyDesignz see this paper for many types of plagiarism patterns: https://ieeexplore.ieee.org/abstract/document/7910274 I also overhauled the issue description to make things clearer.
@jplag/studdev First of all, I would like to explain the goal of the end-to-end testing strategy: changes to the plagiarism detection in JPlag will be tested. The main reason is to ensure consistent results when the application is adapted. In addition, I am trying to determine, via targeted tests, which detection phase a given change affects (optional and still under discussion).
The test cases are created based on published work on plagiarism and on ways of evading its detection. The papers used for this purpose are: "Detecting Source Code Plagiarism on Introductory Programming Course Assignments Using a Bytecode Approach" by Oscar Karnalim (https://ieeexplore.ieee.org/abstract/document/7910274) and "Detecting Disguised Plagiarism" by Hatem A. Mahmoud (https://arxiv.org/abs/1711.02149).
These works provide basic ideas of what modifications to plagiarized source code can look like. The described adaptations cover a wide range of changes, from adding or removing comments to architectural changes in the deliverables.
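To make this concrete, here is a minimal, purely illustrative sketch (not taken from the cited papers or from JPlag's actual test data) of the kind of behavior-preserving disguise these adaptations describe: identifiers renamed, a comment added, and a `for` loop rewritten as a `while` loop.

```java
public class DisguiseExample {

    // "Original" submission: sums the elements of an array.
    static int sum(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    // Disguised variant: renamed identifiers, an added comment, and a
    // while loop instead of a for loop. The behavior is identical, so a
    // token-based detector should still report high similarity.
    static int accumulate(int[] numbers) {
        // iterate over all entries and add them up
        int result = 0;
        int index = 0;
        while (index < numbers.length) {
            result += numbers[index];
            index++;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        System.out.println(sum(data));        // 10
        System.out.println(accumulate(data)); // 10
    }
}
```

Both methods compute the same result, which is exactly what makes such surface-level changes a useful test input: the detector must see past them.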
I have limited myself to the following examples (points 1-4 are the detection levels given above and will be elaborated in further examples; combinations of these points are also provided, and the assignment of the changes to the phases will be evaluated further and possibly adjusted). The examples come from the articles mentioned above: Detecting Source Code Plagiarism [...] (p. 3) and Detecting Disguised Plagiarism (p. 4).
Examples for test cases were taken from HS Karlsruhe (http://www.home.hs-karlsruhe.de/~pach0003/informatik_1/aufgaben/java.html) and provided with the adaptations mentioned above. The examples are integrated iteratively into the test framework in order to evaluate the use of the adaptations and to make adjustments if necessary. Further examples may follow and should be based on typical university submissions.
Sounds good to me so far! I think it is important that we consider where we persist the design rationale behind the dataset. For example, if a plagiarized test submission is based on certain levels, we need to document that, either as a comment in the submission or in a readme. But this is something we can also discuss in a future meeting.
Closed by #548 and #551.
We currently have no way of telling whether a change made to JPlag affects the quality of plagiarism detection. This makes refactoring hard, because we never know when a PR has unintended consequences. The reason for this is that JPlag does not have enough test cases. We thus need a carefully designed test framework that uses different data sets and runs JPlag with different configuration options. The JPlag result object is then used to check whether JPlag produces the expected results. Of course, when we first add the test cases, the expected results are simply the current results JPlag produces for these inputs.
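The check described above amounts to a golden-value (baseline) regression test. A minimal sketch of that idea follows; the commented-out JPlag calls are an assumption about its API and may not match the real signatures, while the baseline comparison itself is self-contained. An epsilon tolerance guards against harmless floating-point drift between runs.

```java
public class BaselineCheck {

    // Compare an observed similarity against a recorded baseline value,
    // allowing a small tolerance for floating-point differences.
    static boolean matchesBaseline(double observed, double baseline, double epsilon) {
        return Math.abs(observed - baseline) <= epsilon;
    }

    public static void main(String[] args) {
        // In a real test, 'observed' would come from the JPlag result object,
        // roughly like this (hypothetical API, for illustration only):
        //   JPlagResult result = new JPlag(options).run();
        //   double observed = result.getAllComparisons().get(0).similarity();
        double observed = 0.83;  // stand-in for a freshly computed similarity
        double baseline = 0.83;  // recorded when the test case was first added

        System.out.println(matchesBaseline(observed, baseline, 0.0001)); // true
    }
}
```

When a PR intentionally changes detection behavior, the recorded baselines would be updated alongside it, making the effect of the change explicit in the diff.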
The following initial requirements have been identified:
Related research works:
Steps to do: