jplag / JPlag

State-of-the-Art Software Plagiarism & Collusion Detection
https://jplag.github.io/JPlag/
GNU General Public License v3.0

Develop an end-to-end testing strategy #193

Closed tsaglam closed 1 year ago

tsaglam commented 2 years ago

We currently have no way of telling whether a change made to JPlag affects the quality of its plagiarism detection. This makes refactoring hard, because we never know whether a PR has unintended consequences. The root cause is that JPlag does not have enough test cases. We therefore need a carefully designed test framework that uses different data sets and runs JPlag with different configuration options. The JPlag result object is then used to check whether JPlag produces the expected results. Of course, when we first add the test cases, the expected results are simply the current results JPlag produces for these inputs.
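For illustration, a minimal sketch of such a regression-style end-to-end test with JUnit 5 and the programmatic JPlag API roughly as it looks in v3 (`JPlagOptions`, `JPlag#run`, `JPlagResult#getAllComparisons`); the data set path and the baseline value are made up, and exact API names may differ between JPlag versions:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

import de.jplag.JPlag;
import de.jplag.JPlagComparison;
import de.jplag.JPlagResult;
import de.jplag.exceptions.ExitException;
import de.jplag.options.JPlagOptions;
import de.jplag.options.LanguageOption;

class EndToEndRegressionTest {

    @Test
    void sortingDataSetMatchesRecordedBaseline() throws ExitException {
        // Hypothetical data set: two submissions, one a known plagiarism of the other.
        JPlagOptions options = new JPlagOptions("src/test/resources/datasets/sorting", LanguageOption.JAVA);
        JPlagResult result = new JPlag(options).run();

        // With a single pair of submissions, exactly one comparison is expected.
        JPlagComparison comparison = result.getAllComparisons().get(0);

        // Baseline recorded from the current JPlag version (value is made up).
        // A failing assertion signals that a change affected detection quality.
        assertEquals(62.5, comparison.similarity(), 0.1);
    }
}
```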

The following initial requirements have been identified:

Related research works:

Steps to do:

tsaglam commented 2 years ago

The technical report on JPlag describes different plagiarism classes; this might also be a starting point.

tsaglam commented 2 years ago

@SuyDesignz see this paper for many types of plagiarism patterns: https://ieeexplore.ieee.org/abstract/document/7910274. I also overhauled the issue description to make things clearer.

SuyDesignz commented 2 years ago

@jplag/studdev First of all, I would like to explain the goal of the end-to-end testing strategy: changes to the plagiarism detection in JPlag will be tested, mainly to ensure consistent results when the application is adapted. In addition, I will try to find out, with targeted tests, in which detection phase a change takes effect (optional and still under discussion).

The test cases are based on published work on plagiarism and on the ways its detection can be evaded. The papers used for this purpose are "Detecting Source Code Plagiarism on Introductory Programming Course Assignments Using a Bytecode Approach" by Oscar Karnalim (https://ieeexplore.ieee.org/abstract/document/7910274) and "Detecting Disguised Plagiarism" by Hatem A. Mahmoud (https://arxiv.org/abs/1711.02149).

These works provide basic ideas of what a modification of plagiarized source code can look like. The adaptations cover a wide range of changes, from adding or removing comments to architectural changes in the deliverables.

I have limited myself to the following examples. Points 1-4 reflect the detection levels mentioned above and will be deepened in further examples; mixtures of the points are also planned. Likewise, the assignment of the changes to the phases will be evaluated further and possibly adjusted. The examples come from the papers mentioned before (Detecting Source Code Plagiarism [...], p. 3; Detecting Disguised Plagiarism, p. 4). A code sketch illustrating some of the levels follows the list.

  1. Inserting comments or empty lines (normalization level)
  2. Changing variable names or function names (normalization level)
  3. Inserting unnecessary or changed code lines (token generation)
  4. Changing the program flow (token generation); statements and functions must be independent of each other
     4.1 Variable declaration at the beginning of the program (Detecting Source Code Plagiarism [...])
     4.2 Combining declarations of variables (Detecting Source Code Plagiarism [...])
     4.3 Reuse of the same variable for other functions (Detecting Source Code Plagiarism [...])
  5. Changing control structures (Detecting Disguised Plagiarism)
     5.1 for(...) to while(...)
     5.2 if(...) to switch-case
  6. Modifying expressions (Detecting Disguised Plagiarism and Detecting Source Code Plagiarism [...])
     6.1 (X < Y) to !(X >= Y) and ++x to x = x + 1
  7. Splitting and merging statements (Detecting Disguised Plagiarism)
     7.1 x = getSomeValue(); y = x - z; to y = getSomeValue() - z;
  8. Inserting unnecessary casts (Detecting Source Code Plagiarism [...])
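To make the levels more tangible, here is a small hand-written sketch of how points 2, 5, 6, and 7 could be applied to an original snippet (all identifiers and values are made up):

```java
class DisguiseExample {
    // Made-up values and helper so the snippet compiles; only the transformations matter.
    static int z = 3, n = 5, count = 0;

    static int getSomeValue() { return 42; }

    // Original submission:
    static int original() {
        int x = getSomeValue();
        int y = x - z;
        for (int i = 0; i < n; i++) {
            ++count;
        }
        return y;
    }

    // Disguised copy of the same logic:
    static int disguised() {
        int difference = getSomeValue() - z; // level 7: x and y merged into one statement
        int j = 0;                           // level 2: loop variable renamed
        while (!(j >= n)) {                  // levels 5 + 6: for -> while, (j < n) -> !(j >= n)
            count = count + 1;               // level 6: ++count -> count = count + 1
            j = j + 1;
        }
        return difference;
    }
}
```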

Example test cases were taken from Hochschule Karlsruhe (http://www.home.hs-karlsruhe.de/~pach0003/informatik_1/aufgaben/java.html) and are provided with the adaptations mentioned above. The examples are integrated into the test framework iteratively, in order to evaluate the use of the adaptations and to make adjustments if necessary. Further examples can follow and should be based on typical university submissions. A sample pair is sketched after the list.

  1. Sorting with and without recursion
  2. Calculator or conversion of units (celsius to fahrenheit, ...)
  3. Calculation of the depth of a binary tree
  4. Calculation of cross sums
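To illustrate what such a test pair could look like, here is a hypothetical cross-sum base submission together with a variant disguised via levels 1, 2, and 6:

```java
// Base submission: cross sum (digit sum) of a non-negative number.
class CrossSum {
    static int crossSum(int number) {
        int sum = 0;
        while (number > 0) {
            sum += number % 10;
            number /= 10;
        }
        return sum;
    }
}

// Disguised submission: comments inserted (level 1), identifiers renamed (level 2),
// and expressions rewritten (level 6).
class DigitTotal {
    static int digitTotal(int value) {
        // accumulate the digits one by one
        int total = 0;
        while (!(value <= 0)) {         // (value > 0) negated
            total = total + value % 10; // += expanded
            value = value / 10;         // /= expanded
        }
        return total;
    }
}
```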

tsaglam commented 2 years ago

Sounds good to me so far! I think it is important that we think about where we persist the design rationale behind the data set. E.g. if we have a plagiarized test submission based on certain levels, we need to document that, either as a comment in the submission or maybe in a README. But this is something we can also discuss in a future meeting.
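For example, such a rationale could be recorded as a header comment in the plagiarized submission itself (the format and path are just a suggestion):

```java
/*
 * Plagiarism of: sorting/base/BubbleSort.java (hypothetical path)
 * Applied levels: 2 (identifiers renamed), 5 (for -> while),
 *                 6 (expressions negated)
 */
```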

tsaglam commented 1 year ago

Closed by #548 and #551.