jplag / JPlag

State-of-the-Art Software Plagiarism & Collusion Detection
https://jplag.github.io/JPlag/
GNU General Public License v3.0

Formatting necessary for token string normalization #1027

Open brodmo opened 1 year ago

brodmo commented 1 year ago

For token string normalization to work, the code must be formatted beforehand. I did this by first removing the comments with JavaParser (see here) and then formatting the result with Eclipse, using a special config and the following command:

<eclipse> -vm <java> -data <workspace> -application org.eclipse.jdt.core.JavaCodeFormatter -config <config> -nosplash <target directory>
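For anyone who wants to script this step, here is a minimal sketch of assembling and running the same command from Java. The class name `EclipseFormatterCommand` and its method names are made up for illustration, and every path is a placeholder the caller has to supply. The flags are simply the ones from the command above; nothing here is part of JPlag.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical helper, not part of JPlag: wraps the Eclipse formatter call shown above.
final class EclipseFormatterCommand {

    // Builds the argument list in the same order as the command line above.
    static List<String> build(String eclipse, String java, String workspace,
                              String config, String targetDirectory) {
        return List.of(
                eclipse,
                "-vm", java,
                "-data", workspace,
                "-application", "org.eclipse.jdt.core.JavaCodeFormatter",
                "-config", config,
                "-nosplash",
                targetDirectory);
    }

    // Runs the command and returns the exit code of the Eclipse process.
    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command)
                .inheritIO() // show Eclipse's output in the calling console
                .start()
                .waitFor();
    }
}
```

Calling `run(build(...))` with real paths should behave like the shell command, but it still needs a local Eclipse installation, which is exactly the dependency problem described below.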

Since neither dependency is desirable in JPlag, other solutions for these two tasks need to be found before token string normalization can be used. My suggestion would be to remove the comments by parsing the submission with javac and turning the AST back into a string with the existing toString method (which I probably should have done in the first place), and then to use Spotless for the formatting. Spotless can use the Eclipse config, AFAIK.
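A rough sketch of the javac idea, using the JDK's Compiler Tree API (`com.sun.source`). The names `CommentStripper`, `StringSource` and `stripComments` are made up for illustration; this is not existing JPlag code. It parses the source without compiling it and prints each compilation unit via `toString`, which goes through javac's pretty printer. Two caveats, both assumptions worth checking before relying on this: the pretty-printed output is meant for diagnostics and is not guaranteed to round-trip every construct, and Javadoc comments may survive depending on how the parser is configured, so the sketch only claims to drop ordinary line and block comments.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;

import java.io.IOException;
import java.net.URI;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical helper, not part of JPlag.
final class CommentStripper {

    // In-memory source file, so the sketch does not need files on disk.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Parses the source with javac and returns the pretty-printed AST.
    static String stripComments(String className, String source) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler found; a JDK is required");
        }
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null,
                List.of(new StringSource(className, source)));

        StringBuilder result = new StringBuilder();
        // parse() only builds the syntax trees; no attribution or code generation happens.
        for (CompilationUnitTree unit : task.parse()) {
            result.append(unit); // toString() delegates to javac's pretty printer
        }
        return result.toString();
    }
}
```

The output uses javac's own layout, so a formatting pass (Spotless or otherwise) would still be needed afterwards to get a canonical style.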

tsaglam commented 1 year ago

We could add an option to JPlag to format the code of the submissions, as this could be useful independently of the normalization for manual inspection.