jplag / JPlag

State-of-the-Art Software Plagiarism & Collusion Detection
https://jplag.github.io/JPlag/
GNU General Public License v3.0

Feature request: multiple programming languages #1546

Open euberdeveloper opened 4 months ago

euberdeveloper commented 4 months ago

As of now it seems that JPlag supports multiple programming languages, but only in a homogeneous way.

This means that I can compare two submissions that are both in Java or both in Python, but not one in Java against one in Python.

This may seem pointless at first, but translating a program from one language to another can itself be a form of obfuscation.

Maybe Java and Python are not the perfect example, but if we consider languages such as Java, Kotlin, and Scala, which all run on the JVM, this issue becomes more relevant.

tsaglam commented 4 months ago

Good point, this relates to cross-language plagiarism detection. While there has been some research in that area, there are (to my knowledge) no usable tools for it. In the future, we may want to introduce this by creating a shared set of token types for concepts that are common between languages. Language modules could then reuse these token types, which would allow for cross-language support. On a similar note, we may consider polyglot support, meaning parsing multi-language submissions by delegating the different files to different language modules.
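
To make the shared-token idea a bit more concrete, here is a rough sketch. It is written in Python for brevity and is not JPlag code: the `SharedToken` set, the per-language token names (`J_...`, `KT_...`), and the helper functions are all hypothetical and only illustrate the idea. The comparison uses Python's `difflib.SequenceMatcher` as a simple stand-in rather than JPlag's actual comparison algorithm. The extension-based lookup also hints at how files in a polyglot submission could be delegated to different language modules.

```python
from difflib import SequenceMatcher
from enum import Enum, auto
from pathlib import PurePath


class SharedToken(Enum):
    """Hypothetical language-independent token types for common concepts."""
    CLASS_BEGIN = auto()
    METHOD_BEGIN = auto()
    VAR_DEF = auto()
    LOOP_BEGIN = auto()
    ASSIGN = auto()
    RETURN = auto()


# Each hypothetical "language module" maps its own token names to the shared set.
# The language-specific names below are invented for this example.
JAVA_TO_SHARED = {
    "J_CLASS_BEGIN": SharedToken.CLASS_BEGIN,
    "J_METHOD_BEGIN": SharedToken.METHOD_BEGIN,
    "J_VARDEF": SharedToken.VAR_DEF,
    "J_FOR_BEGIN": SharedToken.LOOP_BEGIN,
    "J_ASSIGN": SharedToken.ASSIGN,
    "J_RETURN": SharedToken.RETURN,
}
KOTLIN_TO_SHARED = {
    "KT_CLASS": SharedToken.CLASS_BEGIN,
    "KT_FUNCTION": SharedToken.METHOD_BEGIN,
    "KT_PROPERTY": SharedToken.VAR_DEF,
    "KT_FOR": SharedToken.LOOP_BEGIN,
    "KT_ASSIGNMENT": SharedToken.ASSIGN,
    "KT_RETURN": SharedToken.RETURN,
}

# Polyglot-style delegation: pick the mapping by file extension.
MODULES_BY_EXTENSION = {".java": JAVA_TO_SHARED, ".kt": KOTLIN_TO_SHARED}


def to_shared(filename, language_tokens):
    """Translate language-specific tokens of one file into shared token types.

    Tokens without a shared counterpart are dropped.
    """
    mapping = MODULES_BY_EXTENSION[PurePath(filename).suffix]
    return [mapping[token] for token in language_tokens if token in mapping]


def similarity(tokens_a, tokens_b):
    """Similarity in [0, 1] between two shared-token sequences (difflib stand-in)."""
    return SequenceMatcher(None, tokens_a, tokens_b, autojunk=False).ratio()


# The same small program, once tokenized as Java and once as Kotlin.
java_tokens = to_shared(
    "Sum.java",
    ["J_CLASS_BEGIN", "J_METHOD_BEGIN", "J_VARDEF", "J_FOR_BEGIN", "J_ASSIGN", "J_RETURN"],
)
kotlin_tokens = to_shared(
    "Sum.kt",
    ["KT_CLASS", "KT_FUNCTION", "KT_PROPERTY", "KT_FOR", "KT_ASSIGNMENT", "KT_RETURN"],
)
print(similarity(java_tokens, kotlin_tokens))
```

In this toy setup, a direct Java-to-Kotlin translation ends up with the same shared token sequence, so the two submissions become comparable. In practice the hard part would presumably be deciding which concepts are really shared and what to do with language-specific constructs that have no counterpart.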