jplag / JPlag

State-of-the-Art Software Plagiarism & Collusion Detection
https://jplag.github.io/JPlag/
GNU General Public License v3.0

Issues with cpp2 language module #1427

Open tsaglam opened 8 months ago

tsaglam commented 8 months ago

From #1408:

I got errors while using cpp2 on a C file. A quick test and analysis suggest that cpp2 assumes the code is C++ and treats new as a keyword, which it is not in C.

More tests: cpp2 doesn't allow characters such as ÔŒλ and the like.
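A minimal sketch of the kind of input that could trigger the new keyword problem reported above. It is valid C11 but not valid C++, because new is a reserved word only in C++. The struct and function names are made up for illustration and are not taken from the original report.

```c
/* Valid C, invalid C++: `new` is an ordinary identifier in C.
   All names here are hypothetical and only illustrate the report. */
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Allocates a node and prepends it to `list`.
   Returns the new head, or `list` unchanged if allocation fails. */
struct node *push_front(struct node *list, int value)
{
    struct node *new = malloc(sizeof *new);  /* `new` is just a variable name */
    if (new == NULL) {
        return list;
    }
    new->value = value;
    new->next = list;
    return new;
}
```

A parser built from a C++ grammar would be expected to reject the declaration of new, while a C compiler accepts it.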

nheir commented 8 months ago
  1. The C++ grammar in grammars-v4 targets C++14. The grammars-v4 repository also contains a C grammar targeting C11, which would avoid the new keyword issue.

  2. Neither grammar supports extended characters (but \uXXXX escapes are well supported). Reading the standard, these look like implementation-defined characters, and I suppose it is either a burden or not possible to handle them generically in the grammar. If JPlag assumes a specific encoding for files, it could use a patched version of the grammar to address the issue. On the other hand, non-ASCII characters in identifiers are far from common.
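For reference, a small sketch of the \uXXXX form mentioned in point 2: C99 and later allow universal character names in identifiers, so the Greek letter lambda can be spelled \u03BB without putting a raw non-ASCII byte in the file. The function name is hypothetical. Recent GCC and Clang versions are assumed to accept this, and whether the cpp2 module tokenizes it as intended is not verified here.

```c
/* Sketch: an identifier written as a universal character name (UCN).
   \u03BB is the Greek small letter lambda. The function name is hypothetical. */
int scale(int x)
{
    int \u03BB = 3;       /* ASCII-only spelling of the lambda identifier */
    return \u03BB * x;
}
```

Writing the same identifier as the raw character instead of the escape is the case the thread reports as failing.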

TwoOfTwelve commented 6 months ago

The problem with Unicode identifiers is an ANTLR error. An issue has been created: https://github.com/antlr/grammars-v4/issues/3952