jplag / JPlag

State-of-the-Art Source Code Plagiarism & Collusion Detection
https://jplag.github.io/JPlag/
GNU General Public License v3.0

CPP2 language module has issues with block comments #1234

Closed by NiklasHeneka 1 year ago

NiklasHeneka commented 1 year ago

It seems the grammar of the cpp2 language module has problems parsing block comments that contain code.

When running JPlag with the cpp2 language module, files containing block comments with code were skipped, even though they are valid C++ files that compile and run, since the commented-out code is just a comment.

Here is one of the skipped CPP files:

#include <iostream>

/*struct Dijete
{
    std::string ime;
    Dijete *sljedeci;
}

void Razvrstavanje(std::vector<std::string> &s, int k)
{

    Dijete *pocetak(nullptr), *prethodni;
    for(int i(0); i<s.size(); i++)
    {
        Dijete *novi(new Dijete);
        novi->ime=s[i];
        novi->sljedeci=nullptr;
        if(!pocetak) pocetak=novi;
        else prethodni->sljedeci=novi;
        prethodni=novi;

    }

}*/

int main ()
{
    return 0;
}

SirYwell commented 1 year ago

Building the current develop branch and running the CLI with the cpp2 language module on the given example yields the following:

2023-08-11-09:58:44_588 [INFO] LanguageLoader - Available languages: '[C/C++ Scanner [basic markup], C/C++ Parser, C# 6 Parser, EMF metamodel, EMF models (dynamically created token set), Go Parser, Javac based AST plugin, Kotlin Parser, Python3 Parser, R Parser, Rust Language Module, Scala parser, SchemeR4RS Parser [basic markup], SCXML (Statechart XML), Swift Parser, Text Parser (naive)]'
2023-08-11-09:58:44_849 [INFO] CLI - Summary of all Errors:
2023-08-11-09:58:44_832 [ERROR] CLI - Not enough valid submissions! (found 0 valid submissions)
2023-08-11-09:58:44_850 [INFO] SubmissionSet - Summary of all Errors:
2023-08-11-09:58:44_817 [ERROR] SubmissionSet - Submission example.cpp contains 4 token(s), which is less than the minimum match length (12)!
2023-08-11-09:58:44_817 [ERROR] SubmissionSet - ERROR -> Submission example.cpp removed

So it seems like the file is only ignored because it doesn't contain enough code to produce >=12 tokens. Also, running with -t 4 makes it a valid submission for me.
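
To make the threshold behaviour concrete, here is a minimal sketch of the kind of check the log output suggests. It is not JPlag's actual implementation: the `Submission` struct and the functions `hasEnoughTokens` and `filterValid` are hypothetical names made up for illustration. The only facts taken from the log and the comment above are that a submission with 4 tokens is rejected at a minimum match length of 12 and accepted with -t 4.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for illustration; not JPlag's data model.
struct Submission {
    std::string name;
    std::size_t tokenCount;  // tokens the language module extracted
};

// Assumption: a submission is kept when its token count is at least the
// minimum match length (4 tokens pass with -t 4, but fail with 12).
bool hasEnoughTokens(const Submission& submission, std::size_t minimumTokenMatch) {
    return submission.tokenCount >= minimumTokenMatch;
}

// Returns only the submissions that would survive the threshold check.
std::vector<Submission> filterValid(const std::vector<Submission>& all,
                                    std::size_t minimumTokenMatch) {
    std::vector<Submission> valid;
    for (const auto& submission : all) {
        if (hasEnoughTokens(submission, minimumTokenMatch)) {
            valid.push_back(submission);
        }
    }
    return valid;
}
```

Under these assumptions, `hasEnoughTokens({"example.cpp", 4}, 12)` is false and `hasEnoughTokens({"example.cpp", 4}, 4)` is true, which matches the behaviour described above: the file is dropped for being too short, not because of a parsing failure.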

NiklasHeneka commented 1 year ago

Yes, you are right. When I was running a data set, I received two parsing errors and JPlag told me it skipped two files, so I assumed these were the same files. But JPlag just skipped those two because they don't contain enough code to produce >=12 tokens. The files with the parsing errors were completely different files.