Bug in sintax_normalize of C, Python and Java

While building a standalone plagiarism scanner based on the VPL ruleset, I noticed that comparing the same two files yielded different results in the standalone version than in the Moodle version.

After a thorough investigation, I concluded my code was fine, so I started debugging VPL itself. In doing so, I noticed that the sintax_normalize implementations for C, Python and Java all contain structures similar to this:

$token->value = '=';
$ret [] = $token;
$token->value = '+';
$ret [] = $token;
break;

While this looks okay at first, $token is an object, so $ret [] = $token stores a handle to that same object in $ret rather than a copy. The second $token->value assignment therefore also changes the element that was appended on the line before.

In this example, the resulting $ret array actually contains two '+' tokens (the same object, stored twice), rather than the expected '=' token followed by a '+' token.
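
The aliasing can be reproduced outside VPL with a minimal sketch. DemoToken is a hypothetical stand-in for vpl_token (the real class has more fields), and the '+=' starting value is only an illustrative assumption:

```php
<?php
// Hypothetical stand-in for vpl_token; the real class has more fields.
class DemoToken {
    public $value;

    public function __construct($value) {
        $this->value = $value;
    }
}

$token = new DemoToken('+=');
$ret = [];

$token->value = '=';
$ret[] = $token;      // stores a handle to the same object, not a copy
$token->value = '+';  // also changes the object already stored in $ret[0]
$ret[] = $token;

$values = array_map(function ($t) { return $t->value; }, $ret);
echo implode(' ', $values), "\n";  // prints "+ +"
var_dump($ret[0] === $ret[1]);     // bool(true): one object, stored twice
```

Both array slots point at one object, so whichever value is assigned last is what every slot reports.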

After replacing the above structures with this:

$ret [] = new vpl_token( vpl_token_type::OPERATOR, '=', $token->line);
$ret [] = new vpl_token( vpl_token_type::OPERATOR, '+', $token->line);
break;

I can confirm that VPL now yields the exact same results as my standalone version.
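
For comparison, here is a rough sketch of why constructing new objects avoids the problem. SketchToken is a hypothetical stand-in for vpl_token, reduced to a value and a line number, and the clone variant is only an alternative I am assuming would behave the same way, not something tested against VPL:

```php
<?php
// Hypothetical stand-in for vpl_token, reduced to a value and a line number.
class SketchToken {
    public $value;
    public $line;

    public function __construct($value, $line) {
        $this->value = $value;
        $this->line = $line;
    }
}

$token = new SketchToken('+=', 12);

// Approach from this issue: construct a fresh object for every appended token.
$fresh = [];
$fresh[] = new SketchToken('=', $token->line);
$fresh[] = new SketchToken('+', $token->line);

// Alternative (an assumption, not part of the proposed fix): clone, then mutate the copy.
$cloned = [];
$copy = clone $token;
$copy->value = '=';
$cloned[] = $copy;
$copy = clone $token;
$copy->value = '+';
$cloned[] = $copy;

$getValue = function ($t) { return $t->value; };
echo implode(' ', array_map($getValue, $fresh)), "\n";   // prints "= +"
echo implode(' ', array_map($getValue, $cloned)), "\n";  // prints "= +"
```

Either way, each array slot ends up holding its own object, so later assignments no longer leak into tokens that were already appended.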

I'll make a pull request to fix this for all offending pieces of code.

jcrodriguez-dis / moodle-mod_vpl

Bug in sintax_normalize of C, Python and Java #106