Hi, thanks for your question. I wonder whether there is a data format error in your examples: I wrote a test file to reproduce the behavior, but I got two 1.0 match scores.
The test script is:
import dataflow_match
ref = ["if ( db . getCollectionNames () . contains ( collectionName ) ) { db . getCollection ( collectionName ) . drop () ; mongoDBCollections . remove ( collectionName ) ; }"] example1 = "if ( ( collectionName != null ) && ( ! ( db . getCollectionNames () . contains ( collectionName ) ) ) ) { db . getCollection ( collectionName ) . drop () ; mongoDBCollections . remove ( collectionName ) ; }" example2 = "if ( ( ( db ) != null ) && ( ! ( db . getCollectionNames () . contains ( collectionName ) ) ) ) { db . getCollection ( collectionName ) . drop () ; mongoDBCollections . remove ( collectionName ) ; }"
print('======================= example:1 =======================') score1 = dataflow_match.calc_dataflow_match(ref, example1, 'c_sharp') print('======================= example:2 =======================') score2 = dataflow_match.calc_dataflow_match(ref, example2, 'c_sharp')
print('======================= DF match score =======================') print(score1, score2)
I also print the DFGs and normalized DFGs during the calculation. The final output is:

======================= example:1 =======================
Reference DFG: [('db', 2, 'comesFrom', [], []), ('collectionName', 10, 'comesFrom', [], []), ('db', 14, 'comesFrom', ['db'], [2]), ('collectionName', 18, 'comesFrom', ['collectionName'], [10])]
Candidate DFG: [('collectionName', 3, 'comesFrom', [], []), ('db', 11, 'comesFrom', [], []), ('collectionName', 19, 'comesFrom', ['collectionName'], [3]), ('db', 25, 'comesFrom', ['db'], [11]), ('collectionName', 29, 'comesFrom', ['collectionName'], [3]), ('collectionName', 40, 'comesFrom', ['collectionName'], [3])]
Normalized reference DFG: [('var_0', 'comesFrom', []), ('var_1', 'comesFrom', []), ('var_0', 'comesFrom', ['var_0']), ('var_1', 'comesFrom', ['var_1'])]
Normalized candidate DFG: [('var_0', 'comesFrom', []), ('var_1', 'comesFrom', []), ('var_0', 'comesFrom', ['var_0']), ('var_1', 'comesFrom', ['var_1']), ('var_0', 'comesFrom', ['var_0']), ('var_0', 'comesFrom', ['var_0'])]
======================= example:2 =======================
Reference DFG: [('db', 2, 'comesFrom', [], []), ('collectionName', 10, 'comesFrom', [], []), ('db', 14, 'comesFrom', ['db'], [2]), ('collectionName', 18, 'comesFrom', ['collectionName'], [10])]
Candidate DFG: [('db', 4, 'comesFrom', [], []), ('db', 13, 'comesFrom', ['db'], [4]), ('collectionName', 21, 'comesFrom', [], []), ('db', 27, 'comesFrom', ['db'], [4]), ('collectionName', 31, 'comesFrom', ['collectionName'], [21]), ('collectionName', 42, 'comesFrom', ['collectionName'], [21])]
Normalized reference DFG: [('var_0', 'comesFrom', []), ('var_1', 'comesFrom', []), ('var_0', 'comesFrom', ['var_0']), ('var_1', 'comesFrom', ['var_1'])]
Normalized candidate DFG: [('var_0', 'comesFrom', []), ('var_0', 'comesFrom', ['var_0']), ('var_1', 'comesFrom', []), ('var_0', 'comesFrom', ['var_0']), ('var_1', 'comesFrom', ['var_1']), ('var_1', 'comesFrom', ['var_1'])]
======================= DF match score =======================
1.0 1.0
which shows that both data-flow match scores are 1.0.
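For reference, here is a minimal sketch of what the normalization step does (a simplified illustration, not necessarily the exact dataflow_match code): each variable is renamed to var_0, var_1, ... in order of first appearance, and the token positions are dropped.

# Simplified sketch of DFG normalization (illustrative only): variables are
# renamed to var_0, var_1, ... in order of first appearance, and the token
# index and comesFrom positions are discarded.
def normalize_dataflow_sketch(dfg):
    var_map = {}                                   # original name -> var_i
    normalized = []
    for var, _idx, relation, from_vars, _from_idxs in dfg:
        for name in [var] + from_vars:
            if name not in var_map:
                var_map[name] = 'var_' + str(len(var_map))
        normalized.append((var_map[var], relation,
                           [var_map[v] for v in from_vars]))
    return normalized

ref_dfg = [('db', 2, 'comesFrom', [], []),
           ('collectionName', 10, 'comesFrom', [], []),
           ('db', 14, 'comesFrom', ['db'], [2]),
           ('collectionName', 18, 'comesFrom', ['collectionName'], [10])]
print(normalize_dataflow_sketch(ref_dfg))
# [('var_0', 'comesFrom', []), ('var_1', 'comesFrom', []),
#  ('var_0', 'comesFrom', ['var_0']), ('var_1', 'comesFrom', ['var_1'])]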
Hi, thanks for your response!
First, the example is actually a Java snippet rather than c_sharp. Second, can I ask which version of tree-sitter you are using? It seems the version affects the results; I got different scores with 0.2.1.
Hi, (1) I switched to java, and the problem you mentioned did occur. Normalization is nontrivial, and the current method depends on the order in which variables appear before normalization. The ideal fix would probably be to enumerate all possible normalized variable assignments and take the maximum match score, but that is impractical (a sketch of the idea follows this reply). In fact, computing a true semantic match score is a hard problem that requires a deep understanding of the code snippet. In the future, we may use scores generated by deep models (such as BERTScore) as a supplement to BLEU-style scores for semantic matching.
(2) The version of tree-sitter we are using is 0.1.1. Different versions may affect the result because the language grammars change between versions.
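For illustration, here is a minimal sketch of the enumeration idea from (1). The helper names are hypothetical and this is not part of the dataflow_match module; it simply tries every assignment of candidate variables to var_i indices and keeps the best score, which is factorial in the number of variables and therefore impractical for real code:

# Sketch of the "enumerate all normalizations" idea (hypothetical helpers,
# not part of the CodeXGLUE dataflow_match module).
from itertools import permutations

def dfg_vars(dfg):
    # Collect every variable name that occurs in the DFG, either as the
    # target of an edge or in its comesFrom list.
    names = set()
    for var, _idx, _rel, from_vars, _from_idxs in dfg:
        names.add(var)
        names.update(from_vars)
    return sorted(names)

def normalize_with_mapping(dfg, mapping):
    # Rename variables with an explicit name -> var_i mapping instead of
    # relying on the order of first appearance.
    return [(mapping[var], rel, [mapping[v] for v in from_vars])
            for var, _idx, rel, from_vars, _from_idxs in dfg]

def match_score(ref_norm, cand_norm):
    # Fraction of reference edges found in the candidate, consuming each
    # candidate edge at most once.
    remaining = list(cand_norm)
    matched = 0
    for edge in ref_norm:
        if edge in remaining:
            matched += 1
            remaining.remove(edge)
    return matched / len(ref_norm) if ref_norm else 1.0

def max_dataflow_match(ref_dfg, cand_dfg):
    # Keep the reference normalization fixed and try every assignment of the
    # candidate variables to var_i indices; O(n!) in the number of candidate
    # variables, which is why this is impractical in general.
    ref_mapping = {v: 'var_' + str(i) for i, v in enumerate(dfg_vars(ref_dfg))}
    ref_norm = normalize_with_mapping(ref_dfg, ref_mapping)
    cand_vars = dfg_vars(cand_dfg)
    best = 0.0
    for perm in permutations(range(len(cand_vars))):
        mapping = {v: 'var_' + str(i) for v, i in zip(cand_vars, perm)}
        best = max(best, match_score(ref_norm,
                                     normalize_with_mapping(cand_dfg, mapping)))
    return best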
That makes sense.
Thank you!
Please take a look at the following example:
For example 1, here is the data flow graph and score:
For example 2, here is the data flow graph and score:
You can see that the data-flow match for ex1 is 0.8 while it is 1.0 for ex2. Based on the algorithm, they should both score 1.0, because each of them covers all the data flows in the reference. The difference comes from the order of the variables after normalization, i.e., var_0 in ex1 is var_1 in ex2. Is this unexpected behavior?
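To illustrate the point with hypothetical edge lists (not the actual ex1/ex2 DFGs, which are in the screenshots above): if the candidate mentions collectionName before db, first-appearance normalization swaps the labels relative to the reference, and some reference edges stop matching even though the flows are the same.

# Toy illustration with hypothetical edges. Both sides have the same data
# flows, but the candidate introduces collectionName before db, so the
# order-based normalization swaps var_0 and var_1 relative to the reference.
ref_norm  = [('var_0', 'comesFrom', []),             # db
             ('var_1', 'comesFrom', []),             # collectionName
             ('var_0', 'comesFrom', ['var_0']),      # db used again, twice
             ('var_0', 'comesFrom', ['var_0']),
             ('var_1', 'comesFrom', ['var_1'])]      # collectionName used again
cand_norm = [('var_0', 'comesFrom', []),             # collectionName (seen first)
             ('var_1', 'comesFrom', []),             # db
             ('var_1', 'comesFrom', ['var_1']),      # db used again, twice
             ('var_1', 'comesFrom', ['var_1']),
             ('var_0', 'comesFrom', ['var_0'])]      # collectionName used again

# Count reference edges found in the candidate, consuming each at most once.
remaining, matched = list(cand_norm), 0
for edge in ref_norm:
    if edge in remaining:
        matched += 1
        remaining.remove(edge)
print(matched / len(ref_norm))  # 0.8, even though the underlying flows match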
Thanks!