xgdsmileboy closed this issue 5 years ago
Dear Jiajun Jiang,
Thank you very much for the feedback. Your understanding of the dataset is correct, and there is in fact a discrepancy between the number of misuses reported in the paper and the number of misuses in the dataset. When we conducted the experiments for the TSE paper, we did not include the following 5 misuses (hence the numbers: 58 - 5 = 53):
itext.5091.dmmc-16a
lucene.1918.tikanga-1a
lucene.1918.tikanga-1b
lucene.1918.tikanga-1c
lucene.1918.tikanga-1d
This is because they are additional instances of the same misuse (same mistake in using the same API in the same method). Since all detectors in the TSE experiments report at most one instance of a particular misuse per method, we included only one instance each in the dataset (itext.5091.dmmc-16 and lucene.1918.tikanga-1).
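The relationship between the two counts can be illustrated with a small sketch. It is not part of the benchmark tooling. It assumes, based only on the identifiers quoted in this thread, that an additional instance is marked by a lowercase letter appended to the numeric suffix of the misuse ID; `base_id` and `collapse` are hypothetical helper names.

```python
import re

# Only the IDs quoted in this thread; the remaining dataset entries are not listed here.
THREAD_IDS = [
    "itext.5091.dmmc-16",
    "itext.5091.dmmc-16a",
    "lucene.1918.tikanga-1",
    "lucene.1918.tikanga-1a",
    "lucene.1918.tikanga-1b",
    "lucene.1918.tikanga-1c",
    "lucene.1918.tikanga-1d",
]


def base_id(misuse_id):
    """Drop a trailing lowercase letter that follows a digit.

    Assumption: that letter marks an additional instance of the same misuse.
    """
    return re.sub(r"(?<=\d)[a-z]$", "", misuse_id)


def collapse(ids):
    """Keep one entry per misuse, preserving first-seen order."""
    seen = []
    for misuse_id in ids:
        base = base_id(misuse_id)
        if base not in seen:
            seen.append(base)
    return seen


one_per_method = collapse(THREAD_IDS)
extra_instances = len(THREAD_IDS) - len(one_per_method)
print(one_per_method)
print(extra_instances)
```

Under this assumption, the seven IDs above collapse to two, and the difference of five corresponds to the 58 - 5 = 53 mentioned above.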
In our MSR'19 paper, we present MUDetect, a detector that may report multiple instances of the same misuse in a method. We found that we cannot fairly compare all detectors if we do not distinguish multiple instances of the same misuse within a method. Therefore, we added the additional instances after the fact. In that sense, we didn't change the datasets, but corrected our interpretation of the detector results.
For the MSR'19 experiments, we counted all instances of these misuses that appeared in the top 20 findings as true positives (Experiment P). For the detectors that report at most one instance of a misuse per method, we conservatively counted a hit for all instances of the misuse (Experiment R), while for MUDetect, we only counted the exact hits.
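One possible reading of the Experiment R counting rule is sketched below. This is an illustration, not the benchmark's actual implementation. The function `count_hits`, its parameters, and the convention that a trailing letter marks an additional instance are all assumptions made for the example.

```python
import re


def base_id(misuse_id):
    """Drop a trailing lowercase letter after a digit (assumed instance marker)."""
    return re.sub(r"(?<=\d)[a-z]$", "", misuse_id)


def count_hits(known_instances, reported, one_per_method):
    """Count how many known misuse instances are credited to a detector.

    known_instances: all instance IDs in the dataset (hypothetical input).
    reported: set of instance IDs the detector's findings were matched to.
    one_per_method: True for detectors that report at most one instance
        of a misuse per method.
    """
    if not one_per_method:
        # Exact hits only, as described for MUDetect.
        return sum(1 for m in known_instances if m in reported)
    # Conservative rule: one reported instance credits every instance
    # of the same misuse.
    hit_bases = {base_id(r) for r in reported}
    return sum(1 for m in known_instances if base_id(m) in hit_bases)


known = [
    "lucene.1918.tikanga-1",
    "lucene.1918.tikanga-1a",
    "lucene.1918.tikanga-1b",
    "lucene.1918.tikanga-1c",
    "lucene.1918.tikanga-1d",
]
reported = {"lucene.1918.tikanga-1"}

print(count_hits(known, reported, one_per_method=True))
print(count_hits(known, reported, one_per_method=False))
```

With a single matched finding, the conservative rule credits all five instances, while the exact rule credits only the one that was reported.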
I hope this clarifies your confusion. Please feel free to ask further questions! Best, Sven
Hi Sven, thanks for your detailed reply, it helps a lot. Sincerely, Jiajun
Hello, this is an excellent project and I am very interested in it. However, after reading the online ReadMe and your TSE paper, I am confused about the number of misuses used in the different experiments. As described in the ReadMe, there should be 53 misuses. But there are only 39 misuses (from line 572 to line 611). As presented in your paper, "Experiment R" also considers the true positives detected by existing detectors, which I think should be those under TSE17-ExPrecision-TruePositives. However, the number of misuses in these two datasets together (i.e., TSE17-ExRecall and TSE17-ExPrecision-TruePositives) should be 58. Thus, I am confused about the number of misuses. Could you please help me check whether I understand it correctly? Thanks.