sillywalk / defect-prediction

GNU General Public License v3.0

Commit Messages Buggy #2

Open HuyTu7 opened 5 years ago

HuyTu7 commented 5 years ago

RQ3: Is keyword search over commits consistent with our Mechanical Turk results? How about the human-in-the-loop AI method for reading bug reports (FASTREAD)?

KEYWORDS:

Project      Precision   Recall    F1
abinit       58.94%      90.13%    71.20%
libmesh      43%         92%       59.12%
lammps       13.38%      89.62%    23.28%
mdanalysis   51.43%      89.62%    31.28%

FASTREAD:

Project      Precision   Recall    F1
abinit       72.63%      87.83%    79.51%
libmesh      49.89%      90.06%    64.20%
lammps       23.85%      97.53%    34.27%
mdanalysis   41.44%      94.43%    57.60%

On F1, FASTREAD beats keyword search alone on all four projects, and on precision it does so on every project except mdanalysis as listed above. The human-in-the-loop AI bug-report reading method is therefore more consistent with our Mechanical Turk results than keyword search alone.
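Since F1 is the harmonic mean of precision and recall, the F1 columns can be recomputed from the other two as a quick sanity check. The snippet below is only a sketch: the numbers are copied from the two tables above, while the names `f1`, `REPORTED`, and `mismatches` and the 1-point tolerance are illustrative assumptions, not anything from the repo.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the tables in this thread.
REPORTED = {
    "KEYWORDS": {
        "abinit": (58.94, 90.13, 71.20),
        "libmesh": (43.0, 92.0, 59.12),
        "lammps": (13.38, 89.62, 23.28),
        "mdanalysis": (51.43, 89.62, 31.28),
    },
    "FASTREAD": {
        "abinit": (72.63, 87.83, 79.51),
        "libmesh": (49.89, 90.06, 64.20),
        "lammps": (23.85, 97.53, 34.27),
        "mdanalysis": (41.44, 94.43, 57.60),
    },
}


def mismatches(tables, tolerance=1.0):
    """List rows whose reported F1 differs from the recomputed F1
    by more than `tolerance` percentage points (tolerance is an assumption)."""
    out = []
    for method, rows in tables.items():
        for project, (p, r, reported) in rows.items():
            recomputed = 100 * f1(p / 100, r / 100)
            if abs(recomputed - reported) > tolerance:
                out.append((method, project, round(recomputed, 2)))
    return out


if __name__ == "__main__":
    for method, project, recomputed in mismatches(REPORTED):
        print(f"{method} {project}: recomputed F1 = {recomputed}%")
```

With the figures as listed, two rows do not reproduce within that tolerance: KEYWORDS mdanalysis and FASTREAD lammps. The thread does not say which of the three numbers in those rows is off, so they are worth rechecking against the raw results.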

HuyTu7 commented 5 years ago

[Image: rq3]

timm commented 5 years ago

Why did we see such crap results before with just keywords?

Do you think the added value of human in the loop justifies these gains?

rahlk commented 5 years ago

> Why did we see such crap results before with just keywords?

Well, that's because if we use keywords and look only at releases (WITHOUT reading all the commits in between releases), we get poor results.

timm commented 5 years ago

Also, Ken, given that you are comparing results from 2 treatments, where are your stats tests (using stats.py)?
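stats.py itself is not shown in this thread, so the following is not its API. As a stand-in, here is a minimal sketch of one common non-parametric effect-size measure, Cliff's delta, applied to the F1 columns reported above. The function name `cliffs_delta` and the two list names are illustrative assumptions.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: (#pairs with x > y minus #pairs with x < y) / (len(xs) * len(ys)).
    Ranges from -1 (every x below every y) to +1 (every x above every y)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# F1 (%) per project, in the order abinit, libmesh, lammps, mdanalysis,
# copied from the KEYWORDS and FASTREAD tables in this thread.
KEYWORDS_F1 = [71.20, 59.12, 23.28, 31.28]
FASTREAD_F1 = [79.51, 64.20, 34.27, 57.60]

if __name__ == "__main__":
    delta = cliffs_delta(FASTREAD_F1, KEYWORDS_F1)
    print(f"Cliff's delta (FASTREAD vs KEYWORDS, F1): {delta}")
```

With only four projects per treatment, any such test has very little power, so the result is better read as a rough indication than as a significance claim.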