To determine in more detail why two objects, i.e., a patch message and a commit, were or were not matched in an analysis, we need a command that reports which criteria led to that decision.
For example, ./pasta compare <id1> <id2> could output something like this:
Considered in preevaluation: MATCHED
- message date (2019-01-01) and commit date (2019-01-07) within time bound
- patch author and commit author match
- ... (whatever else is considered in preevaluation)
Considered in patch/commit comparison: NOT MATCHED
- patch/commit message diff score: 0.6
- code diff score: 0.4
- ... (whatever else is considered when comparing)
- overall: 0.5 is below threshold (0.8)
With this output, a manual analysis could then identify which factors can be tuned to improve precision and recall, e.g., which comparison metrics would benefit from being complemented by further, more refined metrics.
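A minimal sketch of how such a compare report could be assembled. It is not PaStA's implementation. The two pre-evaluation checks (time bound, author match) and the layout of the report lines follow the example output above. Everything else is a hypothetical assumption: the `Item` type, the 14-day `TIME_WINDOW`, the equal `WEIGHTS`, the 0.8 `THRESHOLD`, and the use of `difflib.SequenceMatcher` as a stand-in similarity metric.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from difflib import SequenceMatcher

# Hypothetical parameters; the real tool's criteria and values may differ.
TIME_WINDOW = timedelta(days=14)
THRESHOLD = 0.8
WEIGHTS = {"message": 0.5, "code": 0.5}


@dataclass
class Item:
    """A patch message or a commit, reduced to the fields compared here (hypothetical)."""
    author: str
    date: date
    message: str
    diff: str


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]: difflib's matching-blocks ratio."""
    return SequenceMatcher(None, a, b).ratio()


def compare(patch: Item, commit: Item) -> tuple[bool, list[str]]:
    """Return the match decision together with a line-by-line explanation."""
    report: list[str] = []

    # Pre-evaluation: cheap checks that gate the detailed comparison.
    in_window = abs(commit.date - patch.date) <= TIME_WINDOW
    same_author = patch.author == commit.author
    pre_ok = in_window and same_author
    report.append(f"Considered in preevaluation: {'MATCHED' if pre_ok else 'NOT MATCHED'}")
    report.append(f"- message date ({patch.date}) and commit date ({commit.date}) "
                  f"{'within' if in_window else 'outside'} time bound")
    report.append(f"- patch author and commit author {'match' if same_author else 'differ'}")
    if not pre_ok:
        return False, report

    # Detailed comparison: weighted sum of per-aspect scores against a threshold.
    scores = {
        "message": similarity(patch.message, commit.message),
        "code": similarity(patch.diff, commit.diff),
    }
    overall = sum(WEIGHTS[k] * s for k, s in scores.items())
    matched = overall >= THRESHOLD
    report.append(f"Considered in patch/commit comparison: {'MATCHED' if matched else 'NOT MATCHED'}")
    report.append(f"- patch/commit message diff score: {scores['message']:.2f}")
    report.append(f"- code diff score: {scores['code']:.2f}")
    relation = "reaches" if matched else "is below"
    report.append(f"- overall: {overall:.2f} {relation} threshold ({THRESHOLD})")
    return matched, report


if __name__ == "__main__":
    # Usage with made-up data; a real command would look up <id1> and <id2>.
    patch = Item("Jane Doe", date(2019, 1, 1), "mm: fix leak", "+ kfree(p);")
    commit = Item("Jane Doe", date(2019, 1, 7), "mm: fix memory leak", "+ kfree(p);")
    _, lines = compare(patch, commit)
    print("\n".join(lines))
```

Returning the explanation alongside the decision keeps the matching logic and its diagnostics in one place, so the report cannot drift from the criteria actually applied.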