Closed thanhlecongg closed 1 year ago
Hi, @thanhlecongg! Do you mean matching the ids in our data to the original Defects4J bugs? If so, please refer to the meta information in this folder: https://drive.google.com/drive/folders/1LulqZWVftmFevh-DeCtjA3HLKarXje7q
Binfo_d4j.json contains concrete bug information, and Minfo_d4j.json contains meta information about the buggy functions.
Thanks for your quick reply. This is what I wanted. Btw, I have one more question: the paper mentions that NPR4J uses 260 bugs for evaluation, but after preprocessing we obtain ~400 bugs (394 bugs for SeqR and 400 bugs for CocoNUT). Could you provide the ids of the 260 bugs? Thanks.
Please download EvaluationBenchmarks.zip from this link: https://drive.google.com/file/d/1dSVsGaU9z1Q3a1AU-KKPmqp6HO_xvUOs/view?usp=share_link. EvaluationBenchmarks/d4j.ids contains the ids we used. To ease the experiments, we only evaluate NPR models on one-line replacement bugs and on bugs composed of one-line replacement bugs. So we filtered the 835 bugs of Defects4J V1+V2 down to 260 bugs (nearly 220 are one-line bugs, and the others are composed of multiple one-line changes).
Yes, I have downloaded the file. But I obtained 400 ids instead of 260 ids. Please see the attached screenshot.
Each id represents only a "one-line replacement" code change. Some bugs are composed of multiple "one-line replacements" (i.e., consist of several ids). At each step, an NPR system focuses on generating patch code for a single id. When evaluating, if a bug is composed of multiple ids, we sequentially apply the id-patches at each id-position. For example, bug A has 3 hunks (represented by 3 ids), and we generate 2 candidates for each hunk (id). The patch-apply sequence is then: (1,1,1), (1,1,2), (1,2,1), (1,2,2), (2,1,1) ......
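The ordering described above is just a lexicographic Cartesian product over the per-hunk candidate lists. A minimal sketch (hypothetical helper name, assuming 1-based candidate indices):

```python
from itertools import product

def patch_sequences(num_hunks, candidates_per_hunk):
    """Enumerate candidate-index tuples in the order (1,1,1), (1,1,2), ...

    Each tuple picks one candidate patch per hunk; the last hunk's
    index varies fastest, which matches lexicographic order.
    """
    per_hunk = range(1, candidates_per_hunk + 1)
    return list(product(per_hunk, repeat=num_hunks))

# Bug A: 3 hunks, 2 candidates per hunk -> 2^3 = 8 combinations
seqs = patch_sequences(3, 2)
print(seqs[:5])  # [(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2), (2, 1, 1)]
```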
Thanks, I got it. But since NPR4J produces top-100 predictions, isn't the number of patches that need to be tested for a bug with 3 hunks very large (100^{3})?
Yes, so we set a maximum total number of evaluations for each bug. For example, to limit the total number of evaluations to 100 on a 3-hunk bug, we first calculate the maximum X that satisfies X^{3} <= 100; the sequence is then (from 1 to X, from 1 to X, from 1 to X).
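This cap can be sketched as follows (hypothetical helper names; the actual NPR4J implementation may differ): compute the largest X with X**n <= budget, then enumerate only the first X candidates per hunk.

```python
from itertools import product

def max_candidates_per_hunk(num_hunks, budget):
    """Largest X such that X ** num_hunks <= budget."""
    x = int(round(budget ** (1.0 / num_hunks)))
    while x ** num_hunks > budget:        # guard against float rounding up
        x -= 1
    while (x + 1) ** num_hunks <= budget:  # guard against rounding down
        x += 1
    return x

def capped_sequences(num_hunks, budget):
    """All candidate-index tuples within the evaluation budget."""
    x = max_candidates_per_hunk(num_hunks, budget)
    return list(product(range(1, x + 1), repeat=num_hunks))

# 3-hunk bug with a budget of 100 evaluations: X = 4, since 4^3 = 64 <= 100 < 5^3
print(max_candidates_per_hunk(3, 100))   # 4
print(len(capped_sequences(3, 100)))     # 64
```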
Everything makes sense to me now. Many thanks for your kind explanations.
Hi,
I followed your instructions to run the experiments. We found that, from the hunk ids (in the file d4j.ids.new), only 191 out of the 260 Defects4J bugs have all of their modified hunks in Binfo_d4j.json, while the other 69 bugs have at least 1 hunk present but not all of them. I wonder why some hunks from these 69 bugs were not considered in your evaluation. Is this due to "Step 3: Purifying and enriching evaluation resources" in your dataset construction? Also, why do you not restrict the evaluation to the bugs containing all hunks, given that APR cannot completely fix a bug whose hunks are incomplete?
Many thanks. Have a nice day.
This is id of 69 bugs I mentioned: ['Closure-165', 'JacksonDatabind-103', 'Closure-155', 'Closure-134', 'Closure-34', 'JxPath-13', 'JacksonDatabind-38', 'Cli-39', 'Jsoup-87', 'JacksonDatabind-10', 'Closure-90', 'JacksonCore-12', 'Math-18', 'Closure-147', 'Closure-169', 'JacksonDatabind-15', 'Mockito-17', 'Closure-157', 'JacksonDatabind-52', 'Closure-27', 'Compress-47', 'JxPath-16', 'Closure-108', 'Closure-148', 'Mockito-14', 'Closure-72', 'Mockito-11', 'Closure-144', 'Jsoup-92', 'Closure-37', 'Chart-18', 'Mockito-23', 'JxPath-20', 'Time-26', 'Gson-4', 'JacksonDatabind-55', 'Math-83', 'Closure-149', 'Closure-163', 'Lang-32', 'Closure-167', 'Math-100', 'Cli-31', 'Closure-89', 'JacksonDatabind-31', 'JacksonDatabind-95', 'Closure-30', 'Math-81', 'JacksonCore-17', 'Closure-100', 'Chart-22', 'JacksonCore-24', 'Math-47', 'Cli-1', 'Cli-13', 'JacksonDatabind-53', 'JacksonDatabind-65', 'JacksonDatabind-73', 'Closure-75', 'JacksonDatabind-14', 'Lang-36', 'Lang-15', 'Cli-33', 'Closure-9', 'Mockito-4', 'Cli-18', 'Math-65', 'JacksonDatabind-108', 'Math-62']
This is my code to count these bugs.
```python
import json
import codecs

def load_info(line_bug_info, method_bug_info):
    bug_info = {}
    for idx in range(len(line_bug_info)):
        assert line_bug_info[idx]["parent_id"] == method_bug_info[idx]["_id"]
        hash_id = line_bug_info[idx]["_id"]["$oid"]
        # Parse the bug id (e.g. "Lang-4") out of the Windows-style path in parent_id.
        tmp = line_bug_info[idx]["parent_id"].split("\\")[-1].split("/")[1].split(".")[0].split("_")
        bug_id = "-".join(tmp[0:2])
        bug_class = tmp[2]
        start_line = method_bug_info[idx]["BLine_buggy"]
        end_line = method_bug_info[idx]["ELine_buggy"]
        bug_method = method_bug_info[idx]["methodname"]
        if bug_id not in bug_info:
            bug_info[bug_id] = {}
        bug_info[bug_id][hash_id] = {"bug_class": bug_class,
                                     "bug_method": bug_method,
                                     "start_line": start_line,
                                     "end_line": end_line}
    return bug_info

def main():
    line_bug_info = json.load(codecs.open("meta_info/Binfo_d4j.json", 'r', encoding='utf8'))
    method_bug_info = json.load(codecs.open("meta_info/Minfo_d4j.json", 'r', encoding='utf8'))
    bug_info = load_info(line_bug_info, method_bug_info)
    d4j_ids = []
    with open("d4j.ids.new", "r") as f:
        for line in f:
            d4j_ids.append(line.strip().split("_")[1])
    cnt = 0
    al = []
    for bug_id, info in bug_info.items():
        is_valid = True    # all hunks of this bug appear in d4j.ids.new
        at_least = False   # at least one hunk appears
        for hunk_id in info.keys():
            if hunk_id not in d4j_ids:
                is_valid = False
            else:
                at_least = True
        if not is_valid and at_least:
            al.append(bug_id)
            cnt += 1
    print(al)

if __name__ == "__main__":
    main()
```
Hi, @thanhlecongg, thank you for pointing that out. I remember that when constructing the d4j ids, I only kept ids of type "replace", so hunks of other types are excluded. For convenience, I didn't exclude ids that are insufficient on their own to fix a bug and just translated them all. When applying patches to d4j projects, I first check whether the bug can be fixed (i.e., all of its id hunks have generated patches); if the bug can't be fixed, I just skip validation for it. Yes, the actual number of validated bugs should be 191, not 260 (I forgot to check this). And some of these multi-hunk bugs, such as Cli_31 and Gson_4, should be counted as partially fixed, and I will correct the data. Very much appreciated!
@thanhlecongg, by the way, my way of processing the Defects4J data is not ideal, since some NPR systems can also handle bug types beyond "one-line replacement bugs". I have now found a better way to parse d4j: preparing block-level hunks rather than line-level hunks. With it, nearly all bugs of Defects4J can be parsed into forms that NPR systems can translate. I shared the block-level file at https://drive.google.com/drive/folders/14sPk9WM2oHEklS7Na4G_PYSeKNEGqqxc. Hope this helps you.
Thank you for your kind explanation and sharing. They helped me a lot.
Hi, I just found a problem with your data for the bug 'JacksonDatabind-60'. In your dataset, there is only one changed hunk, i.e., 61a8cca58009e7c4a5d3d60b. However, when I manually compare the fixed version to the buggy version, I find that there is much more added code than this hunk covers. As a result, when I patch the buggy version using your ground truth, the program cannot even be compiled, since the class TypeSerializerRerouter, which appears in the fixed code, does not exist in the buggy version. I found the following bugs facing the same problem: ['JacksonDatabind-60', 'Compress-34', 'Compress-42', 'Mockito-30', 'Compress-43', 'Mockito-21', 'Closure-97', 'Mockito-31', 'Closure-16', 'JacksonDatabind-110', 'Mockito-10', 'Math-66', 'Gson-3', 'JxPath-11', 'Closure-64', 'Lang-46', 'Cli-10', 'Closure-3', 'Compress-39', 'Mockito-32', 'Math-15', 'Closure-127', 'JacksonDatabind-75', 'Mockito-19', 'Time-10']. Could you please kindly check these cases and advise? Many thanks.
Besides, we also found another problem when testing the bug Lang-4. In particular, in your metadata I can find only one changed hunk, i.e.,
"_id": "D:\\DDPR_DATA\\Defects4j\\BF_Rename/Lang_4_LookupTranslator.buggy@public int translate(final CharSequence input, final int index, final Writer out) throws IOException",
"methodname": "translate",
"commitID": "defects4j_Lang_4_LookupTranslator",
"BLine_buggy": 68,
"Bline_fix": 68,
"ELine_buggy": 84,
"Eline_fix": 84,
"buggy_file": "D:\\DDPR_DATA\\Defects4j\\BF_Rename/Lang_4_LookupTranslator.buggy",
"fix_file": "D:\\DDPR_DATA\\Defects4j\\BF_Rename/Lang_4_LookupTranslator.fix"
However, according to defects4j-dissection, Lang-4 should be fixed with three hunks. As a result, when I patch the buggy version using your ground truth, the program still fails on the test cases. Note that all three hunks are one-line replacement fixes, so I think the missing hunks should not have been removed by your preprocessing. I wonder if your metadata is incomplete or if I'm misunderstanding something. Could you also check this case? Many thanks.
Hi @thanhlecongg, many thanks for your questions! You mention two problems: (1) Patches of some multi-hunk bugs are not correct. Yes, you're right. We found that we made a mistake when processing the data; as a result, we wrongly recognized some multi-hunk bugs as one-hunk bugs. When evaluating patches, we first check whether a patch is identical to the developer patch (if so, we do not run the test cases for this bug), so some patches were wrongly labeled as correct. The current version of evaluate_results.zip contains these mistakes. The good news is that, after this problem was found, we re-ran and re-evaluated the experiment considering more NPR systems and more candidates (up to 300); the latest manually-checked results can be found here: https://docs.google.com/spreadsheets/d/11oUYyEiMnDfHRONSrB9hY1smXcrroJSN/edit?usp=sharing&ouid=116802316915888919937&rtpof=true&sd=true. The results in this sheet should be more accurate. (2) Missing metadata. To be honest, I'm not sure whether there is a bug in my d4j pre-processing (using JavaParser); I need to check my code further, so the meta file may also contain some errors. For a more accurate meta file, please use: https://drive.google.com/file/d/1DLvu8NCdhzUHNWvUOG2ywlne3rqPOlkB/view?usp=sharing (we use a block-level instead of a line-level parse).
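The identical-to-developer-patch shortcut mentioned in (1) is essentially a string comparison after normalization. A minimal sketch under the assumption that only whitespace is normalized (hypothetical helper names; the actual check in the framework may normalize differently):

```python
def normalize(code: str) -> str:
    """Collapse all runs of whitespace so formatting differences are ignored."""
    return " ".join(code.split())

def is_identical_patch(candidate: str, developer_patch: str) -> bool:
    """True if the candidate matches the developer patch up to whitespace."""
    return normalize(candidate) == normalize(developer_patch)

print(is_identical_patch("return a + b ;", "return  a + b ;"))  # True
print(is_identical_patch("return a + b ;", "return a - b ;"))   # False
```

Such a shortcut avoids compiling and running test cases, but as described above it also means that any mislabeled ground truth propagates directly into the correctness labels.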
Hope my answers help. Again, thanks for your questions; I believe they are important for helping me improve this framework and refine the evaluation results. Please feel free to contact me again if you find any other problems!
Many thanks for your kind explanation and advice. I really appreciate it.
Hi,
First, thank you very much for your useful framework and interesting paper.
I'm trying to use your framework to run APR techniques. However, I'm struggling to recover the original Defects4J code from the predictions in order to run the test cases. In particular, I do not know how to map the current (bug, fix) pairs back to the original Defects4J bugs. Could you provide more instructions and metadata for this purpose? Many thanks.