Hi,
Thanks for your interest.
In your script, the result is only for part of the dataset, i.e., the 400_800 chunk (to speed things up, we previously split the dataset into multiple chunks).
You may want to obtain the results on the full dataset.
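For reference, the chunking is just a plain split of the .jsonl test file into fixed-size pieces; a minimal sketch is below (the file names "test.jsonl" / "test_{start}_{end}.jsonl" are illustrative, not necessarily the repo's actual ones):

```python
# Minimal sketch: split a .jsonl test set into 400-example chunks,
# matching the 0_400, 400_800, ... naming seen in the logs below.
CHUNK_SIZE = 400

with open("test.jsonl") as f:
    lines = f.readlines()

for start in range(0, len(lines), CHUNK_SIZE):
    end = min(start + CHUNK_SIZE, len(lines))
    # illustrative output naming; adjust to whatever the attack script expects
    with open(f"test_{start}_{end}.jsonl", "w") as out:
        out.writelines(lines[start:end])
```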
Thanks for your answer.
I understand that you chunked the dataset to make the runs faster. I also used the chunked data provided in the repository to run all of the tests, and the results are as follows:
# 0_400
Example time cost: 0.0 min
ALL examples time cost: 42.73 min
Query times in this attack: 1
All Query times: 193268
Success rate: 0.5905172413793104
Successful items count: 137
Total count: 232
Index: 399
# 400_800
Example time cost: 0.0 min
ALL examples time cost: 71.4 min
Query times in this attack: 1
All Query times: 206987
Success rate: 0.6758893280632411
Successful items count: 171
Total count: 253
Index: 399
# 800_1200
Example time cost: 0.0 min
ALL examples time cost: 48.63 min
Query times in this attack: 1
All Query times: 142842
Success rate: 0.7283464566929134
Successful items count: 185
Total count: 254
Index: 399
# 1200_1600
>> ACC! i => ori (0.53700 => 0.52459)
>> SUC! B => U (0.52459 => 0.49986)
Example time cost: 0.04 min
ALL examples time cost: 56.49 min
Query times in this attack: 138
All Query times: 188646
Success rate: 0.6175298804780877
Successful items count: 155
Total count: 251
Index: 399
# 1600_2000
Example time cost: 0.56 min
ALL examples time cost: 45.47 min
Query times in this attack: 2586
All Query times: 166986
Success rate: 0.6370967741935484
Successful items count: 158
Total count: 248
Index: 399
# 2000_2400
Example time cost: 0.07 min
ALL examples time cost: 57.91 min
Query times in this attack: 200
All Query times: 167304
Success rate: 0.6436781609195402
Successful items count: 168
Total count: 261
Index: 399
# 2400_2800
Example time cost: 0.05 min
ALL examples time cost: 49.66 min
Query times in this attack: 194
All Query times: 145886
Success rate: 0.6666666666666666
Successful items count: 146
Total count: 219
Index: 331
These results seem to be better than the logs in the dataset_and_results.zip you provided, so I'm worried that I'm doing something wrong.
One noteworthy point: in the README, the fine_tune step uses train_data_file=../preprocess/dataset/adv_train.jsonl. I would venture to guess that this is a typo, so I used train_data_file=../preprocess/dataset/train.jsonl instead.
One possibility is that the genetic algorithm, which is stochastic by nature, may lead to different results in each run and even across different machines.
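If you want runs to be more directly comparable, pinning the seeds before the attack should reduce (though not eliminate) the variance. A minimal sketch, assuming the attack code uses Python's random, numpy, and torch:

```python
# Sketch: fix the seeds of the RNGs the attack is likely to use.
# Results can still differ slightly across hardware and library versions.
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op when CUDA is unavailable


set_seed(42)
```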
Thanks for your answer.
I used the Defect-detection task from CodeXGLUE to attack the fine-tuned CodeBERT model according to the README; the parameters are similar to:
I ran the test on all 2732 test examples, but the ASR I obtained is higher than the number in the report: 53.62% in the report versus about 65.19% for me. The ASR is computed as Successful items count / Total count from the logs.
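Concretely, aggregating the per-chunk counts from the logs above gives the 65.19% figure:

```python
# Overall ASR = sum of "Successful items count" / sum of "Total count"
# over the seven chunks listed above.
successful = [137, 171, 185, 155, 158, 168, 146]
totals = [232, 253, 254, 251, 248, 261, 219]

asr = sum(successful) / sum(totals)  # 1120 / 1718
print(f"Overall ASR: {asr:.4%}")     # -> Overall ASR: 65.1921%
```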
I'm wondering if this is a bug in my process or some experimental error. Thanks!