microsoft / CodeBERT

CodeBERT
MIT License

Some problems with test dataset in code search #225

Mr-Loevan closed this issue 1 year ago.

Mr-Loevan commented 1 year ago

During testing, I noticed that evaluating on the original test files, such as "batch_0.txt", gives a normal MRR. However, when I shuffle the lines of batch_0.txt and re-run the test on the shuffled file, the MRR is extremely low. Is there any restriction on the order of examples in the test dataset?

If I want to extend the dataset, can I simply append new data to the original test files?

guoday commented 1 year ago

Please refer to https://github.com/microsoft/CodeBERT/issues/215#issuecomment-1427997821
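As an illustration only, the Python sketch below shows how an MRR script can become sensitive to line order. It assumes a hypothetical layout in which a scored file of n*n lines is split into chunks of n lines, chunk i pairs query i with n candidate snippets, and the correct snippet is identified purely by its position (index i inside chunk i). The names `position_dependent_mrr` and `toy_scores` are hypothetical and not taken from the CodeBERT repository. Under this assumption, shuffling lines moves the correct pairs out of their expected slots, and appending lines breaks the n x n block structure.

```python
def position_dependent_mrr(scores: list[float], n: int) -> float:
    """MRR under a HYPOTHETICAL fixed layout (assumption, not CodeBERT's code):
    chunk i (n lines) scores query i against n candidates, and the correct
    candidate is the one at index i of chunk i."""
    if len(scores) != n * n:
        # Appending or dropping lines breaks the assumed n x n block layout.
        raise ValueError(f"expected {n * n} scores, got {len(scores)}")
    total = 0.0
    for i in range(n):
        chunk = scores[i * n:(i + 1) * n]
        correct = chunk[i]  # "correct" is inferred from position only
        # Rank = number of candidates scoring at least as high as the correct one.
        rank = sum(1 for s in chunk if s >= correct)
        total += 1.0 / rank
    return total / n


def toy_scores(n: int) -> list[float]:
    """A toy 'perfect' model: 1.0 for the matching pair, 0.0 otherwise,
    emitted in query-major order (query 0 vs all candidates, then query 1, ...)."""
    return [1.0 if q == c else 0.0 for q in range(n) for c in range(n)]


if __name__ == "__main__":
    n = 4
    ordered = toy_scores(n)
    print(position_dependent_mrr(ordered, n))  # 1.0
    # Move only the first line to the end: the scores are unchanged,
    # but their positions no longer match the assumed layout.
    rotated = ordered[1:] + ordered[:1]
    print(position_dependent_mrr(rotated, n))  # 0.3125
```

Even a perfect toy model drops from an MRR of 1.0 to 0.3125 after moving a single line, because the script never checks which query or snippet a line contains. A random shuffle of a large file would typically do far more damage, which would be consistent with the "extremely low" MRR reported above. Under this assumed layout, extending the test set safely would mean adding complete, correctly ordered blocks rather than appending arbitrary lines.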