ntunlp / xCodeEval

xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval
MIT License

[Retrieval] request to release the idx (idx in corpus) of each candidate code. #10

Open izhx opened 8 months ago

izhx commented 8 months ago

Currently, positive_code and negative_code provide only the code string of each candidate. Could you also release the idx of each candidate in the corpus and, if possible, the name of the corpus file it comes from?

This would make it easier to use this data with tools like BEIR.

Thanks.


now (e.g. retrieval_code_code/validation/Java_code_code_dev_file.jsonl):

    "positive_code": [
        {"source_code": "xxxx"}, 
        ....
    ],
    "negative_code": [
        {"source_code": "yyyy"}, 
        ....
    ],

expected:

    "positive_code": [
        {"idx": "x", "file_name": "java.jsonl", "source_code": "xxxx"}, 
        ....
    ],
    "negative_code": [
        {"idx": "y", "file_name": "java.jsonl", "source_code": "yyyy"}, 
        ....
    ],
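
Until the ids are released, one possible workaround is to recover them by exact string match against the corpus file. The sketch below is only an illustration: it assumes each corpus line is a JSON object with `idx` and `source_code` fields (not confirmed for the released corpus files), and the helper names `build_code_index` and `attach_idx` are hypothetical. Candidates whose code was normalized or altered would not match and get `idx` set to `None`.

```python
import json


def build_code_index(corpus_lines, code_key="source_code", id_key="idx"):
    """Map each corpus code string to its idx (the first occurrence wins).

    The field names are assumptions about the corpus schema.
    """
    index = {}
    for line in corpus_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        index.setdefault(record[code_key], record[id_key])
    return index


def attach_idx(sample, index, file_name):
    """Return a copy of a retrieval sample whose candidates carry idx and file_name."""
    out = dict(sample)
    for field in ("positive_code", "negative_code"):
        out[field] = [
            {
                "idx": index.get(cand["source_code"]),  # None if no exact match
                "file_name": file_name,
                "source_code": cand["source_code"],
            }
            for cand in sample.get(field, [])
        ]
    return out


# Tiny in-memory example using the placeholder values from this issue.
corpus_lines = [
    '{"idx": "x", "source_code": "xxxx"}',
    '{"idx": "y", "source_code": "yyyy"}',
]
sample = {
    "positive_code": [{"source_code": "xxxx"}],
    "negative_code": [{"source_code": "yyyy"}],
}
index = build_code_index(corpus_lines)
enriched = attach_idx(sample, index, "java.jsonl")
```

For a real file, `corpus_lines` could be an open file handle, since iterating over it yields one line at a time. Duplicate code strings in the corpus would all resolve to the first idx seen, which is another reason an official release of the ids would be preferable.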

izhx commented 8 months ago

BTW, this benchmark is excellent work.

Jackal1586 commented 8 months ago

Thanks for the suggestion. We are looking into the possibility of making this change. Please bear with us for a few days.