microsoft / CodeBERT


CodeReviewer: Metadata for downloading github repos #299

Closed: atharva-naik closed this issue 11 months ago

atharva-naik commented 11 months ago

Hi, for the CodeReviewer model data released on Zenodo, none of the 3 tasks seems to have enough metadata to download the original repos. The proj key in the test (and validation) data looks like a GitHub repo name, but it appears to be the organization's GitHub account name fused with the actual repo name (e.g. GoogleCloudPlatform-compute-image-tools actually stands for https://github.com/GoogleCloudPlatform/compute-image-tools). The data for the code review task is also missing commit hashes, which would be needed to download the correct versions of the files. The train data has no proj field at all, so there is nothing to indicate which GitHub repos its instances belong to.
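To illustrate why the fused proj value is ambiguous, here is a minimal sketch that lists every possible owner/repo split of such a name. It assumes the owner and repo were joined with a single hyphen, which is only inferred from the one example above; the function names `candidate_repos` and `candidate_urls` are hypothetical and not part of the released data or code.

```python
from typing import List


def candidate_repos(proj: str) -> List[str]:
    """Return every owner/repo split of a hyphen-fused project name.

    Assumption: the fused name is "<owner>-<repo>", where either part
    may itself contain hyphens, so each hyphen is a possible boundary.
    """
    parts = proj.split("-")
    # Split at each hyphen position: left side is the owner, right side the repo.
    return [
        "-".join(parts[:i]) + "/" + "-".join(parts[i:])
        for i in range(1, len(parts))
    ]


def candidate_urls(proj: str) -> List[str]:
    """Turn each candidate owner/repo pair into a GitHub URL to check by hand."""
    return ["https://github.com/" + repo for repo in candidate_repos(proj)]


if __name__ == "__main__":
    for url in candidate_urls("GoogleCloudPlatform-compute-image-tools"):
        print(url)
```

For the example name this yields three candidates, only one of which is the real repo, so each candidate would still have to be checked against GitHub. Even a correct match would not recover the missing commit hashes.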

How can we access the metadata needed to download the original GitHub data corresponding to the data in the tasks?

P.S. What do the id fields signify? Are they indexes pointing to records in some unreleased metadata?

celbree commented 11 months ago

Sorry, but we didn't save the metadata of these projects.

atharva-naik commented 11 months ago

Oh, I see. Thank you for responding!