Hi, for the CodeReviewer model data released on Zenodo, none of the three tasks seems to include enough metadata to download the original repositories. The `proj` key in the test (and validation) data looks like a GitHub repository name, but it appears to fuse the organization's GitHub account name with the actual repository name (e.g. `GoogleCloudPlatform-compute-image-tools` actually stands for https://github.com/GoogleCloudPlatform/compute-image-tools). The code review task is also missing commit hashes, which would make it possible to download the correct versions of the files. Finally, the training data has no `proj` field at all, so there is no way to tell which GitHub repositories its instances come from.
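To illustrate why the fused `proj` value can't be mapped back reliably: both GitHub owner names and repository names may contain hyphens, so every hyphen is a possible split point. The sketch below assumes the value is simply `<owner>-<repo>` joined by one hyphen (my guess, not confirmed by the dataset docs); `candidate_repo_urls` is a hypothetical helper that just lists the possible repository URLs.

```python
def candidate_repo_urls(proj: str) -> list[str]:
    """Return every owner/repo split of a fused `proj` value as a GitHub URL.

    Assumption: the fused value is "<owner>-<repo>" joined by a single hyphen.
    Because owner and repo names can themselves contain hyphens, each hyphen
    is a possible split point, so several candidates may be returned.
    """
    candidates = []
    for i, ch in enumerate(proj):
        if ch != "-":
            continue
        owner, repo = proj[:i], proj[i + 1:]
        # Skip splits that would leave the owner or the repo empty.
        if owner and repo:
            candidates.append(f"https://github.com/{owner}/{repo}")
    return candidates


for url in candidate_repo_urls("GoogleCloudPlatform-compute-image-tools"):
    print(url)
```

For the example above this yields three candidates, only the first of which is the real repository, so each would still have to be checked against GitHub. Without the original owner/repo metadata, that check is guesswork.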
How can we access the metadata needed to download the original GitHub data that corresponds to each task's instances?
P.S.: What do the `id` fields signify? Are they indexes into records in some unreleased metadata?