shuyanzhou / docprompting

Data and code for "DocPrompting: Generating Code by Retrieving the Docs" @ICLR 2023
Apache License 2.0

The conala_docs.json file is missing the function signature and usage. #11

Closed JiexingQi closed 1 year ago

JiexingQi commented 1 year ago

Hi @shuyanzhou, I found that the doc content in conala_docs.json is not consistent with what you provided in the data/conala/fid.cmd_dev.codet5.t10.json file: conala_docs.json is missing the function signature and usage.

image

As the figure above shows, the entry for the function

pandas.reference.api.pandas.dataframe.groupby

in conala_docs.json is missing the first two lines of its content, and the content at the tail also seems to be missing.

Could you provide the doc files with the complete content? Thanks a lot.

shuyanzhou commented 1 year ago

Thanks @JiexingQi, this is indeed a mistake I made when I uploaded conala_docs.json. The file has been updated on Google Drive. Great catch!

You can also check out the dataset on Hugging Face: https://huggingface.co/datasets/neulab/docprompting-conala/tree/main

The unique ID for each document is indicated by the man_id entry in the fid.cmd_*.codet5.t10.json files. I removed unnecessary path components such as .reference.api during preprocessing.
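For anyone matching IDs between the two files, here is a minimal sketch of that kind of ID cleanup. It is not the actual preprocessing code: the helper names `normalize_doc_id` and `find_missing_docs` and the toy data are hypothetical, and the exact normalized form used in the released files is an assumption (only the removal of `.reference.api` is mentioned above).

```python
def normalize_doc_id(raw_id: str, drop: str = ".reference.api") -> str:
    """Remove the first occurrence of a path fragment from a dotted doc ID.

    Assumption: the fragment is simply deleted from the string; the real
    preprocessing may treat other fragments or edge cases differently.
    """
    return raw_id.replace(drop, "", 1)


def find_missing_docs(docs: dict, man_ids: list) -> list:
    """Return the man_id values that have no entry in the docs mapping."""
    return [man_id for man_id in man_ids if man_id not in docs]


# Toy stand-ins for conala_docs.json and the man_id values (hypothetical data).
raw_docs = {
    "pandas.reference.api.pandas.dataframe.groupby": "placeholder doc text",
}
docs = {normalize_doc_id(key): text for key, text in raw_docs.items()}

man_ids = [
    normalize_doc_id("pandas.reference.api.pandas.dataframe.groupby"),
    "some.other.doc",  # hypothetical ID with no matching document
]
missing = find_missing_docs(docs, man_ids)
```

Running a check like `find_missing_docs` over real files would list any man_id values that do not resolve to a document, which is a quick way to spot an ID mismatch between the docs file and the retrieval results.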

JiexingQi commented 1 year ago

Thanks a lot.