shuyanzhou / docprompting

Data and code for "DocPrompting: Generating Code by Retrieving the Docs" @ICLR 2023
Apache License 2.0

Curious about the ORACLE Docprompting setting. #10

Closed: JiexingQi closed this issue 1 year ago

JiexingQi commented 1 year ago

Hi @shuyanzhou, I am trying to reproduce your CodeT5 + DocPrompting (ORACLE) results on the CoNaLa dataset. The model does not perform well when I evaluate it with oracle docs directly, using a model that was not trained in the oracle setting. Did you train a separate model on oracle training data to get the performance reported in Table 3 of your paper? Looking forward to your reply.

shuyanzhou commented 1 year ago

Good question. The ORACLE setting in our work corresponds to both training and evaluating with oracle docs. The main reason is that we want the distribution of the docs the generator sees to be the same in training and evaluation. A model trained with retrieved docs may already have learned to ignore some of them. It would then be unable to make full use of the docs in the oracle setting, even though all of them are correct.
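To make the train/eval matching concrete, here is a minimal sketch of building generator inputs from either oracle or retrieved docs. The field names (`oracle_doc_ids`, `retrieved_doc_ids`), the ` | ` separator, and the helper functions are all hypothetical and chosen for illustration; they are not taken from this repository's data format or code.

```python
from typing import Dict, List

# Hypothetical field names; the repository's actual data format may differ.
DOC_KEYS = {"oracle": "oracle_doc_ids", "retrieved": "retrieved_doc_ids"}


def build_generator_input(intent: str, doc_ids: List[str],
                          doc_store: Dict[str, str], sep: str = " | ") -> str:
    """Concatenate the selected docs with the natural-language intent."""
    docs = [doc_store[doc_id] for doc_id in doc_ids if doc_id in doc_store]
    return sep.join(docs + [intent])


def build_split(examples: List[dict], doc_store: Dict[str, str],
                doc_source: str) -> List[dict]:
    """Build (source, target) pairs using one doc source for the whole split."""
    if doc_source not in DOC_KEYS:
        raise ValueError(f"unknown doc_source: {doc_source!r}")
    key = DOC_KEYS[doc_source]
    return [
        {
            "source": build_generator_input(ex["intent"], ex[key], doc_store),
            "target": ex["code"],
        }
        for ex in examples
    ]


# Toy data, made up for this sketch.
doc_store = {
    "sorted": "sorted(iterable, key=None, reverse=False): return a new sorted list",
    "list.reverse": "list.reverse(): reverse the list in place",
}
examples = [
    {
        "intent": "sort list xs in descending order",
        "code": "sorted(xs, reverse=True)",
        "oracle_doc_ids": ["sorted"],
        "retrieved_doc_ids": ["list.reverse", "sorted"],
    }
]

# ORACLE setting: the same doc source is used for training and evaluation.
oracle_train = build_split(examples, doc_store, "oracle")
# Retrieved setting: inputs may also contain docs the target code does not use.
retrieved_train = build_split(examples, doc_store, "retrieved")
```

Under these assumptions, a model fine-tuned on `retrieved_train` sees inputs that can include unused docs, while one fine-tuned on `oracle_train` sees only docs relevant to the target, which is why the two training setups are not interchangeable at evaluation time.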

JiexingQi commented 1 year ago

Thanks a lot.