Closed JiexingQi closed 1 year ago
Good question. The ORACLE setting in our work corresponds to both training and evaluating with oracle docs. The main reason is that we want the distribution of docs the generator sees to be the same in training and evaluation. A model trained with retrieved docs may have already learned to ignore some of them, so it would not be able to make use of all the docs in the oracle setting even though they are all correct.
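To make the train/eval matching concrete, here is a minimal sketch of building generator inputs from one doc source for every split. The field names (`nl`, `code`, `oracle_docs`, `retrieved_docs`), the helper names, and the separator are hypothetical and chosen only for illustration; they are not the repository's actual data format.

```python
from typing import Dict, List


def build_input(nl: str, docs: List[str], sep: str = " </s> ") -> str:
    """Concatenate the docs and the NL intent into one generator input string."""
    return sep.join(docs + [nl])


def build_split(examples: List[Dict], doc_field: str) -> List[Dict]:
    """Build (source, target) pairs using the docs stored under doc_field."""
    return [
        {"source": build_input(ex["nl"], ex[doc_field]), "target": ex["code"]}
        for ex in examples
    ]


# Toy example (hypothetical schema, not real CoNaLa data).
examples = [
    {
        "nl": "sort list x",
        "code": "x.sort()",
        "oracle_docs": ["list.sort: sort in place"],
        "retrieved_docs": ["sorted: return new list", "list.sort: sort in place"],
    }
]

# ORACLE setting: the same doc source is used for training and evaluation.
oracle_train = build_split(examples, "oracle_docs")
oracle_eval = build_split(examples, "oracle_docs")

# Retrieved setting: inputs may include docs that are not relevant.
retrieved_train = build_split(examples, "retrieved_docs")
```

Evaluating a model trained on `retrieved_train` against `oracle_eval` would mix the two distributions, which is the mismatch described above.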
Thanks a lot.
Hi @shuyanzhou, I am trying to reproduce your CodeT5 + DocPrompting (ORACLE) results on the CoNaLa dataset. I find that performance is poor when I test directly with a model that was not trained in the oracle setting. Did you train a separate model on oracle training data to get the numbers in Table 3 of your paper? Looking forward to your reply.