salesforce / CodeRL

This is the official code for the paper CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning (NeurIPS 2022).

Critic Training pre-processing steps #47

Open xylankant opened 1 year ago

xylankant commented 1 year ago

Hello,

Thanks for making the code for this project open source, it's really great!

We are using CodeRL as a starting point for student projects, and we have a few questions to make sure we understand it correctly. In the "Critic Training" section, you say the following:

> We can train a critic model as a classifier that predicts the test outcomes of generated samples. For each training sample, we can follow the prior processes (generating programs and running unit tests) to obtain synthetic samples and their annotations of unit test outcomes. On average, we generate 20 programs per training sample (we provided some example generated programs in data/APPS/train/).
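Our reading of this (a sketch only, assuming the APPS convention that each unit test returns -2 for a compile error, -1 for a runtime error, False for wrong output, and True for a pass) is that each generated program gets one of the four outcome classes from the paper:

```python
# Sketch only, not the repo's code: map one program's unit-test results
# to the four outcome classes the paper's critic predicts.
# Assumed APPS convention: -2 = compile error, -1 = runtime error,
# False = wrong output, True = passed test.
def outcome_label(test_results):
    if -2 in test_results:
        return 'CompileError'
    if -1 in test_results:
        return 'RuntimeError'
    if not all(r is True for r in test_results):
        return 'FailedTest'
    return 'PassedTest'
```

Looking at the example generated programs provided in data/APPS/train/, though, some of the outputs surprised us. For instance: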

ANSWER:

"""

class Solution(object): def reverse(self, n): """ :type n: int :rtype: int """ if n == 0: return -1 l = list(bin(n)) l.reverse() return sum(l)

if name == 'main': print Solution().reverse(int(raw_input()))

[...]

print(gen_data['0']['code'][2]) gives the answer:

ANSWER:

```
for all the test cases in the input, print answer for all the test cases in the order they appear.
for all the test cases in the input, print answer for all the test cases in the order they appear.
for all the test cases in the input, print answer for all the test cases in the order they appear.
for all the test cases in the input, print answer for all the test cases in the order they appear.
[...]
```
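For reference, this is roughly how we load and inspect these examples. The problem folder name is hypothetical, and the schema is only inferred from the print call above:

```python
import json

# Sketch of how we inspect the provided examples; '0000' is a
# hypothetical problem folder under data/APPS/train/, and the
# schema of gen_solutions.json is inferred, not confirmed.
with open('data/APPS/train/0000/gen_solutions.json') as f:
    gen_data = json.load(f)

print(gen_data['0']['code'][2])  # one of the ~20 generated programs
```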


- Is there some post-processing going on that we are overlooking?
Mucalinda2436 commented 7 months ago

Hello, I also have a question about this section. I noticed that the generated programs and their evaluations are stored in the folders 'outputs/codes/' and 'outputs/test_results/'. The README also says: "For each training sample, we can follow the prior processes (generating programs and running unit tests) to obtain synthetic samples and their annotations of unit test outcomes." But why do they then use the data in 'data/APPS/train' to train the critic model? Since you asked about the same section, maybe you can answer my question. Thanks a lot!

xylankant commented 7 months ago

When training in critic mode, the dataset will load the generated solutions as well: see here in APPSBaseDataset. For this, you'll need to have the gen_solutions.json that contains solutions to training problems generated by the original model, which you can obtain by generating programs and running unit tests.

Hope this helps.
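In case it's useful, here's a rough sketch of how the outputs of those two steps could be merged into a per-problem gen_solutions.json. The file names, formats, and schema below are my assumptions, not the repo's actual code:

```python
import json
import os

# Hypothetical helper (not from the repo): combine generated programs
# from outputs/codes/ with their unit-test outcomes from
# outputs/test_results/ into one gen_solutions.json per problem.
# File names and the JSON schema are assumptions.
def build_gen_solutions(codes_file, results_file, prob_dir):
    with open(codes_file) as f:
        codes = json.load(f)      # assumed: list of generated program strings
    with open(results_file) as f:
        results = json.load(f)    # assumed: per-program unit-test results

    gen_solutions = [{'code': prog, 'result': res}
                     for prog, res in zip(codes, results)]

    with open(os.path.join(prob_dir, 'gen_solutions.json'), 'w') as f:
        json.dump(gen_solutions, f)
```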

Mucalinda2436 commented 7 months ago

> When training in critic mode, the dataset will load the generated solutions as well: see here in APPSBaseDataset. For this, you'll need to have the gen_solutions.json that contains solutions to training problems generated by the original model, which you can obtain by generating programs and running unit tests.
>
> Hope this helps.

Thanks for helping! But after reading the code for 'generating programs' and 'running unit tests', I noticed that the programs generated by the actor model are saved under 'outputs/codes/' and their evaluation results under 'outputs/test_results/'. So it seems that the gen_solutions.json you mentioned under 'data/APPS/train/prob_path/' doesn't exist, since those two steps never save anything to that file. Is there some code I missed that writes the contents of gen_solutions.json? Thanks for your answer again 🙏