Closed by pierowu 4 months ago
Hi, pierowu! Resyn27k.json is for instruction tuning, and you can use mle.py to do that training. To run mle_scoring.py, you should first generate code candidates for each design sample using your base model, then use a syntax checker to assign a score to each candidate. You can find the details in the paper. Hope this helps.
Thanks. Could you share a copy with scores for researchers to reproduce precisely?
Hi, pierowu! The objective of scoring training is to make the model generate less low-quality code. The code candidates should therefore come from your own base model or from checkpoints saved during the tuning process, and we recommend generating them that way. Otherwise, the scoring training may even harm your model's performance.
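For anyone trying to build the scored file themselves, here is a minimal sketch of the step described above: take the candidates your model generated for one design sample, run each through a syntax check, and store the results next to them. Only the `Score` key is confirmed by the traceback in this thread. The `Instruction` and `Response` key names, the 1.0/0.0 scoring scale, and the helper names are assumptions for illustration, and `toy_syntax_check` is a placeholder that should be replaced by a real Verilog syntax checker. Please check the paper and mle_scoring.py for the exact format and scoring rule.

```python
from typing import Callable, Dict, List


def build_scored_record(
    instruction: str,
    candidates: List[str],
    check_syntax: Callable[[str], bool],
) -> Dict[str, object]:
    """Attach one score per candidate (hypothetical record layout).

    Assumption: a candidate that passes the syntax check gets 1.0,
    otherwise 0.0. The real scoring rule may be finer-grained.
    """
    scores = [1.0 if check_syntax(code) else 0.0 for code in candidates]
    return {
        "Instruction": instruction,  # assumed key name
        "Response": candidates,      # assumed key name
        "Score": scores,             # key read by mle_scoring.py per the traceback
    }


def toy_syntax_check(code: str) -> bool:
    """Placeholder only: NOT a real syntax checker.

    It just tests that the text starts with 'module' and ends with
    'endmodule'. Swap in a call to an actual Verilog tool here.
    """
    text = code.strip()
    return text.startswith("module") and text.endswith("endmodule")


if __name__ == "__main__":
    record = build_scored_record(
        "Design a 1-bit buffer.",
        [
            "module buf1(input a, output y); assign y = a; endmodule",
            "modul buf1(input a, output y)",
        ],
        toy_syntax_check,
    )
    print(record["Score"])
```

The checker is passed in as a function so the same loop works whichever tool you use for the actual syntax check.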
I tried to run mle_scoring.py with Resyn27k.json, but the dataset seems to lack a score for each response. Could you share more details about how to get the scores? Here is my log:
Traceback (most recent call last):
File "/home/work/RTL-Coder/train/mle_scoring.py", line 296, in <module>
train()
File "/home/work/RTL-Coder/train/mle_scoring.py", line 292, in train
trainer.train()
File "/home/work/transformers/src/transformers/trainer.py", line 1539, in train
return inner_training_loop(
File "/home/work/transformers/src/transformers/trainer.py", line 1836, in _inner_training_loop
for step, inputs in enumerate(epoch_iterator):
File "/home-g2/anaconda3/envs/deep_seek2/lib/python3.9/site-packages/accelerate/data_loader.py", line 451, in __iter__
current_batch = next(dataloader_iter)
File "/home-g2/anaconda3/envs/deep_seek2/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
data = self._next_data()
File "/home-g2/anaconda3/envs/deep_seek2/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 674, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home-g2/anaconda3/envs/deep_seek2/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "/home/work/transformers/src/transformers/trainer_utils.py", line 772, in __call__
return self.data_collator(features)
File "/home/work/RTL-Coder/train/mle_scoring.py", line 141, in call
scores = ins['Score']
KeyError: 'Score'
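The KeyError is raised inside the data collator, so it only shows up once training has already started. A quick check like the sketch below can catch the problem before launching a run. It assumes the dataset has already been loaded into a list of dicts (for example with `json.load`, if the file is a single JSON array, which I have not verified), and `find_records_missing_key` is a hypothetical helper, not something from the repo.

```python
from typing import Any, Dict, List


def find_records_missing_key(
    records: List[Dict[str, Any]],
    key: str = "Score",
) -> List[int]:
    """Return the indices of records that do not contain `key`."""
    return [i for i, record in enumerate(records) if key not in record]


if __name__ == "__main__":
    # Tiny in-memory stand-in for a loaded dataset.
    sample = [
        {"Response": ["module a; endmodule"], "Score": [1.0]},
        {"Response": ["module b; endmodule"]},  # no scores attached
    ]
    missing = find_records_missing_key(sample)
    if missing:
        print(f"{len(missing)} record(s) have no 'Score' field, first index: {missing[0]}")
```

If every index comes back as missing, the file is an instruction-tuning set without scores and is not in the format mle_scoring.py expects.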