nashid opened this issue 2 years ago
Hi @lin-tan and @jiang719, I have a similar query. Can you also provide the config (used in `OpenAIGPTLMHeadModel(config)`) for training a new model from scratch?
Thank you.
@lin-tan and @jiang719 we loaded `OpenAIGPTLMHeadModel(config)` to train a new model from scratch.
This change looks like:

```diff
- gpt_loaded = torch.load(gpt_file)
- config = gpt_loaded['config']
- gpt_model.load_state_dict(gpt_loaded['model'])
+ configuration = OpenAIGPTConfig()
+ gpt_model = OpenAIGPTLMHeadModel(configuration).cuda() if torch.cuda.is_available() else OpenAIGPTLMHeadModel(configuration)
```
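For reference, here is a self-contained sketch of that from-scratch initialization, assuming the Hugging Face `transformers` API. Note that a bare `OpenAIGPTConfig()` uses the library defaults, which are not necessarily the hyperparameters CURE was trained with:

```python
import torch
from transformers import OpenAIGPTConfig, OpenAIGPTLMHeadModel

# Build a fresh, randomly initialized GPT LM-head model.
# OpenAIGPTConfig() falls back to library defaults (n_embd=768, n_layer=12, ...),
# which may differ from the config used to train the released CURE checkpoint.
configuration = OpenAIGPTConfig()
gpt_model = OpenAIGPTLMHeadModel(configuration)
if torch.cuda.is_available():
    gpt_model = gpt_model.cuda()
```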
We would really appreciate it if you could share the config you trained the model with, as we don't want to compare CURE with our approach incorrectly.
@jiang719 can you please help?
@msintaha @nashid I used settings like:
`n_positions=1024, n_ctx=1024, n_embd=384, n_layer=8, n_head=6`
You can also try other reasonable settings; these were set empirically.
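Passed to `OpenAIGPTConfig`, those values would look like the sketch below (assuming the Hugging Face `transformers` API; note `n_embd` must remain divisible by `n_head`, and 384 / 6 = 64 per head here):

```python
from transformers import OpenAIGPTConfig, OpenAIGPTLMHeadModel

# Hyperparameters reported by @jiang719 for training from scratch.
config = OpenAIGPTConfig(
    n_positions=1024, n_ctx=1024, n_embd=384, n_layer=8, n_head=6
)
model = OpenAIGPTLMHeadModel(config)
```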
Actually, the checkpoint contains the config information; you can load and print it like this:

```python
gpt_loaded = torch.load(gpt_file)
config = gpt_loaded['config']
print(config)
```
@jiang719 I can't find any model pushed to the repo. Where can I find the checkpoint?
@jiang719 can you please share the trained model or upload it somewhere so that others can download it?
@nashid Hello! Have you been able to train with gpt_conut_trainer.py and gpt_fconv_trainer.py? When I train using these two scripts, the loss becomes NaN after the second round of reads. I have tried many fixes, but none of them work. If you have successfully trained with these two scripts, could you share your scripts and training data with me? (I am fairly sure my training data is fine; I have used it to train two other NPR models.)
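As a generic diagnostic (not part of the CURE scripts), it can help to fail fast the moment the loss stops being finite, so you can inspect the offending batch; gradient clipping is also a common mitigation for losses that blow up to NaN:

```python
import math

def check_finite_loss(loss_value: float, step: int) -> None:
    """Raise immediately when the training loss becomes NaN or Inf."""
    if not math.isfinite(loss_value):
        raise RuntimeError(f"non-finite loss at step {step}: {loss_value}")

# Inside a PyTorch training loop one would call, for example:
#   check_finite_loss(loss.item(), step)
# and clip gradients before optimizer.step():
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```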
We are trying to train the GPT-CoNuT model. Following the instruction, we are trying to run the training script: src/trainer/gpt_conut_trainer.py.
However, the training fails here:
https://github.com/lin-tan/CURE/blob/master/src/trainer/gpt_conut_trainer.py#L22
In the very first step, this code tries to load a pretrained model. However, we are trying to train the model from scratch, so unless I am missing something, this does not seem correct. Can you share the artefact for training the model from scratch, please?
Looking forward to hearing your feedback. Thanks in advance for the help.