nashid opened this issue 2 years ago
Hi @lin-tan and @jiang719, I have a similar query. Can you also provide the config (used in `OpenAIGPTLMHeadModel(config)`) for training a new model from scratch?
Thank you.
@lin-tan and @jiang719 we loaded `OpenAIGPTLMHeadModel(config)` to train a new model from scratch.
This change looks like:

```diff
- gpt_loaded = torch.load(gpt_file)
- config = gpt_loaded['config']
- gpt_model.load_state_dict(gpt_loaded['model'])
+ configuration = OpenAIGPTConfig()
+ gpt_model = OpenAIGPTLMHeadModel(configuration).cuda() if torch.cuda.is_available() else OpenAIGPTLMHeadModel(configuration)
```
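For reference, here is a self-contained sketch of that from-scratch initialization, assuming the Hugging Face `transformers` API. Note that a bare `OpenAIGPTConfig()` uses the library defaults, which are not necessarily the hyperparameters CURE was trained with:

```python
import torch
from transformers import OpenAIGPTConfig, OpenAIGPTLMHeadModel

# Build a fresh, randomly initialized GPT LM-head model.
# OpenAIGPTConfig() falls back to library defaults (n_embd=768, n_layer=12, ...),
# which may differ from the config used to train the released CURE checkpoint.
configuration = OpenAIGPTConfig()
gpt_model = OpenAIGPTLMHeadModel(configuration)
if torch.cuda.is_available():
    gpt_model = gpt_model.cuda()
```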
We would really appreciate it if you could share the config you trained the model with, as we don't want to compare CURE with our approach incorrectly.
@jiang719 can you please help?
@msintaha @nashid I used settings like:
`n_positions=1024, n_ctx=1024, n_embd=384, n_layer=8, n_head=6`
You can also try other reasonable settings; these were set empirically.
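Passed to `OpenAIGPTConfig`, those values would look like the sketch below (assuming the Hugging Face `transformers` API; note `n_embd` must remain divisible by `n_head`, and 384 / 6 = 64 per head here):

```python
from transformers import OpenAIGPTConfig, OpenAIGPTLMHeadModel

# Hyperparameters reported by @jiang719 for training from scratch.
config = OpenAIGPTConfig(
    n_positions=1024, n_ctx=1024, n_embd=384, n_layer=8, n_head=6
)
model = OpenAIGPTLMHeadModel(config)
```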
Actually, the checkpoint contains the config information; you can load and print it like this:

```python
gpt_loaded = torch.load(gpt_file)
config = gpt_loaded['config']
print(config)
```
@jiang719 I can't find any model pushed to the repo. Where can I find the checkpoint?
@jiang719 can you please share the trained model or upload it somewhere so that others can download it?
@nashid Hello! Have you been able to train with gpt_conut_trainer.py and gpt_fconv_trainer.py? When I train using these two scripts, the loss becomes NaN after the second round of reads. I have tried many fixes, but none of them work. If you have successfully trained with these two scripts, could you share your scripts and training data with me? (I am fairly sure my training data is fine; I have used it to train two other NPR models.)
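As a generic diagnostic (not part of the CURE scripts), it can help to fail fast the moment the loss stops being finite, so you can inspect the offending batch; gradient clipping is also a common mitigation for losses that blow up to NaN:

```python
import math

def check_finite_loss(loss_value: float, step: int) -> None:
    """Raise immediately when the training loss becomes NaN or Inf."""
    if not math.isfinite(loss_value):
        raise RuntimeError(f"non-finite loss at step {step}: {loss_value}")

# Inside a PyTorch training loop one would call, for example:
#   check_finite_loss(loss.item(), step)
# and clip gradients before optimizer.step():
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```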
We are trying to train the GPT-CoNuT model. Following the instruction, we are trying to run the training script: src/trainer/gpt_conut_trainer.py.
However, the training fails here:
https://github.com/lin-tan/CURE/blob/master/src/trainer/gpt_conut_trainer.py#L22
In the very first step, this code tries to load a pretrained model. However, we are trying to train the model from scratch, so unless I am missing something, this does not seem correct. Can you share the artefact for training the model from scratch, please?
Looking forward to hearing your feedback. Thanks in advance for the help.