Are you using this exact line?
model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased') # initialize Bert2Bert
If yes, then please use the paths to your saved model instead. A few other things to try: verify the data pipeline, and try using beam search or sampling in generate.
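For reference, here is a rough sketch of what beam search and sampling with generate could look like for a Bert2Bert model (the checkpoint path, input text, and generation settings are placeholders, not taken from this thread; note that decoder_start_token_id usually needs to be set explicitly for a BERT-based decoder):

from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_pretrained("path/to/your/saved/model")  # placeholder path

input_ids = tokenizer.encode("example source text", return_tensors="pt")

# Beam search decoding
beam_output = model.generate(
    input_ids,
    decoder_start_token_id=tokenizer.cls_token_id,  # BERT has no BOS token, so start from [CLS]
    num_beams=4,
    max_length=64,
    early_stopping=True,
)

# Top-k sampling
sample_output = model.generate(
    input_ids,
    decoder_start_token_id=tokenizer.cls_token_id,
    do_sample=True,
    top_k=50,
    max_length=64,
)

print(tokenizer.decode(beam_output[0], skip_special_tokens=True))
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))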
Thanks! I am using that exact line. I saved my trained model using save_pretrained() and it saved everything as one file. How would I separate this, or should I just retrain and re-save the encoder and decoder separately? Also, does the untrained model not work because of the untrained cross-attention layers?
If you saved your model using .save_pretrained, then you can load it using just .from_pretrained, the same way you load any other HF model. Just pass the path of your saved model. You won't need to use .from_encoder_decoder_pretrained.
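In code, that amounts to something like the following (a minimal sketch; the path is a placeholder for wherever save_pretrained wrote the model):

from transformers import EncoderDecoderModel

# Load the fine-tuned EncoderDecoderModel straight from its save directory;
# from_encoder_decoder_pretrained is only needed when assembling a new model
# from two separate pretrained checkpoints.
model = EncoderDecoderModel.from_pretrained("path/to/your/saved/model")  # placeholder path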
Hi @anishthite,
How did you train your Bert2Bert model? Can you post the code you used to train your model here? Don't worry if it's a very long code snippet :-)
Hello! I managed to figure out the issue. I retrained and saved the encoder and decoder in their own folders. I was then able to load them in as @patil-suraj suggested. I guess earlier it was loading in the untrained model. Would it be helpful to redefine save_pretrained() for EncoderDecoder models to automatically split it into an encoder and a decoder folder? I can submit a PR if you want.
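For reference, reloading from the two separate folders described here might look roughly like the sketch below (not the exact code used; the folder names are taken from the training snippet that follows):

from transformers import EncoderDecoderModel

# Re-assemble the model from the separately saved encoder and decoder folders.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "combinerslargeencoder",  # folder written by model.encoder.save_pretrained(...)
    "combinerslargedecoder",  # folder written by model.decoder.save_pretrained(...)
)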
dataset = QADataset(dataset=args.traindataset, block_size=args.maxseqlen)
qa_loader = DataLoader(dataset, batch_size=args.batch, shuffle=True)

model.train()
optimizer = AdamW(model.parameters(), lr=LEARNING_RATE)
t_total = len(qa_loader) // args.gradient_acums * args.epochs
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=WARMUP_STEPS, num_training_steps=t_total)
proc_seq_count = 0
sum_loss = 0.0
batch_count = 0

models_folder = "combinerslargeencoder"
models_folder2 = "combinerslargedecoder"
if not os.path.exists(models_folder):
    os.mkdir(models_folder)
if not os.path.exists(models_folder2):
    os.mkdir(models_folder2)

for epoch in range(args.epochs):
    print(f"EPOCH {epoch} started" + '=' * 30)
    for idx, qa in enumerate(qa_loader):
        print(str(idx) + ' ' + str(len(qa_loader)))
        inputs, labels = (qa[0], qa[1])
        inputs = inputs.to(device)
        labels = labels.to(device)

        outputs = model(input_ids=inputs, decoder_input_ids=labels, lm_labels=labels)
        loss, logits = outputs[:2]
        loss = loss / args.gradient_acums
        loss.backward()
        sum_loss = sum_loss + loss.detach().data

        #proc_seq_count = proc_seq_count + 1
        #if proc_seq_count == args.gradient_acums:
        #    proc_seq_count = 0
        batch_count += 1
        if (idx + 1) % args.gradient_acums == 0:
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad()
            model.zero_grad()

        if batch_count == 100:
            print(f"sum loss {sum_loss}")
            batch_count = 0
            sum_loss = 0.0

    # Store the model after each epoch to compare the performance of them
    torch.save(model.state_dict(), os.path.join(models_folder, f"combined_mymodel_{args.maxseqlen}{epoch}{args.gradient_acums}.pt"))
    model.save_pretrained(models_folder)
    model.encoder.save_pretrained(models_folder)
    model.decoder.save_pretrained(models_folder2)

evaluate(args, model, tokenizer)
Why do you save the encoder and decoder models separately?
model.encoder.save_pretrained(models_folder)
model.decoder.save_pretrained(models_folder2)
This line:
model.save_pretrained(models_folder)
should be enough.
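That is, a single round trip along these lines should suffice (a sketch reusing models_folder from the snippet above):

# Save the full encoder-decoder model (encoder, decoder, and cross-attention) to one folder ...
model.save_pretrained(models_folder)
# ... and restore it later with a single call.
model = EncoderDecoderModel.from_pretrained(models_folder)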
We moved away from saving the model to two separate folders; see https://github.com/huggingface/transformers/pull/3383. The docs at https://huggingface.co/transformers/model_doc/encoderdecoder.html might also be useful.
Hello! I tried to train a Bert2Bert model for QA generation; however, when I try the generate function it returns gibberish. I also tried using the example code below, and that also generated gibberish (the output is "[PAD] leon leon leon leon leonieieieieie shall shall shall shall shall shall shall shall shall"). Is the generate function supposed to work for EncoderDecoder models, and what am I doing wrong?