Closed: mmoya01 closed this issue 3 years ago.
Hi @mmoya01
You could fine-tune a longformer2roberta model using the EncoderDecoder model class. patrickvonplaten/longformer2roberta-cnn_dailymail-fp16 is already fine-tuned on CNN/DailyMail, but as the model card says it was fine-tuned just as a demo, so you should fine-tune a new longformer2roberta. You could follow the training script given in the model card, or you can refer to this notebook.
Also, in your example you are loading the longformer2roberta model using LEDForConditionalGeneration, which doesn't seem right. It should be loaded using EncoderDecoderModel, e.g. as in the sketch below.
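For illustration, a minimal loading sketch, assuming the checkpoint and tokenizer names from this thread:
from transformers import EncoderDecoderModel, LongformerTokenizer
# the checkpoint was saved as an EncoderDecoderModel, so it is loaded with that class,
# not with LEDForConditionalGeneration
model = EncoderDecoderModel.from_pretrained("patrickvonplaten/longformer2roberta-cnn_dailymail-fp16")
# the encoder is a Longformer, so its tokenizer handles the inputs
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")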
Hi @patil-suraj, thank you for the reply! So if I'm understanding this correctly, I would have to train a new longformer2roberta from scratch? I was trying to avoid that, because the model card mentions that it took 90 hours to fine-tune roberta on CNN/DailyMail. The reason I was trying to use LEDForConditionalGeneration is that I wanted to fine-tune it where the pretrained model was longformer2roberta instead of allenai/longformer-base-4096.
So, to fine-tune the longformer2roberta model in the past, I tried pip installing the more_general_trainer_metric branch, given the note about the trainer, and then running:
#!/usr/bin/env python3
import logging
from io import BytesIO

import nlp
import pyarrow.parquet as pq
from nlp import arrow_dataset
from transformers import LongformerTokenizer, EncoderDecoderModel, Trainer, TrainingArguments

logging.basicConfig(level=logging.INFO)

model = EncoderDecoderModel.from_pretrained("patrickvonplaten/longformer2roberta-cnn_dailymail-fp16")
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")

# rouge metric used by compute_metrics below
rouge = nlp.load_metric("rouge")
# load dataset from parquet files on S3
# (s3_client, train_uri and test_uri are defined elsewhere in my setup)
train_bytes = s3_client.get_object(train_uri)
train = pq.read_table(BytesIO(train_bytes), columns=["reference_summary", "extractive_summary"])
test_bytes = s3_client.get_object(test_uri)
test = pq.read_table(BytesIO(test_bytes), columns=["reference_summary", "extractive_summary"])

train_dataset = arrow_dataset.Dataset(train)
val_dataset = arrow_dataset.Dataset(test)
# enable gradient checkpointing for longformer encoder
model.encoder.config.gradient_checkpointing = True
# set decoding params
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.eos_token_id = tokenizer.eos_token_id
model.config.max_length = 142
model.config.min_length = 56
model.config.no_repeat_ngram_size = 3
# note: set these on model.config (not on the model object) so generate() picks them up
model.config.early_stopping = True
model.config.length_penalty = 2.0
model.config.num_beams = 4
encoder_length = 2048
decoder_length = 128*2
batch_size = 16
# map data correctly
def map_to_encoder_decoder_inputs(batch):
    # Tokenizer will automatically set [BOS] <text> [EOS]
    # cut off encoder inputs (Longformer) at 2048 tokens
    inputs = tokenizer(batch["extractive_summary"], padding="max_length", truncation=True, max_length=encoder_length)
    # force summaries to <= 256 tokens
    outputs = tokenizer(batch["reference_summary"], padding="max_length", truncation=True, max_length=decoder_length)

    batch["input_ids"] = inputs.input_ids
    batch["attention_mask"] = inputs.attention_mask

    # give the first 128*2 = 256 tokens global attention
    batch["global_attention_mask"] = [
        [1 if i < 128 * 2 else 0 for i in range(sequence_length)]
        for sequence_length in len(inputs.input_ids) * [encoder_length]
    ]

    batch["decoder_input_ids"] = outputs.input_ids
    batch["labels"] = outputs.input_ids.copy()
    # mask loss for padding
    batch["labels"] = [
        [-100 if token == tokenizer.pad_token_id else token for token in labels] for labels in batch["labels"]
    ]
    batch["decoder_attention_mask"] = outputs.attention_mask

    assert all([len(x) == encoder_length for x in inputs.input_ids])
    assert all([len(x) == decoder_length for x in outputs.input_ids])

    return batch
def compute_metrics(pred):
    labels_ids = pred.label_ids
    pred_ids = pred.predictions

    # all unnecessary tokens are removed
    pred_str = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
    labels_ids[labels_ids == -100] = tokenizer.eos_token_id
    label_str = tokenizer.batch_decode(labels_ids, skip_special_tokens=True)

    rouge_output = rouge.compute(predictions=pred_str, references=label_str, rouge_types=["rouge2"])["rouge2"].mid

    return {
        "rouge2_precision": round(rouge_output.precision, 4),
        "rouge2_recall": round(rouge_output.recall, 4),
        "rouge2_fmeasure": round(rouge_output.fmeasure, 4),
    }
# make train dataset ready
train_dataset = train_dataset.map(
    map_to_encoder_decoder_inputs, batched=True, batch_size=batch_size,
    remove_columns=["extractive_summary", "reference_summary"],
)
train_dataset.set_format(
    type="torch", columns=["input_ids", "attention_mask", "global_attention_mask", "decoder_input_ids", "decoder_attention_mask", "labels"],
)

# same for validation dataset
val_dataset = val_dataset.map(
    map_to_encoder_decoder_inputs, batched=True, batch_size=batch_size,
    remove_columns=["extractive_summary", "reference_summary"],
)
val_dataset.set_format(
    type="torch", columns=["input_ids", "attention_mask", "global_attention_mask", "decoder_input_ids", "decoder_attention_mask", "labels"],
)
# set training arguments - these params are not really tuned, feel free to change
training_args = TrainingArguments(
    output_dir="./",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    predict_from_generate=True,
    evaluate_during_training=True,
    do_train=True,
    do_eval=True,
    logging_steps=100,
    save_steps=100,
    eval_steps=100,
    overwrite_output_dir=True,
    warmup_steps=200,
    save_total_limit=3,
    fp16=False,
)
# instantiate trainer
trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)
# start training
trainer.train()
^but that gave me TypeError: forward() got an unexpected keyword argument 'head_mask', because EncoderDecoderModel did not work with Longformer, whereas LEDForConditionalGeneration does.
But I'm gathering that it is not possible to fine-tune longformer2roberta the way I can with patrickvonplaten/led-large-16384-pubmed here, right? I would have to fine-tune/create my own longformer2roberta trained on CNN/DailyMail, then fine-tune it further with my train data listed above, right? If so, should I stay away from using a tokenizer/model that uses roberta-base and instead use "allenai/led-base-16384" (which I think uses BART as the base model)?
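For reference, here is a rough sketch of how I understand allenai/led-base-16384 would be loaded (using the LED classes, which is my assumption of the right ones for that checkpoint):
from transformers import LEDTokenizer, LEDForConditionalGeneration
# LED checkpoints ship as full seq2seq models (BART-style decoder with a
# Longformer-style encoder), so they load directly with the LED classes
led_tokenizer = LEDTokenizer.from_pretrained("allenai/led-base-16384")
led_model = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384")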
Thank you for your feedback either way, I greatly appreciate it
Hey @mmoya01, you don't have to train it from scratch - you can "warm-start" the model from the pretrained checkpoints. This blog post gives an in-depth explanation of how to do so: https://huggingface.co/blog/warm-starting-encoder-decoder
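A rough sketch of what warm-starting looks like in code (checkpoint names as used in this thread; the token-id setup follows the usual RoBERTa convention):
from transformers import EncoderDecoderModel, LongformerTokenizer
# encoder weights come from the pretrained Longformer, decoder weights from the
# pretrained RoBERTa; the cross-attention layers are freshly initialized and
# get trained during fine-tuning
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "allenai/longformer-base-4096", "roberta-base"
)
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
# minimal config before fine-tuning
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.eos_token_id = tokenizer.eos_token_id
model.config.pad_token_id = tokenizer.pad_token_id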
Hi @patrickvonplaten, thank you for your reply and the blog post. I was following your notebook and trying to adapt the longformer2roberta-cnn_dailymail-fp16 work to use my own train_data and val_data. I was wondering, how could I warm-start from patrickvonplaten/longformer2roberta-cnn_dailymail-fp16?
I noticed I was able to do
roberta2roberta = EncoderDecoderModel.from_encoder_decoder_pretrained("allenai/longformer-base-4096", "roberta-base")
But I would love to do something like
roberta2roberta = EncoderDecoderModel.from_encoder_decoder_pretrained("patrickvonplaten/longformer2roberta-cnn_dailymail-fp16")
or warm-start from the longformer2roberta-cnn_dailymail-fp16 checkpoint, if possible, rather than warm-starting from allenai/longformer-base-4096 (see the small sketch below for what I mean). I'd greatly appreciate your feedback.
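For context, a small sketch of the two loading paths I mean (using the checkpoints above; from_pretrained is what I used earlier for the fine-tuned checkpoint):
from transformers import EncoderDecoderModel
# combine two separate pretrained checkpoints into a fresh encoder-decoder
# (the cross-attention weights are newly initialized)
warm_started = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "allenai/longformer-base-4096", "roberta-base"
)
# load a checkpoint that was already saved as a full EncoderDecoderModel
fine_tuned = EncoderDecoderModel.from_pretrained(
    "patrickvonplaten/longformer2roberta-cnn_dailymail-fp16"
)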
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
I was wondering if there was any way to fine-tune the patrickvonplaten/longformer2roberta-cnn_dailymail-fp16 model instead of patrickvonplaten/led-large-16384-pubmed? When I tried fine-tuning it in the past, I ran into the TypeError: forward() got an unexpected keyword argument 'head_mask' issue, given that EncoderDecoderModel wasn't intended for Longformer. So I'm now trying to see if I can use LEDForConditionalGeneration for it, but I noticed that when I try to do so, I get strange results for that pretrained model if I use LEDForConditionalGeneration instead of EncoderDecoderModel for the patrickvonplaten/longformer2roberta-cnn_dailymail-fp16 model. Is there something I'm missing? I'd greatly appreciate any feedback/help with this.