Closed EKebriaei closed 3 years ago
@patil-suraj or @patrickvonplaten can chime in if I'm wrong, but I believe we currently only have fine-tuning & distillation schemes for the BART-family models, no pre-training.
Hey @EKebriaei - yeah, we sadly don't have any pre-training notebooks for PEGASUS yet. Are you looking for the summarization-specific pre-training of PEGASUS or just the BART-like denoising pre-training?
I want to pre-train pegasus on a language other than English.
Yeah, we don't have a script or good documentation for this yet.
cc https://github.com/huggingface/transformers/issues/8594#issuecomment-731248819
I have some dependency problems when compiling this: https://github.com/google-research/pegasus/blob/master/pegasus/ops/pretrain_parsing_ops.cc Do you have any suggestions that might help?
This PR will enable a pretraining script: #8731
This issue has been automatically marked as stale and been closed because it has not had recent activity. Thank you for your contributions.
If you think this still needs to be addressed please comment on this thread.
Could we follow the same approach that you (@patrickvonplaten) provided here for pre-training BART, but for PEGASUS? PEGASUS also has a GSG (gap sentences generation) training objective on top of the BART-like denoising, as detailed in the original paper.
GSG works by masking the most important sentences according to ROUGE; the targets are then the missing sentences.
So my attempt by changing your code would be:
from transformers import PegasusTokenizer, PegasusForConditionalGeneration, PegasusConfig
tok = PegasusTokenizer.from_pretrained("google/pegasus")
model = PegasusForConditionalGeneration(PegasusConfig())
input_string = ["Pegasus is <mask_2> . <mask_1> it <mask_2> the model ."]
decoder_input_string = "<s> It is pure white ."
labels_string = "It is pure white . <eos>"
input_ids = tok(input_string, add_special_tokens=False, return_tensors="pt").input_ids
decoder_input_ids = tok(decoder_input_string, add_special_tokens=False, return_tensors="pt").input_ids
labels = tok(labels_string, add_special_tokens=False, return_tensors="pt").input_ids
loss = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids, labels=labels)[0]
Does this look reasonable (the selection strategy of masked sentences will naturally need to be implemented)? @patrickvonplaten
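For readers following along, the sentence-selection step mentioned above could be sketched roughly as follows. This is only a minimal illustration, not the PEGASUS authors' implementation: it scores each sentence against the rest of the document with a plain unigram-overlap F1 (a simplified stand-in for ROUGE-1 F1), masks the top-scoring sentences with `<mask_1>`, and uses the masked sentences as the target. The helper names (`split_sentences`, `unigram_f1`, `make_gsg_example`), the naive sentence splitter, and the `gap_ratio` default are all assumptions made for this example.

```python
import re
from collections import Counter

MASK_SENT = "<mask_1>"  # sentence-level mask token, as used in the example above


def split_sentences(text):
    """Naive splitter on '.', '!' or '?' followed by whitespace (illustrative only)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def unigram_f1(candidate, reference):
    """Unigram-overlap F1; a simplified stand-in for ROUGE-1 F1."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def make_gsg_example(text, gap_ratio=0.3):
    """Mask the top-scoring sentences and return (encoder_input, target)."""
    sentences = split_sentences(text)
    n_gaps = max(1, round(len(sentences) * gap_ratio))
    scored = []
    for i, sent in enumerate(sentences):
        # Score each sentence independently against the rest of the document.
        rest = " ".join(s for j, s in enumerate(sentences) if j != i)
        scored.append((unigram_f1(sent, rest), i))
    # Highest score first; ties are broken by earlier position.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    chosen = sorted(i for _, i in ranked[:n_gaps])
    encoder_input = " ".join(
        MASK_SENT if i in chosen else s for i, s in enumerate(sentences)
    )
    target = " ".join(sentences[i] for i in chosen)
    return encoder_input, target


if __name__ == "__main__":
    doc = "Pegasus is mythical . It is pure white . It names the model ."
    encoder_input, target = make_gsg_example(doc)
    print(encoder_input)
    print(target)
```

The resulting strings would then be tokenized as the encoder input and the labels. A real pipeline would likely want a proper sentence splitter and a proper ROUGE implementation for the target language.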
@Skylixia - yes this looks reasonable to me! I guess in the original PEGASUS paper another masking loss was added on top of the encoder to predict the
Hi. I've been struggling with a pretty simple issue trying to get the above code to work.
Essentially, the Pegasus tokenizer's eos is </s> (not <eos> as mentioned above) and it does not seem to have a bos symbol. So no matter what combination I try, I keep getting a ValueError as the lengths of the label and decoder inputs don't match.
I tried to follow what happens in BART, but the following does not work:
from transformers import PegasusForConditionalGeneration, PegasusTokenizer
model_name = 'google/pegasus-xsum'
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)
input_string = ["Pegasus is mythical . <mask_1> it names the model ."]
decoder_input_string = ["<s>It is pure white . "]
labels_string = ["It is pure white .</s>"]
input_ids = tokenizer(input_string, add_special_tokens=False, return_tensors="pt").input_ids
decoder_input_ids = tokenizer(decoder_input_string, add_special_tokens=False, return_tensors="pt").input_ids
labels = tokenizer(labels_string, add_special_tokens=False, return_tensors="pt").input_ids
loss = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids, labels=labels)[0]
If I try to run this, I get Expected input batch_size (10) to match target batch_size (7).
Complete stack trace:
---> 15 loss = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids, labels=labels)[0]
16 # for _ in range(1_000):
17 # loss = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids, labels=labels)[0]
/home/ubuntu/anaconda3/envs/pytorch_new/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
/home/ubuntu/anaconda3/envs/pytorch_new/lib/python3.8/site-packages/transformers/models/pegasus/modeling_pegasus.py in forward(self, input_ids, attention_mask, decoder_input_ids, decoder_attention_mask, head_mask, decoder_head_mask, encoder_outputs, past_key_values, inputs_embeds, decoder_inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
1285 if labels is not None:
1286 loss_fct = CrossEntropyLoss()
-> 1287 masked_lm_loss = loss_fct(lm_logits.view(-1, self.config.vocab_size), labels.view(-1))
1288
1289 if not return_dict:
/home/ubuntu/anaconda3/envs/pytorch_new/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
/home/ubuntu/anaconda3/envs/pytorch_new/lib/python3.8/site-packages/torch/nn/modules/loss.py in forward(self, input, target)
959
960 def forward(self, input: Tensor, target: Tensor) -> Tensor:
--> 961 return F.cross_entropy(input, target, weight=self.weight,
962 ignore_index=self.ignore_index, reduction=self.reduction)
963
/home/ubuntu/anaconda3/envs/pytorch_new/lib/python3.8/site-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
2466 if size_average is not None or reduce is not None:
2467 reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2468 return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
2469
2470
/home/ubuntu/anaconda3/envs/pytorch_new/lib/python3.8/site-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
2259
2260 if input.size(0) != target.size(0):
-> 2261 raise ValueError('Expected input batch_size ({}) to match target batch_size ({}).'
2262 .format(input.size(0), target.size(0)))
2263 if dim == 2:
ValueError: Expected input batch_size (10) to match target batch_size (7).
I have opened a new issue with complete detail (and a corrected example) here: https://github.com/huggingface/transformers/issues/11541
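As the trace shows, the loss flattens the logits and the labels before comparing them, so the decoder inputs (10 tokens here) and the labels (7 tokens) must have the same length. One common way to guarantee that is to derive the decoder inputs from the labels by shifting them one position to the right. The sketch below shows that shift on plain Python lists. The token ids are hypothetical, and it assumes, as I believe is the case for the Hugging Face Pegasus configuration, that the pad token (id 0) doubles as the decoder start token and that `</s>` has id 1; please verify both against your own tokenizer and config.

```python
IGNORE_INDEX = -100  # label value that PyTorch's CrossEntropyLoss ignores by default


def shift_tokens_right(labels, pad_token_id, decoder_start_token_id):
    """Build decoder inputs: prepend the start token and drop the last label."""
    shifted = []
    for row in labels:
        new_row = [decoder_start_token_id] + list(row[:-1])
        # Ignored label positions must become real token ids for the decoder.
        shifted.append([pad_token_id if t == IGNORE_INDEX else t for t in new_row])
    return shifted


# Hypothetical token ids for "It is pure white .</s>"; 1 stands in for </s>.
labels = [[182, 117, 3732, 695, 110, 107, 1]]

# Assumption: pad token id 0 is also used as the decoder start token.
decoder_input_ids = shift_tokens_right(labels, pad_token_id=0, decoder_start_token_id=0)

# Each row keeps its length, so logits and labels flatten to the same size.
assert [len(r) for r in decoder_input_ids] == [len(r) for r in labels]
```

To my knowledge, `PegasusForConditionalGeneration` performs an equivalent shift internally when `labels` is passed and `decoder_input_ids` is left out, so passing only `input_ids` and `labels` may be the simplest way to avoid the mismatch.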
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@patrickvonplaten Any update on this? I am planning on researching abstractive summarization in a non-English language and the PEGASUS model seems to be a worthwhile model to pursue. It would be great if you could either direct me to any resources or suggest another model to pursue in my project. Thanks!
@ParthParikh04 Did you figure out a solution to this?
Nope, unfortunately not. Please let me know if you end up finding a solution though!
I want to pre-train a PEGASUS model from scratch on a language other than English. Is there any way to do this using the Hugging Face APIs? The source code released by the authors is complicated to use for pre-training, and there is little documentation available on how to do this.