huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

About Summarization #2139

Closed lcl6679292 closed 4 years ago

lcl6679292 commented 4 years ago

❓ Questions & Help

Thank you very much for your wonderful work. I found that some new code for summarization from the "pretrained encoder" paper has been added. However, I only see the evaluation part of the code. Will you also add the code for the training part? Thank you very much!

TheEdoardo93 commented 4 years ago

If you want to look at the source code used for training the model, you can check the original GitHub repository; in particular, see the src/train.py, src/train_abstractive.py, or src/train_extractive.py Python scripts.

lcl6679292 commented 4 years ago

@TheEdoardo93 Thank you for your reply. I know, but do you plan to integrate the original training code into transformers? It would be more convenient to train with the transformers code.

TheEdoardo93 commented 4 years ago

At the moment, I don't think it is on the roadmap. Do you have a particular reason for asking us to integrate the training algorithm into this library?


lcl6679292 commented 4 years ago

@TheEdoardo93 I think this is a good encoder-decoder framework based on BERT. Besides summarization, it can also handle many other generation tasks. If the training code were integrated into this library, it could be used to fine-tune more downstream generation tasks. I think this library currently lacks downstream fine-tuning for NLG tasks, such as query generation, generative reading comprehension, and other summarization tasks.

yxlin1 commented 4 years ago

Thanks for the help. How do I load the checkpoint model_step_20000.pt that was trained with src/train.py, in place of model = BertAbs.from_pretrained("bertabs-finetuned-cnndm")?


TheEdoardo93 commented 4 years ago

Hello! As far as I know, you can't load a PyTorch checkpoint directly into a BertAbs model; you'll indeed get an error. A PyTorch checkpoint typically contains the model's state dict. Therefore, you can try the following code for your task:

import torch
from transformers import BertTokenizer

from modeling_bertabs import BertAbs

# Tokenizer used by the BertAbs summarization model
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=True)

# Instantiate the pretrained model, then overwrite its weights with the checkpoint's state dict
model = BertAbs.from_pretrained("bertabs-finetuned-cnndm")
model.load_state_dict(torch.load(PATH_TO_PT_CHECKPOINT))

where PATH_TO_PT_CHECKPOINT could be, e.g., ./input_checkpoints/model_step_20000.pt. N.B.: this code works only if the architecture of the bertabs-finetuned-cnndm model is identical to the one saved in the checkpoint you're trying to load; otherwise an error will occur!

If this code doesn't work as expected, we can work together in order to solve your problem :)
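One extra note: the snippet above assumes the .pt file contains a raw state dict. If the training script instead saved a dictionary that wraps the weights (for example under a "model" key; that key name is an assumption, so inspect your checkpoint first), a minimal sketch for unwrapping it would be:

import torch

from modeling_bertabs import BertAbs

# Checkpoint path, same as in the example above
PATH_TO_PT_CHECKPOINT = "./input_checkpoints/model_step_20000.pt"

checkpoint = torch.load(PATH_TO_PT_CHECKPOINT, map_location="cpu")

# Assumption: the weights may be wrapped under a "model" key;
# otherwise treat the file as a raw state dict.
state_dict = checkpoint["model"] if isinstance(checkpoint, dict) and "model" in checkpoint else checkpoint

model = BertAbs.from_pretrained("bertabs-finetuned-cnndm")
model.load_state_dict(state_dict)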



shashankMadan-designEsthetics commented 4 years ago

It's important!! Add it.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Bhavya6187 commented 4 years ago

@TheEdoardo93 is there any way to load a pretrained model with a different architecture? I used the original library to train a model with a source embedding size of 1024 instead of the 512 used in the pretrained one, since 512 was too small for my data.
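One possible workaround (a sketch only, not an answer confirmed in this thread, and the checkpoint layout is assumed as above): keep only the checkpoint tensors whose names and shapes match the target model and load them non-strictly, so mismatched layers such as the 1024-dim embeddings are simply skipped:

import torch

from modeling_bertabs import BertAbs

# Hypothetical checkpoint path from the modified training run
PATH_TO_PT_CHECKPOINT = "./input_checkpoints/model_step_20000.pt"

model = BertAbs.from_pretrained("bertabs-finetuned-cnndm")

checkpoint = torch.load(PATH_TO_PT_CHECKPOINT, map_location="cpu")
state_dict = checkpoint["model"] if isinstance(checkpoint, dict) and "model" in checkpoint else checkpoint

# Keep only tensors whose name and shape match the target model;
# anything with a different shape (e.g. the larger embeddings) is dropped.
model_state = model.state_dict()
filtered = {name: tensor for name, tensor in state_dict.items()
            if name in model_state and tensor.shape == model_state[name].shape}

model.load_state_dict(filtered, strict=False)
print(f"Loaded {len(filtered)} of {len(state_dict)} tensors from the checkpoint")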