huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

How to use fine-tuned BART for prediction? #3853

Closed riacheruvu closed 4 years ago

riacheruvu commented 4 years ago

❓ Questions & Help

Details

I fine-tuned the BART model on a custom summarization dataset using the transformers/examples/summarization/bart/finetune.py and transformers/examples/summarization/bart/run_train.sh files in the repository for training (which generated three checkpointepoch=*.ckpt files) and prediction (which generated a .txt file with the test loss scores).

I have two questions on using this model for prediction:

Thank you for your time!

prabalbansal commented 4 years ago

Facing a similar type of issue for T5. @sshleifer

sshleifer commented 4 years ago

The last ckpt file should be loaded into a pl.LightningModule if the --do_predict flag is specified.

There is a bug on master that messes up the loading, but it's fixed in #3866

To use that code immediately, you can run:

git fetch
git checkout examples-summ-do-predict

then your same finetune.py command with --do_predict (and not --do_train) and the proper --output_dir.

Would love to know if that works!

cc: @ethanjperez.

sshleifer commented 4 years ago

Change is on master, let me know if this solves the problem!

prabalbansal commented 4 years ago

Config.json is still not generated while training.

sshleifer commented 4 years ago
    def log_hyperparams(model: pl.LightningModule):
        model.config.save_pretrained(model.hparams.output_dir)
        with open(os.path.join(model.hparams.output_dir, "hparam.json")) as f:
            json.dump(model.hparams, f)

You can call this somewhere in your code, if that's helpful.

riacheruvu commented 4 years ago

@sshleifer, thank you - I can run ./run_train.sh with the --do_predict option successfully.

Regarding my original question, could you please specify how to load the checkpoint into the LightningModule?

After inspecting transformer_base.py, I think hparams is equivalent to the arguments provided in run_train.sh, so a separate hparams.json file does not need to be generated. Please correct me if I'm wrong.

I am receiving the following error with my current code:

pytorch_lightning.utilities.exceptions.MisconfigurationException: Checkpoint contains hyperparameters but LightningModule's __init__ is missing the argument 'hparams'. Are you loading the correct checkpoint?

I've been using the following code, based on the discussion in https://github.com/PyTorchLightning/pytorch-lightning/issues/525 and https://pytorch-lightning.readthedocs.io/en/latest/weights_loading.html:



# load model
import pytorch_lightning as pl

from argparse import Namespace

# usually these come from command line args
args = Namespace(data_dir='CE_data/',
                 model_type='bart',
                 model_name_or_path='bart-large',
                 learning_rate=3e-5,
                 train_batch_size=4,
                 eval_batch_size=4,
                 output_dir='transformers/examples/summarization/bart/bart_sum',
                 do_predict='do_predict')

pretrained_model = pl.LightningModule.load_from_checkpoint('bart_sum/checkpointepoch=2.ckpt', hparams=args)
pretrained_model.eval()

# or for prediction (inputs: encoded article ids, not shown here)
out = pretrained_model(inputs['input_ids'])
print(out)

Thank you for your time.

sshleifer commented 4 years ago

Seems close to correct.

https://github.com/huggingface/transformers/blob/7d40901ce3ad9e1c79fd9bb117f5b84bff42c33f/examples/summarization/bart/finetune.py#L164-L175

is how we do it @riacheruvu
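
For anyone reading without the link handy, the referenced block in finetune.py does roughly the following (a paraphrased sketch, not the exact source; names like args, model, and trainer come from main() in that file):

    # Paraphrase of the linked --do_predict block in finetune.py
    if args.do_predict:
        # pick up the newest PyTorch Lightning checkpoint written to output_dir
        checkpoints = list(sorted(glob.glob(os.path.join(args.output_dir, "checkpointepoch=*.ckpt"), recursive=True)))
        model = model.load_from_checkpoint(checkpoints[-1])
        trainer.test(model)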

prabalbansal commented 4 years ago

@sshleifer

  1. Originally, config.json is not created, which is required for prediction with the fine-tuned model. As shown in the screenshot, after I add this code at the end of transformer_base.py, the config and hparam files are created.
     • But when I then try to predict with --do_predict, it gives: "We assumed '/content/t5' was a path, a model identifier, or url to a directory containing vocabulary files named ['spiece.model'] but couldn't find such vocabulary files at this path or url." What are the requirements for using a fine-tuned model? (screenshot attached)

  2. To predict for a single instance using the fine-tuned model, do I need to specify the test.target file as well? I want to predict an unknown instance without calculating the loss value.

riacheruvu commented 4 years ago

@sshleifer, thank you. I've gotten to the point where I can load the model and generate "outputs" using the forward() function, but I can't decode the outputs - using tokenizer.decode() results in an error. Should I be using model.generate() instead of model.forward()? If so, it seems SummarizationTrainer does not support model.generate?

Revised code:

        tokenizer = BartTokenizer.from_pretrained('bart-large-cnn')
        ARTICLE_TO_SUMMARIZE = "My friends are cool but they eat too many carbs."
        inputs = tokenizer.batch_encode_plus([ARTICLE_TO_SUMMARIZE], max_length=1024, return_tensors='pt')['input_ids']
        checkpoints = list(sorted(glob.glob(os.path.join(args.output_dir, "checkpointepoch=*.ckpt"), recursive=True)))
        model = model.load_from_checkpoint(checkpoints[-1])
        model.eval()
        model.freeze()
        outputs = model(inputs)
        print(outputs) #Successfully prints two 3D tensors in a tuple
        #print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in outputs]) #Results in ValueError: only one element tensors can be converted to Python scalars
        print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in outputs[0][0]])

The error I'm encountering:

Traceback (most recent call last):
  File "finetune.py", line 194, in <module>
    main(args)
  File "finetune.py", line 184, in main
    print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in outputs[1][0]])
  File "finetune.py", line 184, in <listcomp>
    print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in outputs[1][0]])
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/transformers/tokenization_utils.py", line 2141, in decode
    sub_texts.append(self.convert_tokens_to_string(current_sub_text))
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/transformers/tokenization_gpt2.py", line 235, in convert_tokens_to_string
    text = "".join(tokens)
TypeError: sequence item 0: expected str instance, NoneType found

riacheruvu commented 4 years ago

I found a solution. The model.generate() function is necessary to extract the predictions. I defined a separate function in the SummarizationTrainer() class to use self.model.generate(), and was able to use tokenizer.decode() on the outputs.

I was encountering issues when using self.tokenizer, so I assume using 'bart-large-cnn' tokenizer for similar custom summarization datasets is okay.

@prabalbansal, I'm not sure if the same method will apply to T5, but it could work for predicting for a single instance, per one of your questions.

My code is below:

    def text_predictions(self, input_ids):
        generated_ids = self.model.generate(
            input_ids=input_ids,
            num_beams=1,
            max_length=80,
            repetition_penalty=2.5,
            length_penalty=1.0,
            early_stopping=True,
        )
        preds = [
            self.tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True)
            for g in generated_ids
        ]
        return preds
...
    # Optionally, predict on dev set and write to output_dir
    if args.do_predict:
        # See https://github.com/huggingface/transformers/issues/3159
        # pl use this format to create a checkpoint:
        # https://github.com/PyTorchLightning/pytorch-lightning/blob/master\
        # /pytorch_lightning/callbacks/model_checkpoint.py#L169
        tokenizer = BartTokenizer.from_pretrained('bart-large-cnn')
        ARTICLE_TO_SUMMARIZE = "My friends are cool but they eat too many carbs."
        inputs = tokenizer.batch_encode_plus([ARTICLE_TO_SUMMARIZE], max_length=1024, return_tensors='pt')['input_ids']
        checkpoints = list(sorted(glob.glob(os.path.join(args.output_dir, "checkpointepoch=*.ckpt"), recursive=True)))
        model = model.load_from_checkpoint(checkpoints[-1])
        model.eval()
        model.freeze()
        outputs = model.text_predictions(inputs)
        print(outputs)

Thank you for the help, @sshleifer !

prabalbansal commented 4 years ago

@riacheruvu Thank You. It works for T5 also.

sangeethabal15 commented 4 years ago

I followed the steps given in this thread and am still facing an issue. I get the error below when I try to use my fine-tuned model for prediction.

OSError: Can't load '/home/bart/bart_1/checkpointepoch=3.ckpt'. Make sure that:

riacheruvu commented 4 years ago

@sangeethabal15, with my model, files were only generated up till the 2nd epoch. Just to confirm, do you have a checkpointepoch=3.ckpt file?

Are you using the load_from_checkpoint() function?

sangeethabal15 commented 4 years ago

@riacheruvu yes, I do have a checkpointepoch=3.ckpt file. I gave my own number of epochs instead of the default 3.

Yes, I am using the load_from_checkpoint() function.

riacheruvu commented 4 years ago

Ok. Could you share your code here, @sangeethabal15? It might be easier to help debug.

sangeethabal15 commented 4 years ago

@riacheruvu This is my modified code -

# Optionally, predict on dev set and write to output_dir
if args.do_predict:
    # See https://github.com/huggingface/transformers/issues/3159
    # pl use this format to create a checkpoint:
    # https://github.com/PyTorchLightning/pytorch-lightning/blob/master\
    # /pytorch_lightning/callbacks/model_checkpoint.py#L169
    examples = [" " + x.rstrip() for x in open("/home/bart/input/test.source").readlines()]
    fout = Path("output.txt").open("w")
    checkpoints = list(sorted(glob.glob(os.path.join(args.output_dir, "checkpointepoch=*.ckpt"), recursive=True)))
    model = model.load_from_checkpoint(checkpoints[-1])
    tokenizer = BartTokenizer.from_pretrained("bart-large")

    max_length = 80
    min_length = 5

    for batch in tqdm(list(chunks(examples, 8))):
        dct = tokenizer.batch_encode_plus(batch, max_length=1024, return_tensors="pt", pad_to_max_length=True)
        summaries = model.generate(
            input_ids=dct["input_ids"].to(device),
            attention_mask=dct["attention_mask"],
            num_beams=4,
            length_penalty=2.0,
            max_length=max_length + 2,  # +2 from original because we start at step=1 and stop before max_length
            min_length=min_length + 1,  # +1 from original because we start at step=1
            no_repeat_ngram_size=3,
            early_stopping=True,
            decoder_start_token_id=model.config.eos_token_id,
        )
        dec = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summaries]
        for hypothesis in dec:
            fout.write(hypothesis + "\n")
            fout.flush()

riacheruvu commented 4 years ago

Thank you, @sangeethabal15. From the error message you posted earlier, it seems load_from_checkpoint() is expecting a config.json file in the specified directory.

I have a few more debug questions:

sangeethabal15 commented 4 years ago

@riacheruvu

prabalbansal commented 4 years ago
import json
def log_hyperparams(model: pl.LightningModule):
    model.config.save_pretrained(model.hparams.output_dir)
    with open(os.path.join(model.hparams.output_dir, "hparam.json"),'w') as f:
        json.dump(model.hparams.__dict__, f)
if args.do_train:
    trainer.fit(model)
    log_hyperparams(model)

@sangeethabal15 Could you add this at the end of transformer_base.py? This works for me.

sangeethabal15 commented 4 years ago

@prabalbansal this is for when I am training my model. Since I have already fine-tuned my model, is there any workaround for test time when I am trying to predict my outputs?

murugeshmanthiramoorthi commented 4 years ago

@riacheruvu I am currently working on a text summarization problem and have collected a small dataset of my own. Implementing BART is very easy and I can generate a great summary, but I want to know how to train the BART model on my own custom dataset. Can you please help me with this?

I have browsed the internet, but I cannot find any helpful resources, as BART is relatively new compared to other transfer-learning models.

sangeethabal15 commented 4 years ago

@murugeshmanthiramoorthi you can just use run_train.sh in the bart folder, where you pass in your parameters, to run the finetune.py file.

murugeshmanthiramoorthi commented 4 years ago

@sangeethabal15 Thank you so much for your reply. I am completely new to transfer learning and I can't quite follow what you mean. Could you kindly explain in more detail or share a resource I can follow up on? Thanks in advance.

murugeshmanthiramoorthi commented 4 years ago

@sangeethabal15 I somehow managed to load the dataset and ran the run_train.sh file, but it shows the error "python3: can't open file 'finetune.py': [Errno 2] No such file or directory". I even tried changing the dataset from my custom dataset to the default CNN/DailyMail dataset, and I still get the same error. Can anyone help me out?

sangeethabal15 commented 4 years ago

@riacheruvu @prabalbansal did y'all finetune Bart on your own dataset?

riacheruvu commented 4 years ago

@sangeethabal15, I fine-tuned BART on my own custom dataset. It's strange that your code runs successfully on the default number of epochs, but load_from_checkpoint() does not work with the 2nd epoch .ckpt file with the original configuration. Where did you modify the default number of epochs?

@murugeshmanthiramoorthi,

Per the instructions given in https://github.com/huggingface/transformers/tree/master/examples/summarization/bart:

The steps I followed are cloning the transformers repo, navigating to the examples/summarization/bart directory, copying over a folder containing the data files (train.target, train.source, val.target, val.source, test.target, and test.source files), and then modifying run_train.sh to use this folder for the data_dir and filling in the other parameters.

For your .source and .target files, you need to structure them similar to the CNN/DM dataset: The .source files should have an article on each line, and the .target files should have a target summary on each line (corresponding to the article in the .source file).
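
As a concrete illustration of that layout, a toy dataset could be written like this (illustrative paths and strings only; point --data_dir in run_train.sh at the resulting folder):

    import os

    data_dir = "my_summarization_data"   # whatever --data_dir points at in run_train.sh
    os.makedirs(data_dir, exist_ok=True)

    articles = ["First long article text ...", "Second long article text ..."]
    summaries = ["First target summary.", "Second target summary."]

    for split in ["train", "val", "test"]:
        with open(os.path.join(data_dir, split + ".source"), "w") as src, \
             open(os.path.join(data_dir, split + ".target"), "w") as tgt:
            for article, summary in zip(articles, summaries):
                src.write(article.replace("\n", " ") + "\n")   # one article per line
                tgt.write(summary.replace("\n", " ") + "\n")   # matching summary on the same line number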

sangeethabal15 commented 4 years ago

@riacheruvu I noticed that I get this warning both while training and while testing:

INFO:transformers.modeling_utils:Weights from pretrained model not used in BartForConditionalGeneration: ['encoder.version', 'decoder.version']

Seems like my model hasn't been trained properly. Any idea how to go about this?

Also, I set the number of epochs in my run_train.sh; it is defined in the add_specific_args in transformer_base.py.
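
(For reference, the epoch count comes from an argparse default that, if I remember correctly, looks roughly like the line below in transformer_base.py; this is a paraphrase, not the exact source.)

    # hypothetical paraphrase of the argument definition; default explains checkpointepoch=2 being the last file
    parser.add_argument("--num_train_epochs", default=3, type=int)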

sshleifer commented 4 years ago

that warning doesn't matter.

riacheruvu commented 4 years ago

@sangeethabal15, I agree that the warning does not matter as I saw that warning as well. It seems the issue might be when training the model with a different number of epochs compared to the default. @sshleifer, has the HuggingFace team tested the code with a different number of epochs before?

murugeshmanthiramoorthi commented 4 years ago

@riacheruvu Thank you so much for your help. But when I proceeded with those steps, I got the following error:

Traceback (most recent call last):
  File "finetune.py", line 10, in <module>
    from transformer_base import BaseTransformer, add_generic_args, generic_train, get_linear_schedule_with_warmup
ModuleNotFoundError: No module named 'transformer_base'

Do you have any idea how to solve this issue?

sangeethabal15 commented 4 years ago

@murugeshmanthiramoorthi Follow the below steps and you should be able to run your code.

Important: To run the latest versions of the examples, you have to install from source and install some example-specific requirements. Execute the following steps in a new virtual environment:

git clone https://github.com/huggingface/transformers
cd transformers
pip install .
pip install -r ./examples/requirements.txt

You can find the above in the readme section of https://github.com/huggingface/transformers/tree/cbbb3c43c55d2d93a156fc80bd12f31ecbac8520/examples

riacheruvu commented 4 years ago

@murugeshmanthiramoorthi, I agree with @sangeethabal15, I followed the same steps as well.

After installing the dependencies, the code should run without errors about transformer_base - I believe the following line in run_train.sh ensures that:

# Add parent directory to python path to access transformer_base.py
export PYTHONPATH="../../":"${PYTHONPATH}"

sangeethabal15 commented 4 years ago

@sshleifer @riacheruvu I keep running into an error every time I change the beam size or set min_length, no_repeat_ngram_size, or length_penalty at decoding time. Here is a snippet of the error:

Traceback (most recent call last):
  File "finetune1.py", line 189, in <module>
    main(args)
  File "finetune1.py", line 176, in main
    outputs = model.text_predictions(inputs)
  File "finetune1.py", line 80, in text_predictions
    length_penalty=1.0,
  File "/home/sangeethabal/.local/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "/home/sangeethabal/.local/lib/python3.7/site-packages/transformers/modeling_utils.py", line 995, in generate
    attention_mask=attention_mask,
  File "/home/sangeethabal/.local/lib/python3.7/site-packages/transformers/modeling_utils.py", line 1338, in _generate_beam_search
    past = self._reorder_cache(past, beam_idx)
  File "/home/sangeethabal/.local/lib/python3.7/site-packages/transformers/modeling_bart.py", line 933, in _reorder_cache
    ((enc_out, enc_mask), decoder_cached_states) = past
ValueError: too many values to unpack (expected 2)

The function where I have defined all of this:

def test(self, input_ids):
    generated_ids = self.model.generate(
        input_ids=input_ids,
        num_beams=6,
        max_length=60,
        min_length=4,
        no_repeat_ngram_size=3,
        length_penalty=1.0,
    )
    preds = [
        self.tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True)
        for g in generated_ids
    ]
    return preds

Any idea how to go about this?

riacheruvu commented 4 years ago

@sangeethabal15, I have two ideas: Try explicitly setting use_cache=True in the generate() function to see if it resolves the error. If that does not work, could you try specifying the attention_mask parameter? I'm looking at modeling_utils.py and modeling_bart.py, and I think these are the two parameters that are linked to this issue.

Edit: It also seems evaluate_cnn.py demonstrates a similar configuration for the generate() function, although the parameters are slightly different. If the two ideas above don't work, you could try specifying those parameters to confirm it's not an issue with the values of the parameters that were chosen.
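
For concreteness, the adjusted method might look roughly like this (untested sketch; it assumes the attention mask is passed in from tokenizer.batch_encode_plus, and whether generate() accepts use_cache depends on the installed transformers version):

    def test(self, input_ids, attention_mask=None):
        generated_ids = self.model.generate(
            input_ids=input_ids,
            attention_mask=attention_mask,  # mask returned by tokenizer.batch_encode_plus(...)
            use_cache=True,                 # only if your generate() signature accepts it
            num_beams=6,
            max_length=60,
            min_length=4,
            no_repeat_ngram_size=3,
            length_penalty=1.0,
        )
        preds = [
            self.tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True)
            for g in generated_ids
        ]
        return preds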

murugeshmanthiramoorthi commented 4 years ago

Thank you so much @sangeethabal15 @riacheruvu I got it. Thanks a ton for your help.

sangeethabal15 commented 4 years ago

@sshleifer when I use the exact same parameters as in the evaluate_cnn.py code, I still get the exact same error as below. There seems to be an issue with the values chosen for these parameters specified in evaluate_cnn.py.

@riacheruvu I have tried the parameters you specified, same issue.

sshleifer commented 4 years ago

Try passing use_cache=True. Note that the call here works. Only differences appear to be attention_mask and use_cache.

sangeethabal15 commented 4 years ago

@sshleifer use_cache is set to True by default in modeling_utils.py, but when I specify the parameter in my function and run the code, it throws the following error:

Traceback (most recent call last):
  File "finetune1.py", line 191, in <module>
    main(args)
  File "finetune1.py", line 178, in main
    outputs = model.text_predictions(inputs)
  File "finetune1.py", line 82, in text_predictions
    use_cache=True,
  File "/home/sangeethabal/.local/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
TypeError: generate() got an unexpected keyword argument 'use_cache'

sshleifer commented 4 years ago

This isn't enough information for me to diagnose. My guess with the limited info I have is that you didn't run pip install -e . from transformers/.

What does pip freeze | grep transformers say?

sangeethabal15 commented 4 years ago

@sshleifer I did run pip install -e .

Here is the output of pip freeze | grep transformers

WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
transformers==2.8.0

sshleifer commented 4 years ago

Ok, the output should look something like -e git+git@... Try:

git pull
pip install -e .

You should probably also upgrade pip, though that shouldn't matter much.

ArijRB commented 4 years ago

@riacheruvu hello, do you get <extra_id_x> tokens in your generation output?

riacheruvu commented 4 years ago

@ArijRB, hi - I don’t remember seeing that in the output of the model.

isabelcachola commented 4 years ago

@ArijRB I'm also getting <extra_id_x> generations. Were you able to solve that problem? I'm using a T5 model finetuned on my own dataset.

claudiatin commented 4 years ago

@riacheruvu How did you load the model in the line 'model.load_from_checkpoint(checkpoints[-1])' of the following code you posted?

    tokenizer = BartTokenizer.from_pretrained('bart-large-cnn')
    ARTICLE_TO_SUMMARIZE = "My friends are cool but they eat too many carbs."
    inputs = tokenizer.batch_encode_plus([ARTICLE_TO_SUMMARIZE], max_length=1024, return_tensors='pt')['input_ids']
    checkpoints = list(sorted(glob.glob(os.path.join(args.output_dir, "checkpointepoch=*.ckpt"), recursive=True)))
    model = model.load_from_checkpoint(checkpoints[-1])
    model.eval()
    model.freeze()
    outputs = model(inputs)
    print(outputs) #Successfully prints two 3D tensors in a tuple
    #print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in outputs]) #Results in ValueError: only one element tensors can be converted to Python scalars
    print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in outputs[0][0]])

Is 'model' an instance of pl.LightningModule? I still have the error message that you got in the previous post:

pytorch_lightning.utilities.exceptions.MisconfigurationException: Checkpoint contains hyperparameters but LightningModule's __init__ is missing the argument 'hparams'. Are you loading the correct checkpoint?

riacheruvu commented 4 years ago

@claudiatin, model should be defined as an instance of the SummarizationTrainer class. You will need the following line (which is already under main() in finetune.py):

model = SummarizationTrainer(args)

I am wondering if there is an easier way to go about generating the predictions, though. I've tried calling SummarizationTrainer from another Python file so I can separate my prediction and training code, but I ran into some issues, so I needed to stick with using a modified version of finetune.py running in a clone of the repo. If anyone finds an easier way of accomplishing this, or if the HuggingFace team can build this functionality in, that would be great.

claudiatin commented 4 years ago

@riacheruvu Thank you so much for your answer. I did the same as you did, and then I saved the .bin file and config.json so I can use BartForConditionalGeneration.from_pretrained. I don't know if it is actually the best way.

# model checkpoints and save the model
model = SummarizationTrainer(args)
model = model.load_from_checkpoint('bart_sum/checkpointepoch=2.ckpt')
torch.save(model.state_dict(), args.output_dir + '/pytorch_model.bin')
model.config.to_json_file(args.output_dir + '/config.json')

# load the fine-tuned model and predict
model = BartForConditionalGeneration.from_pretrained('bart_sum')
summarizer = pipeline('summarization', model=model, tokenizer=tokenizer)
summarizer(ARTICLE_TO_SUMMARIZE, max_length=80, min_length=40)

riacheruvu commented 4 years ago

@claudiatin, thank you!

Edit: Please ignore my previous response to your newest reply. I just went through the code again, and I was wrong about the inputs to the from_pretrained() function. I apologize for that.

I’ll try using the code block you provided!

riacheruvu commented 4 years ago

I tried applying the code provided for T5 (I haven't tried it with BART, but I think it will work successfully per @claudiatin's response). I am including the results here for documentation, and in case anyone knows the solution:

from transformers import T5Model, pipeline

model = T5Model.from_pretrained('tfive_sum')
summarizer = pipeline("summarization", model=model, tokenizer="t5-base", framework="tf")
summarizer(ARTICLE_TO_SUMMARIZE, min_length=5, max_length=20)

I run into the error:

AttributeError: You tried to generate sequences with a model that does not have a LM Head.Please use another model class (e.g. `OpenAIGPTLMHeadModel`, `XLNetLMHeadModel`, `GPT2LMHeadModel`, `CTRLLMHeadModel`, `T5WithLMHeadModel`, `TransfoXLLMHeadModel`, `XLMWithLMHeadModel`, `BartForConditionalGeneration` )

I've tried importing T5WithLMHeadModel using from transformers import T5WithLMHeadModel and I encounter ImportError: cannot import name 'T5WithLMHeadModel'. I have the most up-to-date version of the transformers library installed, so I'm not sure if something is wrong with my setup.
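
If the class was simply renamed, something like the following might work instead (untested sketch; it assumes a transformers version where the LM-head T5 class is exposed as T5ForConditionalGeneration, and that 'tfive_sum' contains a config.json and pytorch_model.bin saved as above):

    from transformers import T5ForConditionalGeneration, T5Tokenizer, pipeline

    # Assumption: newer releases expose the LM-head T5 model as T5ForConditionalGeneration
    model = T5ForConditionalGeneration.from_pretrained('tfive_sum')
    tokenizer = T5Tokenizer.from_pretrained('t5-base')
    summarizer = pipeline("summarization", model=model, tokenizer=tokenizer)
    print(summarizer(ARTICLE_TO_SUMMARIZE, min_length=5, max_length=20))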

claudiatin commented 4 years ago

@riacheruvu, don't worry about the previous answer. For the sake of completeness, 'bart_sum' is just the default name of the folder where the checkpoints are saved (the line export OUTPUT_DIR_NAME=bart_sum in run_train.sh). The complete code in my notebook is the following:

%cd examples/summarization/bart

!bash run_train.sh  # run_train.sh script has been changed in order to use a custom dataset

%cd ../..
from lightning_base import BaseTransformer

%cd summarization/bart
from finetune import SummarizationTrainer

import torch
from argparse import Namespace
args = Namespace(adam_epsilon=1e-08, cache_dir='', config_name='', data_dir='../../../../dataset', do_predict=False, do_train=True, eval_batch_size=2, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=1, learning_rate=3e-05, max_grad_norm=1.0, max_source_length=1024, max_target_length=56, model_name_or_path='bart-large', n_gpu=1, n_tpu_cores=0, num_train_epochs=3, output_dir='bart_sum', seed=42, tokenizer_name='', train_batch_size=2, warmup_steps=0, weight_decay=0.0)

model = SummarizationTrainer(args)
model = model.load_from_checkpoint('bart_sum/checkpointepoch=2.ckpt')
torch.save(model.state_dict(), args.output_dir + '/pytorch_model.bin')
model.config.to_json_file(args.output_dir + '/config.json') # NOW in the bart_sum folder I have checkpoints, pytorch_model.bin and config.json

In another notebook

import torch
from transformers import BartTokenizer, BartForConditionalGeneration
from transformers import pipeline

tokenizer = BartTokenizer.from_pretrained('bart-large-cnn')
# load the fine-tuned model
model = BartForConditionalGeneration.from_pretrained('transformers/examples/summarization/bart/bart_sum')

The code works but the performance is not good. I think this is because of my dataset :)