hetpandya / paraphrase-datasets-pretrained-models

A collection of preprocessed datasets and pretrained models for generating paraphrases.
Apache License 2.0

Fine-Tuning mt5 on tapaco de #1

Open j0st opened 2 years ago

j0st commented 2 years ago

Hey, nice work! I tried to run your example script with the German TaPaCo dataset and the mT5 model instead of T5. When training, I get these warnings:

/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py:3365: FutureWarning: 
`prepare_seq2seq_batch` is deprecated and will be removed in version 5 of HuggingFace Transformers. Use the regular
`__call__` method to prepare your inputs and the tokenizer under the `as_target_tokenizer` context manager to prepare
your targets.

Here is a short example:

model_inputs = tokenizer(src_texts, ...)
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, ...)
model_inputs["labels"] = labels["input_ids"]

See the documentation of your specific tokenizer for more details on the specific arguments to the tokenizer of choice.
For a more complete example, see the implementation of `prepare_seq2seq_batch`.

  warnings.warn(formatted_warning, FutureWarning)
Using Adafactor for T5
Epoch 1 of 1: 100%
1/1 [4:37:28<00:00, 16648.54s/it]
Epochs 0/1. Running Loss: nan: 100%
9347/9347 [4:37:23<00:00, 1.38s/it]
/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
(584, nan)

As you can see, the training did finish and the model was saved. But when I try to generate paraphrases, I get these weird outputs:

Generating outputs: 100%
1/1 [00:00<00:00, 1.60it/s]
Decoding outputs: 100%
5/5 [00:01<00:00, 1.65s/it]
[['<extra_id_0>.',
  '<extra_id_0>.',
  '<extra_id_0>',
  '<extra_id_0>) <extra_id_36> ein.',
  '<extra_id_0> waren']]

I trained the model for only one epoch instead of four. Is this the reason for the broken outputs, or is it these warnings during training? Another thing I didn't quite understand is the dataset. In your example (and in the German part of TaPaCo) there are text and paraphrase pairs which are not actually paraphrases. For example, the second row in your example notebook:

In [ ]:
dataset_df.head()
Out[ ]:
| Text | Paraphrase |
| -- | -- |
| I ate the cheese. | I eat cheese. |
| I'm eating a yogurt. | I'm eating cheese. |
| I'm having some cheese. | I eat some cheese. |
| It's Monday. | It is Monday today. |
| It's Monday today. | Today is Monday. |
hetpandya commented 2 years ago

Hi @j0st. I came across this warning while training t5-base and t5-large. I'm not sure which variant of mT5 you are using, but could you try mt5-small? I think this is probably an issue with larger models. Also, I found a thread with a similar warning which could help you: https://github.com/ThilinaRajapakse/simpletransformers/issues/983

j0st commented 2 years ago

Thanks for the answer. As you can see, I am already using the small mt5 variant.

model = T5Model("mt5","google/mt5-small", args=args)

The solution suggested in your link is setting fp16=False, but I think this is already specified in your model args (although fp16 is written with an underscore there):

args = {
    "reprocess_input_data": True,
    "overwrite_output_dir": True,
    "max_seq_length": 256,
    "num_train_epochs": 4,
    "num_beams": None,
    "do_sample": True,
    "top_k": 50,
    "top_p": 0.95,
    "use_multiprocessing": False,
    "save_steps": -1,
    "save_eval_checkpoints": True,
    "evaluate_during_training": False,
    'adam_epsilon': 1e-08,
    'eval_batch_size': 6,
    'fp_16': False,
    'gradient_accumulation_steps': 16,
    'learning_rate': 0.0003,
    'max_grad_norm': 1.0,
    'n_gpu': 1,
    'seed': 42,
    'train_batch_size': 6,
    'warmup_steps': 0,
    'weight_decay': 0.0
}

Did the warning break your t5-base and t5-large models or did they still work?

hetpandya commented 2 years ago

No, they didn't break the training. I could still use the models.

j0st commented 2 years ago

I was able to fix it by changing 'fp_16': False to 'fp16': False. Seems like a typo in your example notebook.
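
For reference, here is a minimal sketch of the corrected setup (only the key name changes; the other args stay exactly as posted above):

from simpletransformers.t5 import T5Model

args = {
    # ...all other args exactly as in the dict above...
    "fp16": False,  # correct key name; with "fp_16" the option never took effect
}

model = T5Model("mt5", "google/mt5-small", args=args)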

Another thing I didn't quite understand is the dataset. In your example (and in the German part of TaPaCo) there are text and paraphrase pairs which are not actually paraphrases. For example, the second row in your notebook:

In [ ]:
dataset_df.head()
Out[ ]:

hetpandya commented 2 years ago

I was able to fix it by changing 'fp_16': False to 'fp16': False. Seems like a typo in your example notebook.

Thank you for pointing out the typo! I never realised this.

Another thing I didn't quite understand is the dataset. In your example (and in the German part of TaPaCo) there are text and paraphrase pairs which are not actually paraphrases. For example, the second row in your notebook:

I think this is because of the logic written in the generate_tapaco_paraphrase_dataset() function in the notebook. The logic takes the first two sentences for a context from the actual Hugging Face dataset as a paraphrase pair, and then repeats this for the next two consecutive sentences. This works when an even number of examples is given for one context, but when it is not, mismatched pairs like the ones in my notebook appear; a sketch of a per-set pairing follows the table below.

| Text | Paraphrase |
| -- | -- |
| I ate the cheese. | I eat cheese. |
| I'm eating a yogurt. | I'm eating cheese. |
| I'm having some cheese. | I eat some cheese. |
| It's Monday. | It is Monday today. |
| It's Monday today. | Today is Monday. |
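
Roughly, pairing sentences only within the same paraphrase set would avoid this. This is just a sketch of the idea, assuming the Hugging Face TaPaCo columns paraphrase and paraphrase_set_id; the notebook function itself is written differently:

from datasets import load_dataset
import pandas as pd

def build_paraphrase_pairs(language="de"):
    # hypothetical helper, not the notebook's generate_tapaco_paraphrase_dataset()
    dataset = load_dataset("tapaco", language, split="train")

    # group sentences by their paraphrase set so pairs never cross sets
    groups = {}
    for row in dataset:
        groups.setdefault(row["paraphrase_set_id"], []).append(row["paraphrase"])

    pairs = []
    for sentences in groups.values():
        # pair consecutive sentences within one set; a trailing odd sentence is dropped
        for i in range(0, len(sentences) - 1, 2):
            pairs.append((sentences[i], sentences[i + 1]))

    return pd.DataFrame(pairs, columns=["Text", "Paraphrase"])

With something like this, the last sentence of an odd-sized set is simply dropped instead of being paired with a sentence from a different context.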