j0st opened this issue 2 years ago
Hi @j0st. I came across this warning while training t5-base and t5-large. I'm not sure which mt5 variant you are using, but could you try mt5-small? This is probably an issue with the larger models. I also found a thread with a similar warning that might help you: https://github.com/ThilinaRajapakse/simpletransformers/issues/983
Thanks for the answer. As you can see, I am already using the small mt5 variant.
model = T5Model("mt5","google/mt5-small", args=args)
The solution suggested in your link is setting fp16=False, but I think this is already specified in your model args (although there it is written with an underscore, as fp_16):
args = {
    "reprocess_input_data": True,
    "overwrite_output_dir": True,
    "max_seq_length": 256,
    "num_train_epochs": 4,
    "num_beams": None,
    "do_sample": True,
    "top_k": 50,
    "top_p": 0.95,
    "use_multiprocessing": False,
    "save_steps": -1,
    "save_eval_checkpoints": True,
    "evaluate_during_training": False,
    "adam_epsilon": 1e-08,
    "eval_batch_size": 6,
    "fp_16": False,
    "gradient_accumulation_steps": 16,
    "learning_rate": 0.0003,
    "max_grad_norm": 1.0,
    "n_gpu": 1,
    "seed": 42,
    "train_batch_size": 6,
    "warmup_steps": 0,
    "weight_decay": 0.0
}
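As a side note, the batch settings in these args imply a fairly large effective batch, since gradient accumulation multiplies the per-step batch size. A quick check with the values above:

```python
# Values taken from the args dict above.
train_batch_size = 6
gradient_accumulation_steps = 16

# Gradients are accumulated over 16 forward/backward passes before
# each optimizer update, so the effective batch size per update is
# the product of the two settings.
effective_batch_size = train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 96
```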
Did the warning break your t5-base and t5-large models or did they still work?
No, the warnings didn't break the training. I could still use the models.
I was able to fix it by changing 'fp_16': False to 'fp16': False. Seems like a typo in your example notebook.
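For anyone hitting the same thing, a minimal sketch of the corrected key (only the spelling of the fp16 entry changes; every other setting stays as in the args above; that an unknown key is simply ignored so mixed precision stays at its default is my reading of the behavior, not something stated in the thread):

```python
# simpletransformers reads the key "fp16". A key spelled "fp_16" is an
# unknown entry, so mixed-precision training presumably stays enabled at
# its default, which would explain the overflow warnings with mt5.
args = {
    "max_seq_length": 256,
    "num_train_epochs": 4,
    "train_batch_size": 6,
    "fp16": False,  # was 'fp_16': False -- the underscore made it a no-op
}
```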
Another thing I didn't quite understand is the dataset. In your example (and in the german part of tapaco) there are text and paraphrase pairs which are not paraphrases. For example the second row in your notebook:
dataset_df.head()
Text | Paraphrase
---|---
I ate the cheese. | I eat cheese.
I'm eating a yogurt. | I'm eating cheese.
I'm having some cheese. | I eat some cheese.
It's Monday. | It is Monday today.
It's Monday today. | Today is Monday.
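For reference, a minimal sketch of how pairs like the ones in this table could be arranged in the three-column layout that simpletransformers' T5Model trains on (the column names follow the simpletransformers T5 convention; the "paraphrase" prefix string is an assumption, adjust it to whatever the notebook actually uses):

```python
import pandas as pd

# Text / Paraphrase pairs exactly as in the table above (including the
# yogurt/cheese row that is arguably not a real paraphrase).
pairs = [
    ("I ate the cheese.", "I eat cheese."),
    ("I'm eating a yogurt.", "I'm eating cheese."),
    ("I'm having some cheese.", "I eat some cheese."),
    ("It's Monday.", "It is Monday today."),
    ("It's Monday today.", "Today is Monday."),
]

train_df = pd.DataFrame(pairs, columns=["input_text", "target_text"])
# T5-style models are trained with a task prefix on every example.
train_df["prefix"] = "paraphrase"  # assumed prefix string
print(train_df[["prefix", "input_text", "target_text"]].head())
```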
Hey, nice work! I tried to run your example script with the german tapaco dataset and the mt5 model instead of t5. When training, I get these warnings:
As you can see, the training did finish and the model was saved. But when I try to generate paraphrases, I get these weird outputs.
I trained the model only for one epoch instead of four. Could that be the reason, or is it related to these warnings during training?