Closed SoyGema closed 1 year ago
This looks like a dataset issue, which is not in the scope of the transformers GitHub pages.
However, if you can provide a full error log and the content of train_model.py, we might be able to have a quick look.
Hello there @ydshieh. Thanks for your time. You can find the full script here
Full Log
07/06/2023 17:59:34 - INFO - __main__ - Training/evaluation parameters TFTrainingArguments(
_n_gpu=-1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gcp_project=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/tmp/tst-translation/runs/Jul06_17-59-34_mbp-de-gema.lan,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_hf,
optim_args=None,
output_dir=/tmp/tst-translation,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=16,
per_device_train_batch_size=16,
poly_power=1.0,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['mlflow', 'tensorboard'],
resume_from_checkpoint=None,
run_name=/tmp/tst-translation,
save_on_each_node=False,
save_safetensors=False,
save_steps=500,
save_strategy=steps,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_name=None,
tpu_num_cores=None,
tpu_zone=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xla=False,
xpu_backend=None,
)
07/06/2023 17:59:35 - INFO - datasets.info - Loading Dataset Infos from /Users/gema/.cache/huggingface/modules/datasets_modules/datasets/opus100/256f3196b69901fb0c79810ef468e2c4ed84fbd563719920b1ff1fdc750f7704
07/06/2023 17:59:35 - INFO - datasets.builder - Overwrite dataset info from restored data version if exists.
07/06/2023 17:59:35 - INFO - datasets.info - Loading Dataset info from /Users/gema/.cache/huggingface/datasets/opus100/en-ro/0.0.0/256f3196b69901fb0c79810ef468e2c4ed84fbd563719920b1ff1fdc750f7704
07/06/2023 17:59:35 - WARNING - datasets.builder - Found cached dataset opus100 (/Users/gema/.cache/huggingface/datasets/opus100/en-ro/0.0.0/256f3196b69901fb0c79810ef468e2c4ed84fbd563719920b1ff1fdc750f7704)
07/06/2023 17:59:35 - INFO - datasets.info - Loading Dataset info from /Users/gema/.cache/huggingface/datasets/opus100/en-ro/0.0.0/256f3196b69901fb0c79810ef468e2c4ed84fbd563719920b1ff1fdc750f7704
100%|████████████████████████████████████████| 3/3 [00:00<00:00, 33.24it/s]
loading configuration file t5-small/config.json
Model config T5Config {
"_name_or_path": "t5-small",
"architectures": [
"T5ForConditionalGeneration"
],
"d_ff": 2048,
"d_kv": 64,
"d_model": 512,
"decoder_start_token_id": 0,
"dense_act_fn": "relu",
"dropout_rate": 0.1,
"eos_token_id": 1,
"feed_forward_proj": "relu",
"initializer_factor": 1.0,
"is_encoder_decoder": true,
"is_gated_act": false,
"layer_norm_epsilon": 1e-06,
"model_type": "t5",
"n_positions": 512,
"num_decoder_layers": 6,
"num_heads": 8,
"num_layers": 6,
"output_past": true,
"pad_token_id": 0,
"relative_attention_max_distance": 128,
"relative_attention_num_buckets": 32,
"task_specific_params": {
"summarization": {
"early_stopping": true,
"length_penalty": 2.0,
"max_length": 200,
"min_length": 30,
"no_repeat_ngram_size": 3,
"num_beams": 4,
"prefix": "summarize: "
},
"translation_en_to_de": {
"early_stopping": true,
"max_length": 300,
"num_beams": 4,
"prefix": "translate English to German: "
},
"translation_en_to_fr": {
"early_stopping": true,
"max_length": 300,
"num_beams": 4,
"prefix": "translate English to French: "
},
"translation_en_to_pt": {
"early_stopping": true,
"max_length": 300,
"num_beams": 4,
"prefix": "translate English to Portuguese: "
},
"translation_en_to_ro": {
"early_stopping": true,
"max_length": 300,
"num_beams": 4,
"prefix": "translate English to Romanian: "
}
},
"transformers_version": "4.31.0.dev0",
"use_cache": true,
"vocab_size": 32128
}
loading file spiece.model
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
07/06/2023 17:59:36 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /Users/gema/.cache/huggingface/datasets/opus100/en-ro/0.0.0/256f3196b69901fb0c79810ef468e2c4ed84fbd563719920b1ff1fdc750f7704/cache-107d5d31727344a2.arrow
Running tokenizer on validation dataset: 0%| | 0/2000 [00:00<?, ? examples/s]07/06/2023 17:59:36 - INFO - datasets.arrow_dataset - Caching processed dataset at /Users/gema/.cache/huggingface/datasets/opus100/en-ro/0.0.0/256f3196b69901fb0c79810ef468e2c4ed84fbd563719920b1ff1fdc750f7704/cache-e8cb6f4c7ff7ad3e.arrow
Tensorflow: setting up strategy
loading weights file t5-small/model.safetensors
Generate config GenerationConfig {
"_from_model_config": true,
"decoder_start_token_id": 0,
"eos_token_id": 1,
"pad_token_id": 0,
"transformers_version": "4.31.0.dev0"
}
Loaded 60,506,624 parameters in the TF 2.0 model.
All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.
All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.
You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
No loss specified in compile() - the model's internal loss computation will be used as the loss. Don't panic - this is a common way to train TensorFlow models in Transformers! To disable this behaviour please pass a loss argument, or explicitly pass `loss=None` if you do not want your model to compute a loss. You can also specify `loss='auto'` to get the internal loss without printing this info string.
07/06/2023 17:59:38 - INFO - __main__ - ***** Running training *****
07/06/2023 17:59:38 - INFO - __main__ - Num examples = 1000000
07/06/2023 17:59:38 - INFO - __main__ - Num Epochs = 3.0
07/06/2023 17:59:38 - INFO - __main__ - Instantaneous batch size per device = 16
07/06/2023 17:59:38 - INFO - __main__ - Total train batch size = 16
07/06/2023 17:59:38 - INFO - __main__ - Total optimization steps = 187500
2023-07-06 17:59:38.328410: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2023-07-06 17:59:38.353957: W tensorflow/core/framework/dataset.cc:769] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
Epoch 1/3
18/62500 [..............................] - ETA: 21:26:35 - loss: 2.2246Traceback (most recent call last):
File "/Users/gema/Documents/The-Lord-of-The-Words-The-two-frameworks/src/models/train_model.py", line 730, in <module>
main()
File "/Users/gema/Documents/The-Lord-of-The-Words-The-two-frameworks/src/models/train_model.py", line 683, in main
history = model.fit(tf_train_dataset, epochs=int(training_args.num_train_epochs), callbacks=callbacks)
File "/Users/gema/miniforge3/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/Users/gema/miniforge3/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:
Shape of tensor args_0 [16,128] is not compatible with expected shape [16,64].
[[{{node EnsureShape_1}}]]
[[MultiDeviceIteratorGetNextFromShard]]
[[RemoteCall]]
[[IteratorGetNext]] [Op:__inference_train_function_17297]
In the future, I will post a tailored example on the forum, and perhaps this should be redirected there. Let me know if at some point this becomes a suitable issue for datasets. Thanks for the time dedicated to this, I really appreciate it, and my apologies for the inconvenience.
@Rocketknight1
Do you know why
if "cols_to_retain" in list(inspect.signature(dataset._get_output_signature).parameters.keys()):
    output_signature, _ = dataset._get_output_signature(
        dataset,
        batch_size=None,
        collate_fn=collate_fn,
        collate_fn_args=collate_fn_args,
        cols_to_retain=model_inputs,
    )
gives output_signature
{'input_ids': TensorSpec(shape=(None, None), dtype=tf.int64, name=None), 'attention_mask': TensorSpec(shape=(None, None), dtype=tf.int64, name=None), 'labels': TensorSpec(shape=(None, 64), dtype=tf.int64, name=None), 'decoder_input_ids': TensorSpec(shape=(None, 64), dtype=tf.int64, name=None)}
which has a fixed sequence length of 64 in labels and decoder_input_ids?
FYI: the sequences in dataset have different lengths in each element.
@ydshieh We actually generate those shapes empirically by grabbing several batches from the dataset, which is not ideal but usually works. Do almost all samples from the dataset have a post-padding decoder_input_ids length of 64, but some don't? That might trigger this issue. If that turns out to be the case, let me know - I've been wary of that code for a while, so this might be a good time to try a fix!
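To make the failure mode described above concrete, here is a minimal pure-Python sketch of inferring a shape signature from a few sampled batches. It is an illustration under stated assumptions, not the actual datasets implementation: the helper names infer_static_shape and is_compatible are hypothetical, and the sampled shapes are made up to mirror the [16,128] versus [16,64] mismatch in the log.

```python
from typing import Optional, Sequence, Tuple

Shape = Tuple[int, ...]
Spec = Tuple[Optional[int], ...]


def infer_static_shape(sampled_shapes: Sequence[Shape]) -> Spec:
    """Hypothetical helper: keep a dimension only if every sampled batch
    agrees on it; otherwise mark it as dynamic (None)."""
    rank = len(sampled_shapes[0])
    inferred = []
    for axis in range(rank):
        sizes = {shape[axis] for shape in sampled_shapes}
        inferred.append(next(iter(sizes)) if len(sizes) == 1 else None)
    return tuple(inferred)


def is_compatible(shape: Shape, spec: Spec) -> bool:
    """A concrete shape fits a spec if every non-None dimension matches."""
    return len(shape) == len(spec) and all(
        expected is None or expected == actual
        for actual, expected in zip(shape, spec)
    )


# Made-up label shapes from a few sampled batches: the batch size varies,
# but every sampled batch happens to be padded to a length of 64.
sampled = [(16, 64), (16, 64), (8, 64)]
spec = infer_static_shape(sampled)

# A later batch padded to 128 was never seen during sampling.
late_batch_shape = (16, 128)
print(spec, is_compatible(late_batch_shape, spec))
```

If the sampled batches all share a padded length, the length axis is recorded as fixed, and any later batch with a different length is rejected, which is consistent with training running for a few steps before the error appears.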
Hello there. Thanks again for keeping this issue open. I managed to solve the issue, and I'm putting the solution here before closing. Hopefully this sheds some light on the question posted.
What I understand is that in preprocess_function we call the tokenizer, which has the padding and max length settings attached.
1. Initially I set max_source_length, which fixes the length after tokenization to 64. According to the docstring, longer sequences are truncated and shorter ones are padded. It trains correctly. But then I thought (please correct me if I'm wrong) that this cuts sequences when they are longer, so longer sentences could lose part of their content, which would affect how context is understood when translating them.
2. Then I discovered pad_to_max_length. My assumption is that it pads to the max sequence length, so I set it to True and max_target_length to None. It seems to train correctly as well. My understanding is that I am now padding with respect to the max length.
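The two padding behaviours discussed above can be sketched in plain Python on lists of token ids. This is an illustrative sketch only, not the tokenizer's implementation: the helper names pad_to_fixed_length and pad_to_longest and the token ids are hypothetical, and PAD_ID = 0 is taken from the pad_token_id in the t5-small config printed in the log.

```python
from typing import List

PAD_ID = 0  # pad_token_id from the t5-small config shown in the log


def pad_to_fixed_length(batch: List[List[int]], max_length: int,
                        pad_id: int = PAD_ID) -> List[List[int]]:
    """Truncate longer sequences and pad shorter ones to exactly max_length.
    Tokens beyond max_length are dropped, not moved to a new sequence."""
    padded = []
    for seq in batch:
        kept = seq[:max_length]
        padded.append(kept + [pad_id] * (max_length - len(kept)))
    return padded


def pad_to_longest(batch: List[List[int]],
                   pad_id: int = PAD_ID) -> List[List[int]]:
    """Pad every sequence to the longest one in this batch, so the padded
    length can differ from one batch to the next."""
    longest = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (longest - len(seq)) for seq in batch]


# Made-up token ids with lengths 3, 5, and 9.
batch = [[11, 12, 13], [21, 22, 23, 24, 25], list(range(31, 40))]

fixed = pad_to_fixed_length(batch, max_length=4)
dynamic = pad_to_longest(batch)

print([len(seq) for seq in fixed])    # every sequence has length 4
print([len(seq) for seq in dynamic])  # every sequence has length 9
```

In this sketch, a fixed length gives every batch the same shape at the cost of discarding tokens from long sequences, while padding to the longest sequence keeps all tokens but produces shapes that vary between batches.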
Either way, I can train the model with these two options. If anyone wants to continue this conversation or correct any wrong hypothesis I might have, please come by #2, as I don't consider it proper to keep this issue open here.
Thanks @ydshieh & @Rocketknight1
System Info
transformers ==4.31.0.dev0
tensorflow-macos==2.10.0
Hello there! Thanks for creating examples for the Translation task!
Context
I'm going through the run_translation.py example, modified to use the opus100 dataset, and launching the script with the flags listed below.
Error
All dataset feature engineering seems to display well. It starts training, but at some point there is a tensor shape mismatch error during training.
Any hints on how I should reshape this? At some point I thought it was something in the preprocessing, but it does start training, so I'm a little confused. I also explored wmt16 (the example tested and working) during #24579, and when I look at it on the Hub, it seems to have the same structure and partitions as opus100.
Thanks for the time dedicated to this and for the help! Looking forward to getting all this working and sharing it in the PyCon Spain keynote this year!
Who can help?
@gante
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Expected behavior
Training is not interrupted.