huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

mT5 TensorFlow error - Attempt to convert a value (None) with an unsupported type #13821

Closed: gcervantes8 closed this issue 3 years ago

gcervantes8 commented 3 years ago

Environment info

Who can help

@patrickvonplaten

Information

Model I am using: mT5.

The problem arises when running the `transformers/examples/tensorflow/translation/run_translation.py` file.

I made this modification to get the machine translation to run:

    source_lang = data_args.source_lang.split("_")[0]
    target_lang = data_args.target_lang.split("_")[0]

Modified to:

    source_lang = data_args.source_lang
    target_lang = data_args.target_lang

I then ran the script with these parameters:

    --do_train True --model_name_or_path google/mt5-base --tokenizer_name google/mt5-base --output_dir output --dataset_name ccaligned_multilingual --dataset_config_name sentences-ak_GH --source_lang en_XX --target_lang ak_GH

This produced the error: `ValueError: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor.`

Changing the model and tokenizer from google/mt5-base to t5-base fixes the error, so I think the problem is specific to the mT5 model.
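
If it helps, I believe the failure reduces to just loading the model and resizing the token embeddings, since that is the call the traceback below points at (a minimal sketch, which I haven't run in exactly this form):

    # Minimal sketch of the failing call (mirrors the
    # model.resize_token_embeddings(len(tokenizer)) line in the traceback below).
    from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")
    model = TFAutoModelForSeq2SeqLM.from_pretrained("google/mt5-base")

    # With google/mt5-base this should raise:
    # ValueError: Attempt to convert a value (None) with an unsupported type
    model.resize_token_embeddings(len(tokenizer))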

I appreciate any help or advice; I really like this library so far!

Full error

/home/gcervantes/Desktop/work/python_envs/huggingface/bin/python /home/gcervantes/Desktop/work/Code/transformers/examples/tensorflow/translation/run_translation.py --do_train True --model_name_or_path google/mt5-base --tokenizer_name google/mt5-base --output_dir output --dataset_name ccaligned_multilingual --dataset_config_name sentences-ak_GH --source_lang en_XX --target_lang ak_GH
2021-09-30 17:00:16.749595: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-09-30 17:00:16.749613: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
09/30/2021 17:00:17 - INFO - __main__ - Training/evaluation parameters TFTrainingArguments(
_n_gpu=-1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_steps=None,
evaluation_strategy=IntervalStrategy.NO,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gcp_project=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=None,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=output/runs/Sep30_17-00-17_nb24862,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=3.0,
output_dir=output,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=8,
poly_power=1.0,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=None,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=output,
save_on_each_node=False,
save_steps=500,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tpu_metrics_debug=False,
tpu_name=None,
tpu_num_cores=None,
tpu_zone=None,
use_legacy_prediction_loop=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xla=False,
)
09/30/2021 17:00:18 - INFO - datasets.load - Found main folder for dataset https://raw.githubusercontent.com/huggingface/datasets/1.12.1/datasets/ccaligned_multilingual/ccaligned_multilingual.py at /home/gcervantes/.cache/huggingface/modules/datasets_modules/datasets/ccaligned_multilingual
09/30/2021 17:00:18 - INFO - datasets.load - Found specific version folder for dataset https://raw.githubusercontent.com/huggingface/datasets/1.12.1/datasets/ccaligned_multilingual/ccaligned_multilingual.py at /home/gcervantes/.cache/huggingface/modules/datasets_modules/datasets/ccaligned_multilingual/ecebf2fba25342d63934850b389502a24fb3d61845e74643a416e06c773ffa36
09/30/2021 17:00:18 - INFO - datasets.load - Found script file from https://raw.githubusercontent.com/huggingface/datasets/1.12.1/datasets/ccaligned_multilingual/ccaligned_multilingual.py to /home/gcervantes/.cache/huggingface/modules/datasets_modules/datasets/ccaligned_multilingual/ecebf2fba25342d63934850b389502a24fb3d61845e74643a416e06c773ffa36/ccaligned_multilingual.py
09/30/2021 17:00:18 - INFO - datasets.load - Found dataset infos file from https://raw.githubusercontent.com/huggingface/datasets/1.12.1/datasets/ccaligned_multilingual/dataset_infos.json to /home/gcervantes/.cache/huggingface/modules/datasets_modules/datasets/ccaligned_multilingual/ecebf2fba25342d63934850b389502a24fb3d61845e74643a416e06c773ffa36/dataset_infos.json
09/30/2021 17:00:18 - INFO - datasets.load - Found metadata file for dataset https://raw.githubusercontent.com/huggingface/datasets/1.12.1/datasets/ccaligned_multilingual/ccaligned_multilingual.py at /home/gcervantes/.cache/huggingface/modules/datasets_modules/datasets/ccaligned_multilingual/ecebf2fba25342d63934850b389502a24fb3d61845e74643a416e06c773ffa36/ccaligned_multilingual.json
09/30/2021 17:00:18 - INFO - datasets.info - Loading Dataset Infos from /home/gcervantes/.cache/huggingface/modules/datasets_modules/datasets/ccaligned_multilingual/ecebf2fba25342d63934850b389502a24fb3d61845e74643a416e06c773ffa36
09/30/2021 17:00:18 - INFO - datasets.builder - Overwrite dataset info from restored data version.
09/30/2021 17:00:18 - INFO - datasets.info - Loading Dataset info from /home/gcervantes/.cache/huggingface/datasets/ccaligned_multilingual/sentences-ak_GH/1.0.0/ecebf2fba25342d63934850b389502a24fb3d61845e74643a416e06c773ffa36
09/30/2021 17:00:18 - WARNING - datasets.builder - Reusing dataset ccaligned_multilingual (/home/gcervantes/.cache/huggingface/datasets/ccaligned_multilingual/sentences-ak_GH/1.0.0/ecebf2fba25342d63934850b389502a24fb3d61845e74643a416e06c773ffa36)
09/30/2021 17:00:18 - INFO - datasets.info - Loading Dataset info from /home/gcervantes/.cache/huggingface/datasets/ccaligned_multilingual/sentences-ak_GH/1.0.0/ecebf2fba25342d63934850b389502a24fb3d61845e74643a416e06c773ffa36
100%|██████████| 1/1 [00:00<00:00, 899.29it/s]
loading configuration file https://huggingface.co/google/mt5-base/resolve/main/config.json from cache at /home/gcervantes/.cache/huggingface/transformers/5ebfd830555547194403d6803baa127970de59b443c04b7a1a60b16a97ed3958.b589da7dac64196f9764abaf2c4c7e507cec8b14b96da3ef270d924f155062de
Model config MT5Config {
  "_name_or_path": "/home/patrick/hugging_face/t5/mt5-base",
  "architectures": [
    "MT5ForConditionalGeneration"
  ],
  "d_ff": 2048,
  "d_kv": 64,
  "d_model": 768,
  "decoder_start_token_id": 0,
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "gated-gelu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "mt5",
  "num_decoder_layers": 12,
  "num_heads": 12,
  "num_layers": 12,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_num_buckets": 32,
  "tie_word_embeddings": false,
  "tokenizer_class": "T5Tokenizer",
  "transformers_version": "4.11.0.dev0",
  "use_cache": true,
  "vocab_size": 250112
}

loading configuration file https://huggingface.co/google/mt5-base/resolve/main/config.json from cache at /home/gcervantes/.cache/huggingface/transformers/5ebfd830555547194403d6803baa127970de59b443c04b7a1a60b16a97ed3958.b589da7dac64196f9764abaf2c4c7e507cec8b14b96da3ef270d924f155062de
Model config MT5Config { ... } (identical to the dump above; omitted)

loading file https://huggingface.co/google/mt5-base/resolve/main/spiece.model from cache at /home/gcervantes/.cache/huggingface/transformers/4764ec347af4d2d6286acbe1d9d630ac0afd8554a4c4a64170e0b663fd2e2412.84ea7af2df68dc8db434d3160aab65cce8ac63ce5b6f7743f8c9a4a14b4f77e2
loading file https://huggingface.co/google/mt5-base/resolve/main/tokenizer.json from cache at None
loading file https://huggingface.co/google/mt5-base/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/google/mt5-base/resolve/main/special_tokens_map.json from cache at /home/gcervantes/.cache/huggingface/transformers/0d7d5b3fc19bf58d4b274990c8bcf5e307726bc18d95f40a1436dfb6a0892f85.294ebaa4cd17bb284635004c92d2c4d522ec488c828dcce0c2471b6f28e3fe82
loading file https://huggingface.co/google/mt5-base/resolve/main/tokenizer_config.json from cache at /home/gcervantes/.cache/huggingface/transformers/afba33be693521ccefbde6d03b93b5c517d7108ba31f6c08000ed52c2cea45c9.28bbf90ae7962b1b7211c0ce8b2006f968c82439ec9c47e0847ba63642f9435a
loading configuration file https://huggingface.co/google/mt5-base/resolve/main/config.json from cache at /home/gcervantes/.cache/huggingface/transformers/5ebfd830555547194403d6803baa127970de59b443c04b7a1a60b16a97ed3958.b589da7dac64196f9764abaf2c4c7e507cec8b14b96da3ef270d924f155062de
Model config MT5Config { ... } (identical to the dump above; omitted)

loading configuration file https://huggingface.co/google/mt5-base/resolve/main/config.json from cache at /home/gcervantes/.cache/huggingface/transformers/5ebfd830555547194403d6803baa127970de59b443c04b7a1a60b16a97ed3958.b589da7dac64196f9764abaf2c4c7e507cec8b14b96da3ef270d924f155062de
Model config MT5Config { ... } (identical to the dump above; omitted)

09/30/2021 17:00:22 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/gcervantes/.cache/huggingface/datasets/ccaligned_multilingual/sentences-ak_GH/1.0.0/ecebf2fba25342d63934850b389502a24fb3d61845e74643a416e06c773ffa36/cache-d7a5cf279d2e727e.arrow
Tensorflow: setting up strategy
2021-09-30 17:00:22.340094: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 17:00:22.340493: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-09-30 17:00:22.340535: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2021-09-30 17:00:22.340572: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2021-09-30 17:00:22.340609: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2021-09-30 17:00:22.340646: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2021-09-30 17:00:22.340682: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2021-09-30 17:00:22.340718: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-09-30 17:00:22.340754: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2021-09-30 17:00:22.340762: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1835] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-09-30 17:00:22.341064: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
loading weights file https://huggingface.co/google/mt5-base/resolve/main/tf_model.h5 from cache at /home/gcervantes/.cache/huggingface/transformers/41c2fc682e5acee0c74105c9950da8f133eef8879ef0e2e2edd37c4d237da2ee.ffac6e54739b6e6cd3d9e8b6671a9514d3b1b755459a51fdc1749d110e5a5a1d.h5
2021-09-30 17:00:22.636446: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
All model checkpoint layers were used when initializing TFMT5ForConditionalGeneration.

All the layers of TFMT5ForConditionalGeneration were initialized from the model checkpoint at google/mt5-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFMT5ForConditionalGeneration for predictions without further training.
Traceback (most recent call last):
  File "/home/gcervantes/Desktop/work/Code/transformers/examples/tensorflow/translation/run_translation.py", line 622, in <module>
    main()
  File "/home/gcervantes/Desktop/work/Code/transformers/examples/tensorflow/translation/run_translation.py", line 493, in main
    model.resize_token_embeddings(len(tokenizer))
  File "/home/gcervantes/Desktop/work/Code/transformers/src/transformers/modeling_tf_utils.py", line 856, in resize_token_embeddings
    model_embeds = self._resize_token_embeddings(new_num_tokens)
  File "/home/gcervantes/Desktop/work/Code/transformers/src/transformers/modeling_tf_utils.py", line 901, in _resize_token_embeddings
    new_lm_head_decoder = self._get_resized_lm_head_decoder(old_lm_head_decoder, new_num_tokens)
  File "/home/gcervantes/Desktop/work/Code/transformers/src/transformers/modeling_tf_utils.py", line 981, in _get_resized_lm_head_decoder
    self._get_word_embedding_weight(self.get_input_embeddings()) == old_lm_head_decoder
  File "/home/gcervantes/Desktop/work/python_envs/huggingface/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 1092, in __eq__
    return gen_math_ops.equal(self, other, incompatible_shape_error=False)
  File "/home/gcervantes/Desktop/work/python_envs/huggingface/lib/python3.8/site-packages/tensorflow/python/ops/gen_math_ops.py", line 3208, in equal
    return equal_eager_fallback(
  File "/home/gcervantes/Desktop/work/python_envs/huggingface/lib/python3.8/site-packages/tensorflow/python/ops/gen_math_ops.py", line 3237, in equal_eager_fallback
    _attr_T, _inputs_T = _execute.args_to_matching_eager([x, y], ctx, [])
  File "/home/gcervantes/Desktop/work/python_envs/huggingface/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 273, in args_to_matching_eager
    tensor = ops.convert_to_tensor(
  File "/home/gcervantes/Desktop/work/python_envs/huggingface/lib/python3.8/site-packages/tensorflow/python/profiler/trace.py", line 163, in wrapped
    return func(*args, **kwargs)
  File "/home/gcervantes/Desktop/work/python_envs/huggingface/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1566, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/gcervantes/Desktop/work/python_envs/huggingface/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 346, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/home/gcervantes/Desktop/work/python_envs/huggingface/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 271, in constant
    return _constant_impl(value, dtype, shape, name, verify_shape=False,
  File "/home/gcervantes/Desktop/work/python_envs/huggingface/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 283, in _constant_impl
    return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
  File "/home/gcervantes/Desktop/work/python_envs/huggingface/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 308, in _constant_eager_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/home/gcervantes/Desktop/work/python_envs/huggingface/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 106, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor.

Process finished with exit code 1
patrickvonplaten commented 3 years ago

Hey @gcervantes8,

I don't really understand why this change is needed:

    source_lang = data_args.source_lang.split("_")[0]
    target_lang = data_args.target_lang.split("_")[0]

What error do you get without making this change? Could you copy-paste the command you ran that produces an error without the above modification?

gcervantes8 commented 3 years ago

Hey @patrickvonplaten thanks for the help!

Without making the change I get this error:

Traceback (most recent call last):
  File "/home/gcervantes/Desktop/work/Code/transformers/examples/tensorflow/translation/run_translation.py", line 620, in <module>
    main()
  File "/home/gcervantes/Desktop/work/Code/transformers/examples/tensorflow/translation/run_translation.py", line 450, in main
    train_dataset = train_dataset.map(
  File "/home/gcervantes/Desktop/work/python_envs/huggingface/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1686, in map
    return self._map_single(
  File "/home/gcervantes/Desktop/work/python_envs/huggingface/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 185, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/gcervantes/Desktop/work/python_envs/huggingface/lib/python3.8/site-packages/datasets/fingerprint.py", line 398, in wrapper
    out = func(self, *args, **kwargs)
  File "/home/gcervantes/Desktop/work/python_envs/huggingface/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2048, in _map_single
    batch = apply_function_on_filtered_inputs(
  File "/home/gcervantes/Desktop/work/python_envs/huggingface/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1939, in apply_function_on_filtered_inputs
    function(*fn_args, effective_indices, **fn_kwargs) if with_indices else function(*fn_args, **fn_kwargs)
  File "/home/gcervantes/Desktop/work/Code/transformers/examples/tensorflow/translation/run_translation.py", line 424, in preprocess_function
    inputs = [ex[source_lang] for ex in examples["translation"]]
  File "/home/gcervantes/Desktop/work/Code/transformers/examples/tensorflow/translation/run_translation.py", line 424, in <listcomp>
    inputs = [ex[source_lang] for ex in examples["translation"]]
KeyError: 'en'

Process finished with exit code 1

I do this because in the ccaligned_multilingual data, the keys used in the JSON file are ak_GH and en_XX. The original code strips each language code down to the characters before the underscore, which in this example yields en and ak and causes the KeyError.
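
To illustrate with a made-up example in the dataset's layout (the sentence text here is invented):

    # Made-up example mirroring ccaligned_multilingual's key layout:
    # the "translation" dict keys keep the full region-suffixed codes.
    example = {"translation": {"en_XX": "Hello", "ak_GH": "Agoo"}}

    source_lang = "en_XX".split("_")[0]         # the unmodified script strips this to "en"
    print(example["translation"][source_lang])  # KeyError: 'en'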

gcervantes8 commented 3 years ago

Without making any changes to the code, running `transformers/examples/tensorflow/translation/run_translation.py` with these arguments also gives the same error:

    --do_train True --model_name_or_path google/mt5-base --tokenizer_name google/mt5-base --output_dir output --dataset_name opus_euconst --dataset_config_name cs-en --source_lang cs --target_lang en

Changing the model and tokenizer to google/byt5-base gives no error.

patrickvonplaten commented 3 years ago

Hey @gcervantes8,

Thanks for the answer! @Rocketknight1 - could you maybe give this a look? I think you've recently worked with the TF translation script, no? :-)

Rocketknight1 commented 3 years ago

@patrickvonplaten On my list, I'll try to investigate this today or tomorrow!

Rocketknight1 commented 3 years ago

Hi @gcervantes8, sorry to be annoying, but can I ask you to test this with the TF translation notebook too? Just swap in the mT5 model and your dataset there, and then if you encounter the same issue, you can save and upload the notebook with your changes and the error outputs. I know it's a bit lazy of me, but it'll make it much easier for me to reproduce and locate the problem!

gcervantes8 commented 3 years ago

Hey @Rocketknight1, thanks for the help! I tried running the model with the TF translation notebook, but strangely enough I didn't encounter the issue there.

These are the changes I made to the TF notebook:

- Changed the model: `model_checkpoint = "google/mt5-base"`
- Changed the dataset: `raw_datasets = load_dataset("opus_euconst", "cs-da")`
- Modified the source and target languages specified before the preprocess function to `source_lang = "cs"` and `target_lang = "da"`
- Reduced the batch size to `batch_size = 1` because I was getting out-of-memory errors
- Removed the validation_dataset

So this might be specific to the `transformers/examples/tensorflow/translation/run_translation.py` script.

Rocketknight1 commented 3 years ago

Hm, that's quite unusual because the scripts should be similar. I'll try to reproduce with the example script here in the next couple of days and let you know what I find.

gcervantes8 commented 3 years ago

I looked into it more, and it seems the resize_token_embeddings function in `src/transformers/modeling_tf_utils.py` expects the get_output_embeddings function in `src/transformers/models/t5/modeling_tf_t5.py` to return an object with a `weight` or `decoder` attribute.

The model works for T5 because, in T5's get_output_embeddings function, self.config.tie_word_embeddings is True, so it never reaches the else branch of the if statement, which returns only a bare Tensor.

I'm not sure what the best way to fix this is. @patrickvonplaten, what do you think?
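
For anyone following along, here is my reading of the failure chain, reduced to a runnable sketch (paraphrased from the code paths in the traceback, with a tiny made-up shape; the real embedding matrix is (250112, 768) per the config above):

    import tensorflow as tf

    # mT5 has tie_word_embeddings=False, so get_output_embeddings returns a
    # bare Tensor. _get_word_embedding_weight finds neither a .weight nor a
    # .decoder attribute on it, and so ends up handing back None.
    old_lm_head_decoder = None

    # Stand-in for the input embedding weights (made-up small shape).
    input_embedding_weight = tf.Variable(tf.zeros((10, 4)))

    # _get_resized_lm_head_decoder then compares the two with ==, which
    # tf.Variable.__eq__ lowers to tf.equal(self, other); converting the
    # None operand to a tensor raises:
    # ValueError: Attempt to convert a value (None) with an unsupported type
    input_embedding_weight == old_lm_head_decoder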

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

gcervantes8 commented 3 years ago

I think this issue still needs to be addressed; I'm still receiving the error when retraining an mT5 model using TensorFlow.

gcervantes8 commented 3 years ago

It seems this issue is the same as (or at least similar to) issue #13839, and it looks like #14329 will probably fix it, so I'll close this issue.