TypeError: '>' not supported between instances of 'NoneType' and 'int' #27505

Closed ChristophKnapp closed 9 months ago

ChristophKnapp commented 9 months ago

System Info

2023-11-15 07:10:36.235004: W tensorflow/core/common_runtime/gpu/] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU.

Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

Who can help?

No response

Reproduction




  1. Set up a Python environment in PyCharm.
  2. Add the transformers example script for translation from English to Romanian.
  3. Install the Python libraries from within PyCharm.
  4. Install the transformers development version, as requested by the script.
  5. Run the script; the error is thrown after the first epoch.

Expected behavior

I'm running into this problem when I run the English-to-Romanian translation example. As far as I'm aware, I did not modify anything in the script. It fits the model up to the end of the first epoch, then throws this error. There are already two issue reports about this problem that nobody has taken on. I pasted this as a comment in one of them. Since I was not sure whether the old issue would be reopened, I decided to create a new one.

I will debug this on my own but any help is appreciated.


2023-11-13 15:47:58.542480: I tensorflow/core/util/] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders.
2023-11-13 15:47:58.564097: E tensorflow/compiler/xla/stream_executor/cuda/] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
11/13/2023 15:47:59 - INFO - main - Training/evaluation parameters TFTrainingArguments(
_n_gpu=-1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
per_device_eval_batch_size=16,
per_device_train_batch_size=16,
)
11/13/2023 15:48:01 - INFO - datasets.builder - Found cached dataset wmt16 (/.cache/huggingface/datasets/wmt16/ro-en/1.0.0/746749a11d25c02058042da7502d973ff410e73457f3d305fc1177dc0e8c4227)

loading file spiece.model from cache at /.cache/huggingface/hub/models--t5-small/snapshots/df1b051c49625cf57a3d0d8d3863ed4d13564fe4/spiece.model
Loading cached processed dataset at /.cache/huggingface/datasets/wmt16/ro-en/1.0.0/746749a11d25c02058042da7502d973ff410e73457f3d305fc1177dc0e8c4227/cache-164eb734af318539.arrow

Loaded 60,506,624 parameters in the TF 2.0 model. All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model. If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.
11/13/2023 15:48:04 - INFO - main - Running training
11/13/2023 15:48:04 - INFO - main - Num examples = 610320
11/13/2023 15:48:04 - INFO - main - Num Epochs = 3.0
11/13/2023 15:48:04 - INFO - main - Instantaneous batch size per device = 16
11/13/2023 15:48:04 - INFO - main - Total train batch size = 16
11/13/2023 15:48:04 - INFO - main - Total optimization steps = 114435
Epoch 1/3
38145/38145 [==============================] - ETA: 0s - loss: 0.6117

Traceback (most recent call last):
  File "/workspace/transformer/", line 733, in
    main()
  File "/workspace/transformer/", line 693, in main
    history =, epochs=int(training_args.num_train_epochs), callbacks=callbacks)
  File "/workspace/transformer/lib/python3.10/site-packages/keras/src/utils/", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/workspace/transformer/lib/python3.10/site-packages/transformers/", line 223, in on_epoch_end
    predictions = self.generation_function(generation_inputs, attention_mask=attention_mask)
  File "/tmp/", line 13, in tfgenerationfunction
    retval = ag.converted_call(ag.ld(self).model.generate, (ag.ld(inputs),), dict(attention_mask=ag.ld(attention_mask), **ag.ld(self).generate_kwargs), fscope)
  File "/tmp/", line 437, in tfgenerate
    is_beam_genmode = ag.and(lambda : ag.not_(ag.ld(is_contrastive_search_gen_mode)), lambda : ag.and_(lambda : ag.ld(generation_config).num_beams > 1, lambda : ag.ld(generation_config).do_sample is False))
  File "/tmp/", line 437, in
    is_beam_genmode = ag.and(lambda : ag_.not(ag.ld(is_contrastive_search_gen_mode)), lambda : ag.and_(lambda : ag.ld(generation_config).num_beams > 1, lambda : ag.ld(generation_config).do_sample is False))
  File "/tmp/", line 437, in
    is_beam_genmode = ag.and(lambda : ag_.not(ag.ld(is_contrastive_search_gen_mode)), lambda : ag_.and(lambda : ag.ld(generation_config).num_beams > 1, lambda : ag.ld(generation_config).do_sample is False))
TypeError: in user code:

  File "/workspace/transformer/lib/python3.10/site-packages/transformers/", line 202, in generation_function *
    return self.model.generate(inputs, attention_mask=attention_mask, **self.generate_kwargs)
  File "/workspace/transformer/lib/python3.10/site-packages/transformers/generation/", line 884, in generate
    is_beam_gen_mode = (

TypeError: '>' not supported between instances of 'NoneType' and 'int'

Process finished with exit code 1

ChristophKnapp commented 9 months ago

--max_train_samples 500 --max_eval_samples 500 --max_predict_samples 500

reduces the waiting time for this error to appear to a minute. This works for me; further reduction does not seem to help much.

amyeroberts commented 9 months ago

cc @Rocketknight1 as it seems to be failing on a TF script

Rocketknight1 commented 9 months ago

Hi @ChristophKnapp, can you confirm that you're running the translation example from here and paste me the exact command you used to run it so I can reproduce?

ChristophKnapp commented 9 months ago

@Rocketknight1 Yes that's the script I'm using. The terminal options are:

--output_dir /workspace/results --model_name_or_path t5-small --do_train --do_eval --source_lang en --target_lang ro --source_prefix translate_English_toRomanian: --dataset_name wmt16 --dataset_config_name ro-en --per_device_train_batch_size=16 --per_device_eval_batch_size=16 --overwrite_output_dir --max_train_samples 500 --max_eval_samples 500 --max_predict_samples 500

Except for the last three options, that should be exactly as recommended in the example md file.

Rocketknight1 commented 9 months ago

Confirmed the issue here - the problem is this code:

is_beam_gen_mode = (
    not is_contrastive_search_gen_mode
    and (generation_config.num_beams > 1)
    and generation_config.do_sample is False
)

The problem is that in this case, generation_config does not have a num_beams attribute and so we get a value of None. I also see a couple of other issues in this example script that should be fixed.
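
To illustrate the failure in isolation, here is a minimal sketch that evaluates the same boolean expression as the snippet quoted above against a stand-in object. `FakeGenerationConfig`, `is_beam_gen_mode` (as a function) and `reproduce_error` are hypothetical names made up for this example; they are not part of transformers.

```python
class FakeGenerationConfig:
    """Hypothetical stand-in for a generation config; not the transformers class."""

    def __init__(self, num_beams=None, do_sample=False):
        self.num_beams = num_beams
        self.do_sample = do_sample


def is_beam_gen_mode(generation_config, is_contrastive_search_gen_mode=False):
    # Same boolean expression as the quoted snippet, wrapped in a function.
    return (
        not is_contrastive_search_gen_mode
        and (generation_config.num_beams > 1)
        and generation_config.do_sample is False
    )


def reproduce_error():
    # With num_beams=None, the comparison `None > 1` raises TypeError.
    try:
        is_beam_gen_mode(FakeGenerationConfig(num_beams=None))
    except TypeError as exc:
        return str(exc)
    return None
```

Calling `reproduce_error()` returns the message of the `TypeError` raised by comparing `None` with an `int`, while a config with an integer `num_beams` evaluates without error.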

@gante are you okay if I open a PR to replace this with something like getattr(generation_config, "num_beams", 1)?
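
A rough sketch of that idea follows; it is an assumption about how such a guard could look, not the change that was actually merged. One detail worth noting: the default passed to `getattr` only applies when the attribute is missing entirely, so if the attribute exists but is set to `None`, a separate `None` check is needed as well. `resolve_num_beams` is a hypothetical helper, and `SimpleNamespace` stands in for a real config object.

```python
from types import SimpleNamespace


def resolve_num_beams(generation_config, default=1):
    """Hypothetical helper: return num_beams, falling back to `default`."""
    # getattr covers the case where the attribute is missing entirely.
    num_beams = getattr(generation_config, "num_beams", default)
    # The explicit check covers the case where the attribute exists but is None.
    return default if num_beams is None else num_beams


missing = SimpleNamespace()                 # no num_beams attribute at all
unset = SimpleNamespace(num_beams=None)     # attribute present but None
beams = SimpleNamespace(num_beams=4)        # attribute set to an int

results = [resolve_num_beams(cfg) for cfg in (missing, unset, beams)]
```

With this guard, the later comparison `resolve_num_beams(generation_config) > 1` always operates on an integer.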

Rocketknight1 commented 9 months ago

@ChristophKnapp thank you for the bug report! We've opened a PR to fix it at #27519. The updated example script is here - please try it and let me know if the issue is resolved!

ChristophKnapp commented 9 months ago

Thanks a lot for all your help. The script finishes with no errors now.

Rocketknight1 commented 9 months ago

No probs! We try to run tests, but sometimes bug reports like these are the only way we find out about issues like this. It was probably affecting lots of people, not just you. Thanks for letting us know!

