cc @ArthurZucker Seems like the model and tokenizer have mismatched lengths.
Yeah, but this comes down to the unk_token: when you go above 32099, the fast tokenizer simply outputs '' while the slow one outputs '<extra_id_-29>' (which is a bit strange, I'll give you that).
Snippet:
>>> from transformers import T5Tokenizer, T5TokenizerFast
>>> tokenizer_slow = T5Tokenizer.from_pretrained("t5-base")
>>> tokenizer_slow.decode(32140) # above vocab size
'<extra_id_-3167901>'
>>> tokenizer_fast = T5TokenizerFast.from_pretrained("t5-base")
>>> tokenizer_fast.decode(32140) # above vocab size
''
The issue is different. This is an integer overflow in Rust:
>>> tokenizer_fast.decode(3200000000000)
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
Cell In[29], line 1
----> 1 tokenizer_fast.decode(3200000000000)
File ~/Work/transformers/src/transformers/tokenization_utils_base.py:3485, in PreTrainedTokenizerBase.decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces, **kwargs)
   3482 # Convert inputs to python lists
   3483 token_ids = to_py_obj(token_ids)
-> 3485 return self._decode(
   3486     token_ids=token_ids,
   3487     skip_special_tokens=skip_special_tokens,
   3488     clean_up_tokenization_spaces=clean_up_tokenization_spaces,
   3489     **kwargs,
   3490 )
File ~/Work/transformers/src/transformers/tokenization_utils_fast.py:549, in PreTrainedTokenizerFast._decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces, **kwargs)
    547 if isinstance(token_ids, int):
    548     token_ids = [token_ids]
--> 549 text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
    551 clean_up_tokenization_spaces = (
    552     clean_up_tokenization_spaces
    553     if clean_up_tokenization_spaces is not None
    554     else self.clean_up_tokenization_spaces
    555 )
    556 if clean_up_tokenization_spaces:
OverflowError: out of range integral type conversion attempted
That means you are just giving a huge number to decode. Is there a reason?
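For illustration, a minimal sketch of a guard one could put in front of decode (this is not something the library does for you, just a hypothetical check; t5-base is assumed):

from transformers import T5TokenizerFast

tokenizer_fast = T5TokenizerFast.from_pretrained("t5-base")
token_ids = [0, 32099, 3200000000000]
# the Rust backend only accepts non-negative ids it can represent, so drop anything
# outside the tokenizer's known range before decoding
safe_ids = [i for i in token_ids if 0 <= i < len(tokenizer_fast)]
# decodes without raising; the remaining ids here are special tokens, so the output is just ''
print(tokenizer_fast.decode(safe_ids, skip_special_tokens=True))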
Please note I've only relayed the errors reported on the PyTorch issue by a user trying to use torch.compile.
Hi guys,
I have the same problem with the run_seq2seq_qa.py script, and it turns out that preds are passed to the decode function with the following content:
[[ 0 250099 1013 ... -100 -100 -100]
[ 0 250099 1013 ... -100 -100 -100]
[ 0 250099 1013 ... -100 -100 -100]
...
[ 0 250099 260 ... -100 -100 -100]
[ 0 250099 442 ... -100 -100 -100]
[ 0 250099 3883 ... -100 -100 -100]]
So the problematic thing here is -100, I guess, because I can reproduce the error with:
>>> tokenizer.decode(-100)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/transformers/src/transformers/tokenization_utils_base.py", line 3485, in decode
return self._decode(
File "/home/ubuntu/transformers/src/transformers/tokenization_utils_fast.py", line 549, in _decode
text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
OverflowError: out of range integral type conversion attempted
Awesome, thanks for providing this! Indeed, these should be converted to padding.
Could it be similar to this fix? https://github.com/huggingface/transformers/pull/18592 The hardcoded -100 doesn't seem to always do the right thing.
I tried with another model arch and it breaks too, but in a different way. So eval is quite broken in many ways.
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=src python examples/pytorch/translation/run_translation.py --model_name_or_path 'facebook/wmt19-en-ru' --do_train --do_eval --source_lang en --target_lang de --source_prefix 'translate English to German: ' --dataset_name stas/wmt14-en-de-pre-processed --output_dir /tmp/tst-translation --num_train_epochs 1 --per_device_train_batch_size=1 --max_train_samples 10 --overwrite_output_dir --seed 1137 --per_device_eval_batch_size 1 --predict_with_generate --fp16 --max_eval_samples 10
Traceback (most recent call last):
File "examples/pytorch/translation/run_translation.py", line 664, in <module>
main()
File "examples/pytorch/translation/run_translation.py", line 605, in main
metrics = trainer.evaluate(max_length=max_length, num_beams=num_beams, metric_key_prefix="eval")
File "/mnt/nvme0/code/huggingface/transformers-master/src/transformers/trainer_seq2seq.py", line 159, in evaluate
return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
File "/mnt/nvme0/code/huggingface/transformers-master/src/transformers/trainer.py", line 2993, in evaluate
output = eval_loop(
File "/mnt/nvme0/code/huggingface/transformers-master/src/transformers/trainer.py", line 3174, in evaluation_loop
loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
File "/mnt/nvme0/code/huggingface/transformers-master/src/transformers/trainer_seq2seq.py", line 290, in prediction_step
outputs = model(**inputs)
File "/home/stas/anaconda3/envs/py38-pt20/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/nvme0/code/huggingface/transformers-master/src/transformers/models/fsmt/modeling_fsmt.py", line 1251, in forward
masked_lm_loss = loss_fct(lm_logits.view(-1, self.config.tgt_vocab_size), labels.view(-1))
File "/home/stas/anaconda3/envs/py38-pt20/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/stas/anaconda3/envs/py38-pt20/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1174, in forward
return F.cross_entropy(input, target, weight=self.weight,
File "/home/stas/anaconda3/envs/py38-pt20/lib/python3.8/site-packages/torch/nn/functional.py", line 3029, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
ValueError: Expected input batch_size (56) to match target batch_size (48).
@stas00 I am facing the same issue while fine-tuning t5-small using examples/pytorch/summarization/run_summarization.py. I can see preds contains -100, and so decode fails with the below error:
Traceback (most recent call last):
File "examples/pytorch/summarization/run_summarization.py", line 751, in <module> main()
File "examples/pytorch/summarization/run_summarization.py", line 705, in main
predict_results = trainer.predict(predict_dataset, metric_key_prefix="predict")
File "src/transformers/trainer_seq2seq.py", line 216, in predict
return super().predict(test_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
File "src/transformers/trainer.py", line 3069, in predict
output = eval_loop(
File "src/transformers/trainer.py", line 3281, in evaluation_loop
metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
File "examples/pytorch/summarization/run_summarization.py", line 635, in compute_metrics
decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
File "src/transformers/tokenization_utils_base.py", line 3446, in batch_decode
return [
File "src//transformers/tokenization_utils_base.py", line 3447, in <listcomp>
self.decode(
File "src/transformers/tokenization_utils_base.py", line 3486, in decode
return self._decode(
File "src/transformers/tokenization_utils_fast.py", line 549, in _decode
text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
OverflowError: out of range integral type conversion attempted
The first issue is addressed in #22693
The second issue with FSMT is due to this line added by @gante. The decoder_input_ids not being passed to generate results in generations that have the same length as the inputs and not the targets.
@sgugger thanks for the fix. I can see the same issue in line 718 https://github.com/huggingface/transformers/blob/main/examples/pytorch/summarization/run_summarization.py#L718
possible fix:
preds = np.where(predict_results.predictions != -100, predict_results.predictions, tokenizer.pad_token_id)
predictions = tokenizer.batch_decode(preds, skip_special_tokens=True, clean_up_tokenization_spaces=True)
Good catch, adding this too in the PR.
Thinking more, I think this is also a result of the recent changes in generate, which used to be the one padding the result with tokenizer.pad_token_id, and it's now the Trainer padding them with -100. cc @gante
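In the meantime, the masking pattern from the possible fix above can be applied to both predictions and labels inside the metric function before decoding. A rough sketch (it assumes a tokenizer and a loaded metric object are in scope; the function and variable names are illustrative, not taken from the example scripts):

import numpy as np

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    # the Trainer may pad generations and labels with -100, which the tokenizer cannot decode,
    # so swap those positions for the pad token id first
    preds = np.where(preds != -100, preds, tokenizer.pad_token_id)
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    return metric.compute(predictions=decoded_preds, references=decoded_labels)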
Hey everyone -- the last issues should be gone with #22772, but feel free to comment/reopen if any related problem persists!
Hi! Since a couple of weeks ago I have also been stumbling on this error. It was working just fine before. I am pretty sure I have transformers installed from source, so the PR with the fix is there as well. I am using Bart-large and the Trainer class. I first define ROUGE as the training evaluation function:
def compute_rouge(pred):
    predictions, labels = pred
    # decode the predictions
    decode_predictions = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    # decode labels
    decode_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    # compute results
    res = rouge.compute(predictions=decode_predictions, references=decode_labels, use_stemmer=True)
    # get %
    return res
And give it to the trainer
trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=tokenized_dataset['train'],
    eval_dataset=tokenized_dataset['valid'],
    data_collator=collator,
    tokenizer=tokenizer,
    compute_metrics=compute_rouge
)
Then the script breaks in Trainer.train, while decoding for dev set evaluation:
Traceback (most recent call last):
File "/home/ghoogerw/kg2Narrative/KGNarrative2/script4trainingLLM/finetunemodel.py", line 226, in <module>
main(args)
File "/home/ghoogerw/kg2Narrative/KGNarrative2/script4trainingLLM/finetunemodel.py", line 149, in main
trainer.train()
File "/home/ghoogerw/.conda/envs/kg2Narrative/lib/python3.11/site-packages/transformers/trainer.py", line 1662, in train
return inner_training_loop(
^^^^^^^^^^^^^^^^^^^^
File "/home/ghoogerw/.conda/envs/kg2Narrative/lib/python3.11/site-packages/transformers/trainer.py", line 2022, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
File "/home/ghoogerw/.conda/envs/kg2Narrative/lib/python3.11/site-packages/transformers/trainer.py", line 2288, in _maybe_log_save_evaluate
metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ghoogerw/.conda/envs/kg2Narrative/lib/python3.11/site-packages/transformers/trainer_seq2seq.py", line 159, in evaluate
return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ghoogerw/.conda/envs/kg2Narrative/lib/python3.11/site-packages/transformers/trainer.py", line 2994, in evaluate
output = eval_loop(
^^^^^^^^^^
File "/home/ghoogerw/.conda/envs/kg2Narrative/lib/python3.11/site-packages/transformers/trainer.py", line 3283, in evaluation_loop
metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ghoogerw/kg2Narrative/KGNarrative2/script4trainingLLM/finetunemodel.py", line 103, in compute_rouge
decode_predictions = tokenizer.batch_decode(predictions, skip_special_tokens=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ghoogerw/.conda/envs/kg2Narrative/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3456, in batch_decode
return [
^
File "/home/ghoogerw/.conda/envs/kg2Narrative/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3457, in <listcomp>
self.decode(
File "/home/ghoogerw/.conda/envs/kg2Narrative/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3496, in decode
return self._decode(
^^^^^^^^^^^^^
File "/home/ghoogerw/.conda/envs/kg2Narrative/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 549, in _decode
text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OverflowError: out of range integral type conversion attempted
Interestingly enough, on a similarly formatted dataset (but with longer text), while using Longformer (LED), I get the same error but this time at prediction time, so the training completes successfully:
Traceback (most recent call last):
File "/home/ghoogerw/kg2Narrative/KGNarrative2/script4trainingLLM/LED_4_DWIE.py", line 236, in <module>
main(args)
File "/home/ghoogerw/kg2Narrative/KGNarrative2/script4trainingLLM/LED_4_DWIE.py", line 161, in main
preds, labels, metrics = trainer.predict(tokenized_dataset['test'], num_beams=5, min_length=50, max_length=max_target, no_repeat_ngram_size=2, early_stopping=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ghoogerw/.conda/envs/kg2Narrative/lib/python3.11/site-packages/transformers/trainer_seq2seq.py", line 216, in predict
return super().predict(test_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ghoogerw/.conda/envs/kg2Narrative/lib/python3.11/site-packages/transformers/trainer.py", line 3070, in predict
output = eval_loop(
^^^^^^^^^^
File "/home/ghoogerw/.conda/envs/kg2Narrative/lib/python3.11/site-packages/transformers/trainer.py", line 3283, in evaluation_loop
metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ghoogerw/kg2Narrative/KGNarrative2/script4trainingLLM/LED_4_DWIE.py", line 103, in compute_rouge
decode_predictions = tokenizer.batch_decode(predictions, skip_special_tokens=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ghoogerw/.conda/envs/kg2Narrative/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3456, in batch_decode
return [
^
File "/home/ghoogerw/.conda/envs/kg2Narrative/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3457, in <listcomp>
self.decode(
File "/home/ghoogerw/.conda/envs/kg2Narrative/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3496, in decode
return self._decode(
^^^^^^^^^^^^^
File "/home/ghoogerw/.conda/envs/kg2Narrative/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 549, in _decode
text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OverflowError: out of range integral type conversion attempted
Hey @GabHoo -- could you share with us a short stand-alone script to reproduce the issue? :)
Thank you for your time. Here is a standalone version of the script. I hope it is what you need:
from transformers import AutoTokenizer,AutoModelForSeq2SeqLM,DataCollatorForSeq2Seq,Seq2SeqTrainingArguments,Seq2SeqTrainer
import os
from datasets import load_dataset
import numpy as np
from utils import *
import torch
import evaluate
import sys
import json
import time
import argparse
def tokenize_for_evaluation(tokenizer, preds, labels):
    predicted_text = []
    golden_labels = []
    for pred, label in zip(preds, labels):
        gen = tokenizer.decode(pred, skip_special_tokens=True)
        gen = str(gen)
        predicted_text.append(gen)
        gold = tokenizer.decode(label, skip_special_tokens=True)
        gold = str(gold)
        golden_labels.append(gold)
    return predicted_text, golden_labels
def process_data_BART(data_to_process, tokenizer, max_input, max_target, typeKG):
    # get the input text
    inputs = [graph for graph in data_to_process[f'{typeKG}']]
    # tokenize text
    model_inputs = tokenizer(inputs, max_length=max_input, padding='max_length', truncation=True)
    # tokenize labels
    # with tokenizer.as_target_tokenizer():
    targets = [target for target in data_to_process['story']]
    model_targets = tokenizer(targets, max_length=max_target, padding='max_length', truncation=True)
    # returns input_ids, attention_mask, labels
    data_to_process["input_ids"] = model_inputs.input_ids
    data_to_process["attention_mask"] = model_inputs.attention_mask
    data_to_process["labels"] = model_targets.input_ids
    return data_to_process
datapath = '/datapath'
dataprefix = 'pop'
typeKG = 'Instances_KG'
model_checkpoint = "facebook/bart-base"
experiment_name = 'exp'
learning_rate = 1e-4
batch_size = 1
epochs = 3
save_model = False
max_target = 512
max_input = 512
train_file = datapath +'/' + dataprefix + '_train' + '.json'
dev_file = datapath +'/'+ dataprefix + '_dev' + '.json'
test_file = datapath +'/' + dataprefix + '_test'+ '.json'
print("Loading dataset from ",datapath)
dataset = load_dataset('json', data_files={'train': train_file, 'valid': dev_file, 'test': test_file})
todrop=list(set(dataset['test'].column_names)-set([typeKG,'story'])) #This line returns a list of all the columns to drop (all columns minus the ones we need (input typeKG and story))
print("Loading tokenizer")
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint,add_eos_token=True)
print("\nProcessing Dataset")
#the processing of the data is done in batches to make it faster, with 4 processes
tokenized_dataset = dataset.map(lambda example: process_data_BART(example, tokenizer,max_input,max_target,typeKG), batched=True, num_proc=4,remove_columns=todrop)
print("\nLoading MODEL")
model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)
#model.to(device)
print("Collator for batches")
collator = DataCollatorForSeq2Seq(tokenizer, model=model) #this is necessary for dividing into batches for training
print('Loading rouge')
rouge = evaluate.load('rouge')
def compute_rouge(pred):
    predictions, labels = pred
    # decode the predictions
    decode_predictions = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    # decode labels
    decode_labels = tokenizer.batch_decode(labels, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    # compute results
    res = rouge.compute(predictions=decode_predictions, references=decode_labels, use_stemmer=True)
    # get %
    return res
print("\nPREPARING FOR TRAINING...")
#defining training arguments
args = Seq2SeqTrainingArguments(
    experiment_name,
    evaluation_strategy='epoch',
    learning_rate=learning_rate,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    gradient_accumulation_steps=3,  # compute gradient on n examples KG story
    weight_decay=0.01,  # regularization
    save_total_limit=1,  # max number of checkpoints kept, after which previous checkpoints are removed
    num_train_epochs=epochs,  # number of epochs
    predict_with_generate=True,
    generation_max_length=512,  # max number of tokens per generation
    generation_num_beams=5,  # decoding strategy: beam search
    eval_accumulation_steps=1,  # move eval predictions to the CPU every step
    fp16=True,  # memory management
    disable_tqdm=True)
# fp16=True only if CUDA is available
### almost training time
trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=tokenized_dataset['train'],
    eval_dataset=tokenized_dataset['valid'],
    data_collator=collator,
    tokenizer=tokenizer,
    compute_metrics=compute_rouge
)
trainer.train()
if save_model:
    print("Saving model")
    trainer.save_model(experiment_name + "/saved_model")
print("\nPREDICTING..")
preds, labels, metrics = trainer.predict(tokenized_dataset['test'], num_beams=5, min_length=50, max_length=512, no_repeat_ngram_size=2, early_stopping=True)
predicted_text,golden_labels=tokenize_for_evaluation(tokenizer,preds,labels)
#by this point the script has already hit the error
print("\nRESULT SCORES:")
scores = metrics.items()
print(f'Results: {scores}')
The data looks like the following; substitute the folder in datapath:
{
"story": "Baymax is a character from the film Big Hero 6 starring Scott Adsit. He was created by Steven T Seagle and the American, Duncan Rouleau.",
"Types_KG": "[CORE] Baymax is a character from the film Big Hero 6 [TRIPLES] Duncan Rouleau - nationality - Americans | Baymax - creators - Duncan Rouleau | Baymax - creator - Steven T. Seagle | Baymax - series - Big Hero 6 (film) | Big Hero 6 (film) - starring - Scott Adsit | Baymax - creator - Duncan Rouleau | Duncan Rouleau - nationality - Americans | Baymax - creators - Steven T. Seagle | Baymax - series - Big Hero 6 (film) | Big Hero 6 (film) - starring - Scott Adsit | Scott Adsit - type - person | Americans - type - ethnic group | Steven T. Seagle - type - person | Duncan Rouleau - type - person | Big Hero 6 (film) - type - person",
"Instances_KG": "[CORE] Baymax is a character from the film Big Hero 6 [TRIPLES] Duncan Rouleau - nationality - Americans | Baymax - creators - Duncan Rouleau | Baymax - creator - Steven T. Seagle | Baymax - series - Big Hero 6 (film) | Big Hero 6 (film) - starring - Scott Adsit | Baymax - creator - Duncan Rouleau | Duncan Rouleau - nationality - Americans | Baymax - creators - Steven T. Seagle | Baymax - series - Big Hero 6 (film) | Big Hero 6 (film) - starring - Scott Adsit",
"
@GabHoo I'm afraid you will have to share a complete data example or another script; the current instructions fail at data loading time if I create a file as specified (ArrowInvalid: JSON parse error: Missing a name for object member. in row 0).
@GabHoo Hello, I had the same problem and I think the problem is in DataCollatorForSeq2Seq, more specifically in label_pad_token_id. The collator uses label_pad_token_id = -100, but your tokenizer uses a different value (tokenizer.pad_token_id = 1).
Can you try?
collator = DataCollatorForSeq2Seq(tokenizer, model=model, label_pad_token_id=tokenizer.pad_token_id)
Hey @gante, I think the behavior of DataCollatorForSeq2Seq is really unexpected. Why does it require label_pad_token_id if it could use tokenizer.pad_token_id, as it does with padding_side?
Hey @Pavloveuge -- the label padding triggers a different behavior at train time (if my memory does not fail me, the loss is ignored for that token)
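For reference, a tiny sketch of the mechanism being described, using plain PyTorch rather than the Trainer: the cross-entropy loss ignores positions labelled -100 by default, which is why the labels are padded with -100 rather than with the tokenizer's pad token id.

import torch

loss_fct = torch.nn.CrossEntropyLoss()  # ignore_index defaults to -100
logits = torch.randn(3, 10)             # 3 positions, vocabulary of 10
labels = torch.tensor([4, 7, -100])     # the last position is padding and contributes nothing to the loss
print(loss_fct(logits, labels))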
Oh yeah, you're right, but this behavior still results in an error. And it doesn't matter which version of the tokenizer I use (fast or not).
With use_fast=False:
TypeError: sequence item 9: expected str instance, NoneType found
With use_fast=True:
OverflowError: out of range integral type conversion attempted
@Pavloveuge that sounds like a bug indeed :) Would you be able to share a short stand-alone script to reproduce the issue?
@gante Should I open new issue or reopen this?
@Pavloveuge A new issue would be preferable.
Splitting off from https://github.com/huggingface/transformers/issues/22571, as it was a secondary problem reported there:
Reproduction
fails inside eval:
@sgugger