Open manandey opened 1 month ago
Hey @manandey, could you please put the full error message here? We're only getting the final line and it's not sufficient to debug this efficiently. Thanks.
Sure, here it is @LysandreJik.
OverflowError Traceback (most recent call last)
Cell In[15], line 1
----> 1 trainer.train()
File /opt/conda/lib/python3.10/site-packages/transformers/trainer.py:1780, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1778 hf_hub_utils.enable_progress_bars()
1779 else:
-> 1780 return inner_training_loop(
1781 args=args,
1782 resume_from_checkpoint=resume_from_checkpoint,
1783 trial=trial,
1784 ignore_keys_for_eval=ignore_keys_for_eval,
1785 )
File /opt/conda/lib/python3.10/site-packages/transformers/trainer.py:2193, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
2190 self.state.epoch = epoch + (step + 1 + steps_skipped) / steps_in_epoch
2191 self.control = self.callback_handler.on_step_end(args, self.state, self.control)
-> 2193 self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
2194 else:
2195 self.control = self.callback_handler.on_substep_end(args, self.state, self.control)
File /opt/conda/lib/python3.10/site-packages/transformers/trainer.py:2577, in Trainer._maybe_log_save_evaluate(self, tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
2575 metrics = None
2576 if self.control.should_evaluate:
-> 2577 metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
2578 self._report_to_hp_search(trial, self.state.global_step, metrics)
2580 # Run delayed LR scheduler now that metrics are populated
File /opt/conda/lib/python3.10/site-packages/transformers/trainer_seq2seq.py:180, in Seq2SeqTrainer.evaluate(self, eval_dataset, ignore_keys, metric_key_prefix, **gen_kwargs)
178 self.gather_function = self.accelerator.gather
179 self._gen_kwargs = gen_kwargs
--> 180 return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
File /opt/conda/lib/python3.10/site-packages/transformers/trainer.py:3365, in Trainer.evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
3362 start_time = time.time()
3364 eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
-> 3365 output = eval_loop(
3366 eval_dataloader,
3367 description="Evaluation",
3368 # No point gathering the predictions if there are no metrics, otherwise we defer to
3369 # self.args.prediction_loss_only
3370 prediction_loss_only=True if self.compute_metrics is None else None,
3371 ignore_keys=ignore_keys,
3372 metric_key_prefix=metric_key_prefix,
3373 )
3375 total_batch_size = self.args.eval_batch_size * self.args.world_size
3376 if f"{metric_key_prefix}_jit_compilation_time" in output.metrics:
File /opt/conda/lib/python3.10/site-packages/transformers/trainer.py:3656, in Trainer.evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
3652 metrics = self.compute_metrics(
3653 EvalPrediction(predictions=all_preds, label_ids=all_labels, inputs=all_inputs)
3654 )
3655 else:
-> 3656 metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
3657 else:
3658 metrics = {}
Cell In[13], line 9, in compute_metrics(pred)
6 labels_ids = pred.label_ids
7 pred_ids = pred.predictions
----> 9 pred_str = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
10 labels_ids[labels_ids == -100] = tokenizer.pad_token_id
11 label_str = tokenizer.batch_decode(labels_ids, skip_special_tokens=True)
File /opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3785, in PreTrainedTokenizerBase.batch_decode(self, sequences, skip_special_tokens, clean_up_tokenization_spaces, **kwargs)
3761 def batch_decode(
3762 self,
3763 sequences: Union[List[int], List[List[int]], "np.ndarray", "torch.Tensor", "tf.Tensor"],
(...)
3766 **kwargs,
3767 ) -> List[str]:
3768 """
3769 Convert a list of lists of token ids into a list of strings by calling decode.
3770
(...)
3783 `List[str]`: The list of decoded sentences.
3784 """
-> 3785 return [
3786 self.decode(
3787 seq,
3788 skip_special_tokens=skip_special_tokens,
3789 clean_up_tokenization_spaces=clean_up_tokenization_spaces,
3790 **kwargs,
3791 )
3792 for seq in sequences
3793 ]
File /opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3786, in <listcomp>(.0)
3761 def batch_decode(
3762 self,
3763 sequences: Union[List[int], List[List[int]], "np.ndarray", "torch.Tensor", "tf.Tensor"],
(...)
3766 **kwargs,
3767 ) -> List[str]:
3768 """
3769 Convert a list of lists of token ids into a list of strings by calling decode.
3770
(...)
3783 `List[str]`: The list of decoded sentences.
3784 """
3785 return [
-> 3786 self.decode(
3787 seq,
3788 skip_special_tokens=skip_special_tokens,
3789 clean_up_tokenization_spaces=clean_up_tokenization_spaces,
3790 **kwargs,
3791 )
3792 for seq in sequences
3793 ]
File /opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3825, in PreTrainedTokenizerBase.decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces, **kwargs)
3822 # Convert inputs to python lists
3823 token_ids = to_py_obj(token_ids)
-> 3825 return self._decode(
3826 token_ids=token_ids,
3827 skip_special_tokens=skip_special_tokens,
3828 clean_up_tokenization_spaces=clean_up_tokenization_spaces,
3829 **kwargs,
3830 )
File /opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py:625, in PreTrainedTokenizerFast._decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces, **kwargs)
623 if isinstance(token_ids, int):
624 token_ids = [token_ids]
--> 625 text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
627 clean_up_tokenization_spaces = (
628 clean_up_tokenization_spaces
629 if clean_up_tokenization_spaces is not None
630 else self.clean_up_tokenization_spaces
631 )
632 if clean_up_tokenization_spaces:
OverflowError: out of range integral type conversion attempted
Hi @manandey
This is likely because pred_ids contains negative values due to token masking. Can you print what is inside pred_ids? You might need to do something close to:
pred_ids = pred.predictions
+ pred_ids[pred_ids == -100] = tokenizer.pad_token_id
pred_str = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
labels_ids[labels_ids == -100] = tokenizer.pad_token_id
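A minimal, self-contained sketch of the replacement suggested above, using NumPy only so it runs without a model. The pad id of 0 and the helper name replace_ignore_index are assumptions for illustration; in the notebook the pad id would come from tokenizer.pad_token_id. Printing the minimum id is one quick way to check whether pred_ids holds negative values.

```python
import numpy as np

IGNORE_INDEX = -100  # masking value, as in the diff above
PAD_TOKEN_ID = 0     # assumption: stand-in for tokenizer.pad_token_id


def replace_ignore_index(ids, pad_token_id=PAD_TOKEN_ID, ignore_index=IGNORE_INDEX):
    """Return a copy of `ids` with every `ignore_index` entry replaced by `pad_token_id`."""
    ids = np.asarray(ids)
    return np.where(ids == ignore_index, pad_token_id, ids)


# Toy predictions: the second row was padded with -100.
pred_ids = np.array([[5, 17, 42, 1], [8, 1, -100, -100]])
print("min id before:", pred_ids.min())

clean_ids = replace_ignore_index(pred_ids)
print("min id after:", clean_ids.min())
```

The helper returns a new array instead of modifying pred_ids in place, so the original predictions stay available for inspection. The cleaned array is what would then be passed to tokenizer.batch_decode.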
Thanks @younesbelkada!
@younesbelkada I am trying to update the script to work on a Colab TPU, but it does not seem to be working. Could you kindly take a look and suggest whether I am doing anything wrong? Thanks! https://colab.research.google.com/drive/16UYvGbMkX5laJZVwujz-GP4EnIWBskQD?usp=sharing
@younesbelkada it would be great if you could kindly help a bit on this. Thanks!
Hi @manandey, is the same error happening as before?
Hi @amyeroberts, the script worked fine when run on a GPU. I am trying to make it work on a Colab TPU, but it's not working as expected. So, just wanted some help if you could have a high level look and suggest if I am doing something wrong or point me to some example scripts for running Huggingface Trainer on a TPU. Thanks!
Hi @manandey, looking at the notebook it seems a different problem is being encountered from the original one in this issue. In this case, a new issue should be opened, as this ensures we can properly track open and resolved bugs.
There are guides available on running on TPU with accelerate here: https://huggingface.co/docs/accelerate/en/concept_guides/training_tpu
System Info
Transformers version: 4.41.1
Python version: 3.10
Who can help?
@younesbelkada @ArthurZucker
Reproduction
When I execute the code in the colab, I get the error message:
OverflowError: out of range integral type conversion attempted
This happens when I add the line:
model.generation_config.max_new_tokens = max_target_length
Otherwise I get the warning:
UserWarning: Using the model-agnostic default max_length (=20) to control the generation length. We recommend setting max_new_tokens to control the maximum length of the generation.
https://colab.research.google.com/drive/11R2MMK9nq0oe7xUXSQaW8tahhgR9cjNc?usp=sharing
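One plausible explanation, not confirmed in this thread: when generated sequences from different evaluation batches have different lengths, they must be padded to a common length before being stacked, and if the padding value is -100 those entries later reach batch_decode. The sketch below imitates that with NumPy. pad_and_stack is a hypothetical helper written for illustration, not the Trainer's own code, and the token ids are made up.

```python
import numpy as np


def pad_and_stack(batches, padding_index=-100):
    """Pad 2-D id arrays to the widest sequence length and stack them along axis 0."""
    max_len = max(batch.shape[1] for batch in batches)
    padded = [
        np.pad(
            batch,
            ((0, 0), (0, max_len - batch.shape[1])),  # pad only on the right of axis 1
            constant_values=padding_index,
        )
        for batch in batches
    ]
    return np.concatenate(padded, axis=0)


batch_a = np.array([[4, 9, 2]])        # a short generation
batch_b = np.array([[4, 7, 7, 7, 2]])  # a longer generation
all_preds = pad_and_stack([batch_a, batch_b])

# Negative ids are what a decoder expecting unsigned ids cannot convert.
has_negative = bool((all_preds < 0).any())
print(all_preds)
print("contains negative ids:", has_negative)
```

If this is what happens in the notebook, a longer generation limit would make length differences between batches more likely, which would fit the error appearing only after max_new_tokens is set.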
cc. @younesbelkada @ArthurZucker
Expected behavior
The code should work fine.