huggingface / optimum-habana

Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
Apache License 2.0

ValidateSyncInputTensors tensor_data is empty #1241

Open xinsu626 opened 3 months ago

xinsu626 commented 3 months ago

System Info

Docker image: pytorch-installer-2.3.1:1.17.0-417
optimum-habana: main branch

Reproduction

When I ran the llama3.1-70b-instruct model for inference and set self.generation_config.ignore_eos to False, I encountered the following error.

[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/generation/utils.py", line 1287, in generate
[rank0]:     result = self._sample(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/generation/utils.py", line 2360, in _sample
[rank0]:     unfinished_sequences = unfinished_sequences & ~stopping_criteria(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/generation/stopping_criteria.py", line 113, in gaudi_StoppingCriteriaList_call
[rank0]:     is_done = is_done | criteria(input_ids, scores, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/generation/stopping_criteria.py", line 84, in gaudi_EosTokenCriteria_call
[rank0]:     is_done = torch.isin(input_ids[:, token_idx - 1], self.eos_token_id)
[rank0]: RuntimeError: [Rank:0] FATAL ERROR :: MODULE:PT_BRIDGE Exception in Launch thread...
[rank0]: Check $HABANA_LOGS/ for details[Rank:0] FATAL ERROR :: MODULE:PT_LAZY Error, ValidateSyncInputTensors tensor_data is empty. Tensorid:2006352 QueueStatus:ThreadPool m_tasks size: 1 irValue:id_6403519_module/model/79/hpu__input
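For reference, the failing line (torch.isin(input_ids[:, token_idx - 1], self.eos_token_id)) checks each sequence in the batch to see whether its most recently written token is one of the EOS ids. Below is a minimal plain-Python sketch of that check. It assumes input_ids is padded to a fixed length (static shapes) and that token_idx is the position where the next token will be written. The function name eos_done and the token ids are hypothetical. The sketch covers only what the expression computes, not the HPU-side failure.

```python
from typing import List, Sequence


def eos_done(
    input_ids: Sequence[Sequence[int]],
    token_idx: int,
    eos_token_id: Sequence[int],
) -> List[bool]:
    """Return, per sequence, whether the last written token is an EOS token.

    Assumption: input_ids is padded to a static length and token_idx is the
    next write position, so the newest token sits at index token_idx - 1.
    """
    eos = set(eos_token_id)
    # Equivalent of input_ids[:, token_idx - 1]: one column across the batch.
    last_tokens = [row[token_idx - 1] for row in input_ids]
    # Equivalent of torch.isin(last_tokens, eos_token_id): element-wise membership.
    return [tok in eos for tok in last_tokens]


# Hypothetical batch: 2 sequences padded to length 6 (0 = padding), 4 tokens written so far.
batch = [
    [11, 12, 13, 2, 0, 0],
    [21, 22, 23, 24, 0, 0],
]
print(eos_done(batch, token_idx=4, eos_token_id=[2, 3]))  # [True, False]
```

On the HPU the same expression runs on device tensors in lazy mode, where the error above is raised.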

Expected behavior

No errors during model inference.

regisss commented 1 month ago

@xinsu626 Can you share the command you executed please?