huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

index out of range in self torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse) #23822

Closed siddhsql closed 1 year ago

siddhsql commented 1 year ago

System Info

- `transformers` version: 4.29.2
- Platform: macOS-13.4-x86_64-i386-64bit
- Python version: 3.10.2
- Huggingface_hub version: 0.14.1
- Safetensors version: not installed
- PyTorch version (GPU?): 2.0.1 (False)
- Tensorflow version (GPU?): 2.12.0 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: no
- Using distributed or parallel set-up in script?: no

Who can help?

No response

Information

Tasks

Reproduction

Run the following code:

from transformers import pipeline
import pandas as pd

# prepare table + question
data = {"Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"], "Number of movies": ["87", "53", "69"]}
table = pd.DataFrame.from_dict(data)
question = "how many movies does Leonardo Di Caprio have?"

# pipeline model
# Note: you must install torch-scatter first.
tqa = pipeline(task="table-question-answering", model="google/tapas-large-finetuned-wtq")

# result

print(tqa(table=table, query=question)['cells'][0])

Observed Behavior

Exception has occurred: IndexError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
index out of range in self
  File "/llm/tapas-poc/.env/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
  File "/llm/tapas-poc/.env/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/llm/tapas-poc/.env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/llm/tapas-poc/.env/lib/python3.10/site-packages/transformers/models/tapas/modeling_tapas.py", line 326, in forward
    embeddings += getattr(self, name)(token_type_ids[:, :, i])
  File "/llm/tapas-poc/.env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/llm/tapas-poc/.env/lib/python3.10/site-packages/transformers/models/tapas/modeling_tapas.py", line 965, in forward
    embedding_output = self.embeddings(
  File "/llm/tapas-poc/.env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/llm/tapas-poc/.env/lib/python3.10/site-packages/transformers/models/tapas/modeling_tapas.py", line 1217, in forward
    outputs = self.tapas(
  File "/llm/tapas-poc/.env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/llm/tapas-poc/.env/lib/python3.10/site-packages/transformers/pipelines/table_question_answering.py", line 142, in batch_inference
    return self.model(**inputs)
  File "/llm/tapas-poc/.env/lib/python3.10/site-packages/transformers/pipelines/table_question_answering.py", line 390, in _forward
    outputs = self.batch_inference(**model_inputs)
  File "/llm/tapas-poc/.env/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1025, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/llm/tapas-poc/.env/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
  File "/llm/tapas-poc/.env/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "/llm/tapas-poc/.env/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1100, in __call__
    outputs = list(final_iterator)
  File "/llm/tapas-poc/.env/lib/python3.10/site-packages/transformers/pipelines/table_question_answering.py", line 350, in __call__
    results = super().__call__(pipeline_inputs, **kwargs)
  File "/llm/tapas-poc/sample1.py", line 12, in <module>
    preds = table_qa(bkgs_df_str,queries)
  File "/usr/local/Cellar/python@3.10/3.10.2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/Cellar/python@3.10/3.10.2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
IndexError: index out of range in self

Expected behavior

There should be no error.

sgugger commented 1 year ago

I cannot reproduce:

from transformers import pipeline
import pandas as pd

# prepare table + question
data = {"Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"], "Number of movies": ["87", "53", "69"]}
table = pd.DataFrame.from_dict(data)
question = "how many movies does Leonardo Di Caprio have?"

# pipeline model
# Note: you must install torch-scatter first.
tqa = pipeline(task="table-question-answering", model="google/tapas-large-finetuned-wtq")

# result

print(tqa(table=table, query=question)['cells'][0])

works without issue for me.

siddhsql commented 1 year ago

Try with more than 64 rows.
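A quick way to grow the repro table past 64 rows (the actor names and counts below are made up for illustration; the pipeline call is the same one from the snippet above, commented out here since it downloads the model):

```python
import pandas as pd

# Build a synthetic table with 100 rows, well past the 64 that triggered the error.
data = {
    "Actors": [f"Actor {i}" for i in range(100)],
    "Number of movies": [str(i) for i in range(100)],
}
table = pd.DataFrame.from_dict(data)

# Feeding this larger table to the same pipeline is what raised the IndexError for me:
# tqa = pipeline(task="table-question-answering", model="google/tapas-large-finetuned-wtq")
# tqa(table=table, query="how many movies does Actor 42 have?")
```

Presumably the larger table pushes some token-type index (e.g. the row id) past the size of the corresponding embedding table in the TAPAS checkpoint, but I have not verified which one.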


github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

the-homeless-god commented 3 months ago

@siddhsql hi, did you find any solution?

siddhsql commented 3 months ago

no

Steinshark commented 3 months ago

Do you get the same error with a much smaller batch size? I'm having a very similar issue training a GPT-2 model, and I get this error when using batch sizes larger than n_positions (the context size).
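For what it's worth, "index out of range in self" is the message `torch.nn.Embedding` raises (on CPU) whenever an input index is >= `num_embeddings`, which is consistent with some id in the TAPAS/GPT-2 inputs exceeding an embedding table's size. A minimal sketch with illustrative sizes, not the actual model's:

```python
import torch

# A tiny embedding table: valid indices are 0..9.
emb = torch.nn.Embedding(num_embeddings=10, embedding_dim=4)

emb(torch.tensor([9]))  # fine: largest valid index is num_embeddings - 1

try:
    emb(torch.tensor([10]))  # one past the end of the table
except IndexError as e:
    print(e)  # same "index out of range in self" as in the traceback above
```

So the fix is usually either to truncate the inputs so every index fits the table, or to resize/configure the model so the table is large enough.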