urchade / GLiNER

Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024
https://arxiv.org/abs/2311.08526
Apache License 2.0

shape mismatch: indexing tensors could not be broadcast together with shapes [248], [249] #130

Closed ExtReMLapin closed 3 months ago

ExtReMLapin commented 4 months ago

Hello,

I was working with text extracted from a PDF and ran into this error. I understand that filtering the text is a temporary hotfix, but there is still an underlying issue that was not present in an older version (don't ask me which one; the console's limited history doesn't help).
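For readers who need the filtering hotfix mentioned above, here is a minimal sketch. The thread does not say which characters trigger the error, so the choice to drop Unicode control and format characters (categories `Cc` and `Cf`) is an assumption, and `filter_text` is a hypothetical helper, not part of GLiNER.

```python
import unicodedata


def filter_text(text):
    """Hypothetical pre-filter for PDF-extracted text (not a GLiNER function).

    Assumption: the troublesome characters are control (Cc) or format (Cf)
    characters such as NUL bytes or zero-width spaces.
    """
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        # Keep newline and tab so they can become word separators below
        # instead of gluing neighbouring words together.
        if category in ("Cc", "Cf") and ch not in "\n\t":
            continue
        kept.append(ch)
    # Collapse every run of whitespace into a single space.
    return " ".join("".join(kept).split())


sample = "John\u200b Smith\x00\n met  Ann"
print(filter_text(sample))
```

Note that any character offsets returned for entities would then refer to the filtered string, not to the original file contents.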

Code to reproduce the error:


from gliner import GLiNER

model = GLiNER.from_pretrained("EmergentMethods/gliner_medium_news-v2.1")

labels = ["person"]
with open("window_text.txt", encoding="utf-8-sig") as f:
    window_text = f.read()

entities = model.predict_entities(window_text, labels)

Attachment: window_text.txt


Traceback (most recent call last):
  File "/home/censored/censored2/gliner_small.py", line 10, in <module>
    entities = model.predict_entities(window_text, labels)
  File "/home/censored/censored2/venv/lib/python3.10/site-packages/gliner/model.py", line 132, in predict_entities
    return self.batch_predict_entities(
  File "/home/censored/censored2/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/censored/censored2/venv/lib/python3.10/site-packages/gliner/model.py", line 147, in batch_predict_entities
    model_output = self.model(**model_input)[0]
  File "/home/censored/censored2/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/censored/censored2/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/censored/censored2/venv/lib/python3.10/site-packages/gliner/modeling/base.py", line 148, in forward
    prompts_embedding, prompts_embedding_mask, words_embedding, mask = self.get_representations(input_ids, attention_mask,
  File "/home/censored/censored2/venv/lib/python3.10/site-packages/gliner/modeling/base.py", line 99, in get_representations
    prompts_embedding, prompts_embedding_mask, words_embedding, mask = self._extract_prompt_features_and_word_embeddings(token_embeds, input_ids, attention_mask,
  File "/home/censored/censored2/venv/lib/python3.10/site-packages/gliner/modeling/base.py", line 86, in _extract_prompt_features_and_word_embeddings
    words_embedding[batch_indices, target_word_idx] = token_embeds[batch_indices, word_idx]
IndexError: shape mismatch: indexing tensors could not be broadcast together with shapes [248], [249]
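
The final `IndexError` in the traceback above is what PyTorch raises when two 1-D index tensors of different lengths are used together in one indexing expression. The sketch below reproduces that class of error with toy tensors. The variable names mirror the traceback, but the sizes and values are made up, and it does not claim to show why GLiNER ended up with 248 and 249 entries.

```python
import torch

# Toy stand-ins; shapes are illustrative only, not GLiNER's real ones.
token_embeds = torch.randn(1, 6, 4)  # (batch, tokens, hidden)

batch_indices = torch.tensor([0, 0, 0])  # 3 entries
word_idx = torch.tensor([1, 2, 3, 4])    # 4 entries, one more than batch_indices

error_message = None
try:
    # Index tensors of length 3 and 4 cannot be broadcast against each other.
    token_embeds[batch_indices, word_idx]
except IndexError as exc:
    error_message = str(exc)
    print(error_message)

# Once both index tensors have the same length, the lookup succeeds and
# returns one hidden vector per (batch, token) pair.
aligned = token_embeds[batch_indices, word_idx[:3]]
print(aligned.shape)
```

When debugging a case like this, printing the shapes of the two index tensors just before the failing line usually shows which side gained or lost an entry.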

Ingvarstep commented 4 months ago

@ExtReMLapin, thank you for pointing out this issue. It was fixed in the latest commit.

urchade commented 4 months ago

I have updated the library to v0.2.6.
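
To check whether a local environment already has that release, a small sketch follows. It assumes v0.2.6 is the first release containing the fix, which this thread implies but does not state outright. `version_tuple` and `has_fix` are hypothetical helpers written for this example.

```python
import re
from importlib.metadata import PackageNotFoundError, version

FIXED_VERSION = (0, 2, 6)  # release named in this thread (assumed to carry the fix)


def version_tuple(text):
    """Turn '0.2.6' (or '0.2.6rc1') into (0, 2, 6), reading leading digits only."""
    numbers = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def has_fix(installed):
    """Return True if the installed version is at least FIXED_VERSION."""
    return version_tuple(installed) >= FIXED_VERSION


try:
    installed = version("gliner")
    print(installed, "is at least v0.2.6" if has_fix(installed) else "predates v0.2.6")
except PackageNotFoundError:
    print("gliner is not installed in this environment")
```

If the installed version predates v0.2.6, `pip install -U gliner` upgrades to the latest published release.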