I was working on text extracted from a PDF and ran into the error below.
While I understand that filtering the text is a temporary hotfix, there is still an underlying issue that was not present in an older version (don't ask me which one; the console's limited history doesn't help).
Code to reproduce the error:
```python
from gliner import GLiNER

model = GLiNER.from_pretrained("EmergentMethods/gliner_medium_news-v2.1")
labels = ["person"]

# window_text.txt is the attached file (text extracted from a PDF)
with open("window_text.txt", encoding="utf-8-sig") as f:
    window_text = f.read()

entities = model.predict_entities(window_text, labels)
```
```
Traceback (most recent call last):
  File "/home/censored/censored2/gliner_small.py", line 10, in <module>
    entities = model.predict_entities(window_text, labels)
  File "/home/censored/censored2/venv/lib/python3.10/site-packages/gliner/model.py", line 132, in predict_entities
    return self.batch_predict_entities(
  File "/home/censored/censored2/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/censored/censored2/venv/lib/python3.10/site-packages/gliner/model.py", line 147, in batch_predict_entities
    model_output = self.model(**model_input)[0]
  File "/home/censored/censored2/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/censored/censored2/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/censored/censored2/venv/lib/python3.10/site-packages/gliner/modeling/base.py", line 148, in forward
    prompts_embedding, prompts_embedding_mask, words_embedding, mask = self.get_representations(input_ids, attention_mask,
  File "/home/censored/censored2/venv/lib/python3.10/site-packages/gliner/modeling/base.py", line 99, in get_representations
    prompts_embedding, prompts_embedding_mask, words_embedding, mask = self._extract_prompt_features_and_word_embeddings(token_embeds, input_ids, attention_mask,
  File "/home/censored/censored2/venv/lib/python3.10/site-packages/gliner/modeling/base.py", line 86, in _extract_prompt_features_and_word_embeddings
    words_embedding[batch_indices, target_word_idx] = token_embeds[batch_indices, word_idx]
IndexError: shape mismatch: indexing tensors could not be broadcast together with shapes [248], [249]
```
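The mismatch is between two index tensors that differ by exactly one element (248 vs 249), which hints that one "word" in the input is not being matched to a token. One plausible cause (an assumption, not confirmed by the traceback) is an invisible or unusual character left behind by PDF extraction, such as a zero-width space or soft hyphen. The hypothetical helper `find_suspicious_chars` below is a minimal stdlib-only sketch for locating such characters in the input; it does not touch GLiNER at all, and the chosen Unicode categories are a guess at what is worth inspecting.

```python
import unicodedata

# Hypothetical diagnostic helper (not part of GLiNER).
# Unicode general categories that are usually invisible or unexpected in prose:
# Cf = format (e.g. zero-width space, soft hyphen), Cc = control,
# Co = private use, Cn = unassigned.
SUSPICIOUS_CATEGORIES = {"Cf", "Cc", "Co", "Cn"}
# Ordinary line/tab controls are expected in extracted text, so skip them.
ALLOWED_CONTROLS = {"\n", "\r", "\t"}


def find_suspicious_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (character offset, 'U+XXXX', category) for each suspicious character."""
    hits = []
    for offset, ch in enumerate(text):
        category = unicodedata.category(ch)
        if category in SUSPICIOUS_CATEGORIES and ch not in ALLOWED_CONTROLS:
            hits.append((offset, f"U+{ord(ch):04X}", category))
    return hits
```

Usage: read `window_text.txt` as in the reproduction code and print `find_suspicious_chars(window_text)`; an empty list would suggest the problem lies elsewhere (for example in tokenization of otherwise visible characters).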
Attached file: window_text.txt
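For reference, the "filtering text" hotfix mentioned at the top could look something like the sketch below. This is an assumption about what such filtering involves, not the reporter's actual code and not a GLiNER API: the hypothetical `clean_extracted_text` applies NFKC normalization, drops format/control/private-use/unassigned characters, and collapses runs of spaces and tabs. It only works around the symptom; the underlying regression reported above would remain.

```python
import re
import unicodedata

# Hypothetical cleanup helper (not part of GLiNER).
# Categories to drop: Cf = format, Cc = control, Co = private use, Cn = unassigned.
DROP_CATEGORIES = {"Cf", "Cc", "Co", "Cn"}
# Control characters worth keeping after line endings are normalized.
KEEP_CONTROLS = {"\n", "\t"}


def clean_extracted_text(text: str) -> str:
    # NFKC folds compatibility characters, e.g. no-break space -> space, "ﬁ" -> "fi".
    text = unicodedata.normalize("NFKC", text)
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop invisible/unexpected characters such as zero-width spaces and soft hyphens.
    text = "".join(
        ch for ch in text
        if ch in KEEP_CONTROLS or unicodedata.category(ch) not in DROP_CATEGORIES
    )
    # Collapse runs of spaces and tabs into a single space.
    return re.sub(r"[ \t]+", " ", text)
```

Usage: `entities = model.predict_entities(clean_extracted_text(window_text), labels)`. Note that character offsets in the returned entities then refer to the cleaned string, not to the original file contents.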