[Open] erlebach opened this issue 5 months ago
@erlebach because it relies on rather heavy dependencies, span-marker and transformers.
You can install and import it with:
pip install llama-index-extractors-entity
from llama_index.extractors.entity import EntityExtractor
Please explain the following error. Here is the code:
from llama_index.core.node_parser import SentenceSplitter
from llama_index.extractors.entity import EntityExtractor
from headers import (
SimpleDirectoryReader,
Ollama,
Settings,
)
# Create an instance of Ollama with the specified parameters
llm = Ollama(model="phi3:latest", request_timeout=600.0, temperature=0.0)
Settings.llm = llm
reader = SimpleDirectoryReader('files')
documents = reader.load_data()
parser = SentenceSplitter(include_prev_next_rel=True)
nodes = parser.get_nodes_from_documents(documents)
entity_extractor = EntityExtractor(
label_entities=True,
device="cpu"
)
metadata_list = entity_extractor.extract(nodes) # ERROR
print(metadata_list)
And here is the error, which occurs on the line where entity_extractor.extract(nodes) is executed:
python sample_extractor_EntityExtractor.py
/Users/erlebach/src/2024/llama_index_gordon/Building-Data-Driven-Applications-with-LlamaIndex/.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Extracting entities: 0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/Users/erlebach/src/2024/llama_index_gordon/Building-Data-Driven-Applications-with-LlamaIndex/ch4/sample_extractor_EntityExtractor.py", line 37, in <module>
metadata_list = entity_extractor.extract(nodes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/erlebach/src/2024/llama_index_gordon/Building-Data-Driven-Applications-with-LlamaIndex/.venv/lib/python3.12/site-packages/llama_index/core/extractors/interface.py", line 96, in extract
return asyncio_run(self.aextract(nodes))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/erlebach/src/2024/llama_index_gordon/Building-Data-Driven-Applications-with-LlamaIndex/.venv/lib/python3.12/site-packages/llama_index/core/async_utils.py", line 31, in asyncio_run
return loop.run_until_complete(coro)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/lib/python3.12/asyncio/base_events.py", line 684, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/erlebach/src/2024/llama_index_gordon/Building-Data-Driven-Applications-with-LlamaIndex/.venv/lib/python3.12/site-packages/llama_index/extractors/entity/base.py", line 136, in aextract
spans = self._model.predict(words)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/erlebach/src/2024/llama_index_gordon/Building-Data-Driven-Applications-with-LlamaIndex/.venv/lib/python3.12/site-packages/span_marker/modeling.py", line 512, in predict
output = self(**batch)
^^^^^^^^^^^^^
File "/Users/erlebach/src/2024/llama_index_gordon/Building-Data-Driven-Applications-with-LlamaIndex/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/erlebach/src/2024/llama_index_gordon/Building-Data-Driven-Applications-with-LlamaIndex/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/erlebach/src/2024/llama_index_gordon/Building-Data-Driven-Applications-with-LlamaIndex/.venv/lib/python3.12/site-packages/span_marker/modeling.py", line 153, in forward
outputs = self.encoder(
^^^^^^^^^^^^^
File "/Users/erlebach/src/2024/llama_index_gordon/Building-Data-Driven-Applications-with-LlamaIndex/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/erlebach/src/2024/llama_index_gordon/Building-Data-Driven-Applications-with-LlamaIndex/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/erlebach/src/2024/llama_index_gordon/Building-Data-Driven-Applications-with-LlamaIndex/.venv/lib/python3.12/site-packages/transformers/models/bert/modeling_bert.py", line 1103, in forward
extended_attention_mask = _prepare_4d_attention_mask_for_sdpa(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/erlebach/src/2024/llama_index_gordon/Building-Data-Driven-Applications-with-LlamaIndex/.venv/lib/python3.12/site-packages/transformers/modeling_attn_mask_utils.py", line 439, in _prepare_4d_attention_mask_for_sdpa
batch_size, key_value_length = mask.shape
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: too many values to unpack (expected 2)
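The ValueError at the bottom of the traceback is a plain tuple-unpacking failure: _prepare_4d_attention_mask_for_sdpa expects a 2-D (batch_size, key_value_length) attention mask, and the unpack only fails if the mask it receives has a different number of dimensions (presumably span-marker hands it one with an extra dimension). The failure mode itself is easy to reproduce in plain Python:

```python
# Reproduce the unpacking failure: a two-target unpack works only for a
# 2-D shape; anything with an extra dimension raises the same ValueError
# seen at the bottom of the traceback.
shape_2d = (4, 128)     # (batch_size, key_value_length) - what the helper expects
shape_3d = (4, 1, 128)  # an extra dimension triggers the error

batch_size, key_value_length = shape_2d  # fine

try:
    batch_size, key_value_length = shape_3d
except ValueError as err:
    message = str(err)
    print(message)  # too many values to unpack (expected 2)
```

Note that CPython checks the length before binding anything, so batch_size and key_value_length keep their previous values when the unpack fails.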
It appears that an update to the transformers library caused this issue. The version of transformers likely differs from the one used when llama-index-extractors-entity was released, as span_marker depends on transformers>=4.19.0.
I mitigated this issue by downgrading to transformers==4.40.2, as the problem occurs starting from version 4.41.0.
This error also occurred with llama-index-entity-example. I mitigated it the same way.
The problem still exists in transformers 4.43.3.
Still exists in transformers 4.44.2 as well. All metadata extractor samples seem to fail because of this issue.
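Until the incompatibility is fixed upstream, a defensive version check before running extraction can fail fast with a clear message instead of the opaque traceback above. This is a sketch; the version boundaries are taken only from the reports in this thread (4.40.2 works, 4.41.0 through at least 4.44.2 fail):

```python
# Hypothetical guard based on the versions reported in this thread.
def parse_version(v: str) -> tuple:
    """Turn a version string like '4.40.2' into (4, 40, 2) for ordered comparison."""
    return tuple(int(part) for part in v.split("."))

# First transformers release reported to break span-marker's EntityExtractor.
BROKEN_SINCE = parse_version("4.41.0")

def is_reported_broken(version: str) -> bool:
    """True if this transformers version falls in the range reported broken here."""
    return parse_version(version) >= BROKEN_SINCE
```

In a script, one could call `is_reported_broken(transformers.__version__)` before constructing the EntityExtractor and raise a RuntimeError suggesting `pip install "transformers==4.40.2"` as the workaround.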
Question
Why isn't the EntityExtractor implemented inside metadata_extractors.py rather than in its current location, llama-index-integrations/extractors/llama-index-extractors-entity/llama_index/extractors? I am having issues importing the EntityExtractor using poetry.