background-1 | [2024-11-23 13:48:08,045: ERROR/ForkPoolWorker-5] app.tasks.build_index.build_index_for_document[22305254-69a8-4ec7-bd97-bad0ce25f604]: Failed to build vector index for document 30001: Traceback (most recent call last):
background-1 | File "/app/app/tasks/build_index.py", line 60, in build_index_for_document
background-1 | index_service.build_vector_index_for_document(index_session, db_document)
background-1 | File "/app/app/rag/build_index.py", line 72, in build_vector_index_for_document
background-1 | vector_index.insert(document, source_uri=db_document.source_uri)
background-1 | File "/usr/local/lib/python3.11/site-packages/llama_index/core/indices/base.py", line 215, in insert
background-1 | self.insert_nodes(nodes, **insert_kwargs)
background-1 | File "/usr/local/lib/python3.11/site-packages/llama_index/core/indices/vector_store/base.py", line 330, in insert_nodes
background-1 | self._insert(nodes, **insert_kwargs)
background-1 | File "/usr/local/lib/python3.11/site-packages/llama_index/core/indices/vector_store/base.py", line 311, in _insert
background-1 | self._add_nodes_to_index(self._index_struct, nodes, **insert_kwargs)
background-1 | File "/usr/local/lib/python3.11/site-packages/llama_index/core/indices/vector_store/base.py", line 231, in _add_nodes_to_index
background-1 | nodes_batch = self._get_node_with_embedding(nodes_batch, show_progress)
background-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
background-1 | File "/usr/local/lib/python3.11/site-packages/llama_index/core/indices/vector_store/base.py", line 138, in _get_node_with_embedding
background-1 | id_to_embed_map = embed_nodes(
background-1 | ^^^^^^^^^^^^
background-1 | File "/usr/local/lib/python3.11/site-packages/llama_index/core/indices/utils.py", line 138, in embed_nodes
background-1 | new_embeddings = embed_model.get_text_embedding_batch(
background-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
background-1 | File "/usr/local/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py", line 265, in wrapper
background-1 | result = func(*args, **kwargs)
background-1 | ^^^^^^^^^^^^^^^^^^^^^
background-1 | File "/usr/local/lib/python3.11/site-packages/llama_index/core/base/embeddings/base.py", line 335, in get_text_embedding_batch
background-1 | embeddings = self._get_text_embeddings(cur_batch)
background-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
background-1 | File "/usr/local/lib/python3.11/site-packages/llama_index/embeddings/jinaai/base.py", line 202, in _get_text_embeddings
background-1 | return self._api.get_embeddings(
background-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^
background-1 | File "/usr/local/lib/python3.11/site-packages/llama_index/embeddings/jinaai/base.py", line 48, in get_embeddings
background-1 | raise RuntimeError(resp["detail"])
background-1 | RuntimeError: Single text cannot exceed 8194 tokens. 8746 tokens given.
background-1 |
background-1 | [2024-11-23 13:48:08,185: INFO/ForkPoolWorker-5] Task app.tasks.build_index.build_index_for_document[22305254-69a8-4ec7-bd97-bad0ce25f604] succeeded in 36.22360512241721s: None
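The 8194-token limit in the error is reported by the Jina embeddings API itself (the RuntimeError re-raises resp["detail"] inside the jinaai integration), so the offending chunk is already over the limit by the time it is sent. A quick way to see how large a chunk is on the API's terms is to count tokens with the model's own tokenizer. This is a minimal sketch, assuming the hosted endpoint tokenizes like the open-weights jinaai/jina-embeddings-v3 checkpoint on Hugging Face:

```python
# Minimal sketch: count tokens the way the Jina API (presumably) does.
# Assumption: the hosted jina-embeddings-v3 endpoint tokenizes like the
# open-weights jinaai/jina-embeddings-v3 checkpoint on Hugging Face.
from transformers import AutoTokenizer

EMBEDDING_MAX_TOKENS = 8191   # local chunk budget from docker compose
JINA_API_LIMIT = 8194         # limit quoted in the error message

tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v3")

def count_jina_tokens(text: str) -> int:
    return len(tokenizer.encode(text))

def check_chunk(text: str) -> None:
    n = count_jina_tokens(text)
    status = "OVER API LIMIT" if n > JINA_API_LIMIT else "ok"
    print(f"{n} tokens (local budget {EMBEDDING_MAX_TOKENS}, API limit {JINA_API_LIMIT}): {status}")
```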
Note that in docker compose the max value is already set:
EMBEDDING_DIMS=1024
# EMBEDDING_MAX_TOKENS should be equal to or smaller than the embedding model's max tokens;
# it indicates the max size of document chunks.
EMBEDDING_MAX_TOKENS=8191
With the new Jina embedding model (jina-embeddings-v3), ingesting a large PDF errors out.
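Even with EMBEDDING_MAX_TOKENS=8191, the API reports 8746 tokens for a single text. One plausible cause (an assumption, not something confirmed by the log) is a tokenizer mismatch: llama_index splitters typically size chunks with tiktoken, while jina-embeddings-v3 uses an XLM-RoBERTa-style tokenizer, so a chunk that measures under 8191 tokens locally can measure 8746 on the API side. As a defensive workaround, the splitter can be pointed at the embedding model's own tokenizer so the local budget and the API-side count agree. The wiring below is an illustrative sketch, not the project's actual ingestion code:

```python
# Illustrative workaround sketch (assumptions noted inline), not the
# project's actual ingestion code: split documents with the embedding
# model's own tokenizer so chunks never exceed the API-side limit.
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter
from transformers import AutoTokenizer

EMBEDDING_MAX_TOKENS = 8191  # same budget as the docker compose setting

# Assumption: the hosted endpoint counts tokens like the open-weights
# jinaai/jina-embeddings-v3 tokenizer.
jina_tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v3")

splitter = SentenceSplitter(
    chunk_size=EMBEDDING_MAX_TOKENS,
    chunk_overlap=200,
    # SentenceSplitter sizes chunks with this callable; by default it uses
    # tiktoken, which can undercount relative to the Jina tokenizer.
    tokenizer=lambda text: jina_tokenizer.encode(text, add_special_tokens=False),
)

# Example usage with a placeholder document standing in for the parsed PDF text.
documents = [Document(text="long PDF text goes here ...")]
nodes = splitter.get_nodes_from_documents(documents)
print(f"{len(nodes)} chunks produced under a {EMBEDDING_MAX_TOKENS}-token budget")
```

If swapping the tokenizer is not practical, a simpler stopgap is to lower EMBEDDING_MAX_TOKENS enough to leave headroom for the counting difference; in this log a chunk that should be at most 8191 tokens locally comes out at 8746 on the API side, roughly a 7% overshoot.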