run-llama / llama_index

LlamaIndex is a data framework for your LLM applications

MIT License


Question Validation

[X] I have searched both the documentation and discord for an answer.

Question

I am trying to add nodes to an OpenSearch vector database, following this guide:

https://docs.llamaindex.ai/en/stable/examples/low_level/ingestion/

endpoint = f"https://admin:admin@0.0.0.0:9200"
idx = "sample-index"
text_field = "text"
embedding_field = "vector_field"
client = OpensearchVectorClient(
    endpoint,
    idx,
    dim=384,
    embedding_field=embedding_field,
    text_field=text_field,
    use_ssl=False,
    verify_certs=False,
)

vector_store = OpensearchVectorStore(client)

storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("/home/seceon/opensearch_lm_index/textdocs").load_data()

embeddings = HuggingFaceEmbedding()

splitter = SentenceSplitter(
    chunk_size=700,
    chunk_overlap=300,
)
nodes = splitter.get_nodes_from_documents(documents)

llm = HuggingFaceLLM(
    model_name=model_id,
    tokenizer_name=model_id,
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={"torch_dtype": torch.float16},
    tokenizer_kwargs={},
    generate_kwargs={"temperature": 0.3, "top_p": 0.95},
    device_map="auto",
)

Settings.llm = llm
Settings.embed_model = embeddings
Settings.context_window=4096,
Settings.num_output=1000,
Settings.chunk_overlap_ratio=0.3,
Settings.chunk_size_limit=None

for node in nodes:
    node_embedding = embed_model.get_text_embedding(
        node.get_content(metadata_mode="all")
    )
    node.embedding = node_embedding
vector_store.add(nodes)

#######################################################
Traceback (most recent call last):
  File "/home/seceon/opensearch_lm_index/koshish.py", line 90, in <module>
    vector_store.add(nodes)
  File "/usr/local/lib/python3.10/site-packages/llama_index/vector_stores/opensearch/base.py", line 476, in add
    return asyncio.get_event_loop().run_until_complete(
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/site-packages/llama_index/vector_stores/opensearch/base.py", line 492, in async_add
    await self._client.index_results(nodes)
  File "/usr/local/lib/python3.10/site-packages/llama_index/vector_stores/opensearch/base.py", line 347, in index_results
    return await _bulk_ingest_embeddings(
  File "/usr/local/lib/python3.10/site-packages/llama_index/vector_stores/opensearch/base.py", line 110, in _bulk_ingest_embeddings
    await client.indices.refresh(index=index_name)
  File "/usr/local/lib/python3.10/site-packages/opensearchpy/_async/client/indices.py", line 92, in refresh
    return await self.transport.perform_request(
  File "/usr/local/lib/python3.10/site-packages/opensearchpy/_async/transport.py", line 428, in perform_request
    raise e
  File "/usr/local/lib/python3.10/site-packages/opensearchpy/_async/transport.py", line 390, in perform_request
    status, headers_response, data = await connection.perform_request(
  File "/usr/local/lib/python3.10/site-packages/opensearchpy/_async/http_aiohttp.py", line 329, in perform_request
    raise ConnectionTimeout("TIMEOUT", str(e), e)
opensearchpy.exceptions.ConnectionTimeout: ConnectionTimeout caused by - TimeoutError()

The ConnectionTimeout error you're encountering is likely caused by one of the following: network or configuration problems with your OpenSearch server, the overhead of bulk ingestion, limited server resources, or a client timeout that is too low for the operation. Here's how you can address each of them:

1. **Verify Server Endpoint and Accessibility:** Ensure the OpenSearch server's endpoint is correct and that the server is accessible. Incorrect endpoint configuration or server unavailability can lead to timeout errors.
2. **Check Network Connectivity:** Ensure there's stable network connectivity between your client and the OpenSearch server.
3. **Review Server Performance:** The OpenSearch server's performance and configuration should be adequate to handle the workload. If the server is under heavy load, consider optimizing its resources.
4. **Adjust Bulk Operation Settings:** If you're ingesting large amounts of data, consider adjusting the max_chunk_bytes parameter to better suit your data size and reduce the load on the server.
5. **Modify Client Timeout Settings:** The timeout settings of the AsyncOpenSearch client may need to be increased to accommodate operations that take longer to complete, especially for large or complex ingestion tasks.
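The last two points can be sketched roughly as follows. This is a minimal sketch, not a confirmed fix. It assumes that `max_chunk_bytes` is accepted by the `OpensearchVectorClient` constructor and that extra keyword arguments such as `timeout` are forwarded to the underlying opensearch-py client, in the same way `use_ssl` and `verify_certs` are passed in the question. The helper names `opensearch_client_kwargs` and `build_vector_store`, and the default values, are hypothetical.

```python
from typing import Any, Dict


def opensearch_client_kwargs(
    timeout_s: int = 60,
    max_chunk_bytes: int = 512 * 1024,
) -> Dict[str, Any]:
    """Collect keyword arguments for OpensearchVectorClient.

    Assumption: `timeout` (seconds) is forwarded to the opensearch-py client
    and `max_chunk_bytes` limits the size of each bulk request.
    The defaults here are illustrative, not recommended values.
    """
    if timeout_s <= 0:
        raise ValueError("timeout_s must be positive")
    if max_chunk_bytes <= 0:
        raise ValueError("max_chunk_bytes must be positive")
    return {
        # Values taken from the question.
        "dim": 384,
        "embedding_field": "vector_field",
        "text_field": "text",
        "use_ssl": False,
        "verify_certs": False,
        # Values this sketch adds.
        "max_chunk_bytes": max_chunk_bytes,
        "timeout": timeout_s,
    }


def build_vector_store(endpoint: str, index: str, **overrides: Any):
    """Create the vector store with the settings above.

    The import is done lazily so the kwargs helper can be used and
    inspected without llama-index installed.
    """
    from llama_index.vector_stores.opensearch import (
        OpensearchVectorClient,
        OpensearchVectorStore,
    )

    client = OpensearchVectorClient(
        endpoint, index, **opensearch_client_kwargs(**overrides)
    )
    return OpensearchVectorStore(client)
```

With the variables from the question, this would be called as `vector_store = build_vector_store(endpoint, idx, timeout_s=120)`. If the timeout persists even with a generous value, the cause is more likely connectivity or server load than the client setting.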

For more detailed troubleshooting, refer to the OpenSearch vector store implementation in the LlamaIndex repository (listed under Sources), which shows how the client is constructed and how bulk ingestion is performed.

If these steps do not resolve the issue, providing more details about your OpenSearch server configuration, the size of the data you're trying to ingest, and any specific network configurations could help in identifying a more precise solution.
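To gather some of these details, a small standard-library sketch like the one below may help. It checks whether a TCP connection to the endpoint can be opened at all and estimates how much text is being ingested. The function names are hypothetical, and the fallback to port 9200 assumes OpenSearch's default HTTP port when the URL does not specify one.

```python
import socket
from typing import Iterable
from urllib.parse import urlsplit


def endpoint_reachable(endpoint: str, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint's host and port succeeds.

    This only tests reachability; it does not check TLS or credentials.
    """
    parts = urlsplit(endpoint)
    host = parts.hostname
    if host is None:
        raise ValueError(f"no host found in endpoint: {endpoint!r}")
    # Assumption: fall back to OpenSearch's default HTTP port.
    port = parts.port or 9200
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def payload_size_bytes(texts: Iterable[str]) -> int:
    """Total UTF-8 size of the texts, a rough lower bound on the bulk payload
    (embeddings and metadata add to it)."""
    return sum(len(text.encode("utf-8")) for text in texts)
```

With the names from the question, `endpoint_reachable(endpoint)` reports whether the server can be reached from the client machine, and `payload_size_bytes(node.get_content(metadata_mode="all") for node in nodes)` gives an approximate data size to include in a follow-up.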

#### Details

**Sources**

- [llama-index-integrations/vector_stores/llama-index-vector-stores-opensearch/llama_index/vector_stores/opensearch/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/vector_stores/llama-index-vector-stores-opensearch/llama_index/vector_stores/opensearch/base.py)
- [llama-index-legacy/llama_index/legacy/vector_stores/opensearch.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/vector_stores/opensearch.py)
- [llama-index-integrations/vector_stores/llama-index-vector-stores-opensearch/tests/docker-compose.yml](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/vector_stores/llama-index-vector-stores-opensearch/tests/docker-compose.yml)
- [llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/utils.py](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/utils.py)

[Question]: vector_store.add(nodes) gives me error = opensearchpy.exceptions.ConnectionTimeout: ConnectionTimeout caused by - TimeoutError() #13110
