Aekansh-Ak closed this issue 3 months ago.
The `ConnectionTimeout` error you're encountering is likely due to one of several factors: network or configuration issues with your OpenSearch server, the overhead of bulk ingestion operations, inadequate server performance or configuration, or client timeout settings that are too low for the operation. Here's how you can address these issues:
- **Verify server endpoint and accessibility.** Ensure the OpenSearch server's endpoint is correct and that the server is accessible. An incorrect endpoint configuration or an unavailable server can lead to timeout errors.
- **Check network connectivity.** Ensure there is stable network connectivity between your client and the OpenSearch server.
- **Review server performance.** The OpenSearch server's performance and configuration should be adequate for the workload. If the server is under heavy load, consider optimizing its resources.
- **Adjust bulk operation settings.** If you're ingesting large amounts of data, consider reducing the `max_chunk_bytes` parameter so each bulk request is smaller and places less load on the server.
- **Modify client timeout settings.** The timeout of the `AsyncOpenSearch` client may need to be increased to accommodate operations that take longer to complete, especially large or complex ingestion tasks.
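Taken together, the last two points translate into a handful of constructor arguments. The sketch below is illustrative, not definitive: it assumes your installed `OpensearchVectorClient` exposes a `max_chunk_bytes` parameter and forwards extra keyword arguments to the underlying `AsyncOpenSearch` connection — verify both against your installed version before relying on them.

```python
# Connection settings to relax the client timeout (assumed to be
# forwarded to the underlying AsyncOpenSearch client; these three
# names are standard opensearch-py connection options).
client_kwargs = {
    "timeout": 60,             # seconds per request; opensearch-py defaults to 10
    "max_retries": 3,          # retry transient failures
    "retry_on_timeout": True,  # treat timeouts as retryable
}

# Hypothetical construction combining both adjustments:
# client = OpensearchVectorClient(
#     endpoint,
#     idx,
#     dim=384,
#     embedding_field=embedding_field,
#     text_field=text_field,
#     max_chunk_bytes=512 * 1024,  # smaller bulk chunks than the ~1 MB default
#     **client_kwargs,
# )
```

Smaller chunks mean more requests overall, but each one is far less likely to hit the timeout while the server is indexing.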
For more detailed troubleshooting, refer to the LlamaIndex repository, which might provide additional insights into the specific methods and configurations relevant to your issue.
If these steps do not resolve the issue, providing more details about your OpenSearch server configuration, the size of the data you're trying to ingest, and any specific network configurations could help in identifying a more precise solution.
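If raising the timeout alone doesn't help, ingesting the nodes in smaller batches with a retry-on-timeout loop is a useful client-side fallback. This is a generic sketch: `add_fn` stands in for `vector_store.add`, and the built-in `TimeoutError` stands in for `opensearchpy.exceptions.ConnectionTimeout`.

```python
import time

def add_in_batches(add_fn, nodes, batch_size=50, retries=3, delay=1.0):
    """Call add_fn on successive slices of nodes, retrying each batch
    on timeout with a simple linear backoff."""
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        for attempt in range(retries):
            try:
                add_fn(batch)
                break  # batch ingested, move to the next slice
            except TimeoutError:
                if attempt == retries - 1:
                    raise  # give up after the last attempt
                time.sleep(delay * (attempt + 1))
```

In the real pipeline this would be called as `add_in_batches(vector_store.add, nodes)`, catching the opensearchpy exception in the `except` clause instead.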
Question
I am trying to add nodes to an OpenSearch vector database, following this guide:
https://docs.llamaindex.ai/en/stable/examples/low_level/ingestion/
```python
import torch
from llama_index.core import Settings, SimpleDirectoryReader, StorageContext
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.vector_stores.opensearch import (
    OpensearchVectorClient,
    OpensearchVectorStore,
)

endpoint = "https://admin:admin@0.0.0.0:9200"
idx = "sample-index"
text_field = "text"
embedding_field = "vector_field"
client = OpensearchVectorClient(
    endpoint,
    idx,
    dim=384,
    embedding_field=embedding_field,
    text_field=text_field,
    use_ssl=False,
    verify_certs=False,
)

vector_store = OpensearchVectorStore(client)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader(
    "/home/seceon/opensearch_lm_index/textdocs"
).load_data()
embeddings = HuggingFaceEmbedding()

splitter = SentenceSplitter(chunk_size=700, chunk_overlap=300)
nodes = splitter.get_nodes_from_documents(documents)

llm = HuggingFaceLLM(
    model_name=model_id,  # model_id defined elsewhere
    tokenizer_name=model_id,
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={"torch_dtype": torch.float16},
    tokenizer_kwargs={},
)

Settings.llm = llm
Settings.embed_model = embeddings
Settings.context_window = 4096
Settings.num_output = 1000
Settings.chunk_overlap_ratio = 0.3
Settings.chunk_size_limit = None

for node in nodes:
    node_embedding = embeddings.get_text_embedding(
        node.get_content(metadata_mode="all")
    )
    node.embedding = node_embedding

vector_store.add(nodes)
```
```
Traceback (most recent call last):
  File "/home/seceon/opensearch_lm_index/koshish.py", line 90, in <module>
    vector_store.add(nodes)
  File "/usr/local/lib/python3.10/site-packages/llama_index/vector_stores/opensearch/base.py", line 476, in add
    return asyncio.get_event_loop().run_until_complete(
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/site-packages/llama_index/vector_stores/opensearch/base.py", line 492, in async_add
    await self._client.index_results(nodes)
  File "/usr/local/lib/python3.10/site-packages/llama_index/vector_stores/opensearch/base.py", line 347, in index_results
    return await _bulk_ingest_embeddings(
  File "/usr/local/lib/python3.10/site-packages/llama_index/vector_stores/opensearch/base.py", line 110, in _bulk_ingest_embeddings
    await client.indices.refresh(index=index_name)
  File "/usr/local/lib/python3.10/site-packages/opensearchpy/_async/client/indices.py", line 92, in refresh
    return await self.transport.perform_request(
  File "/usr/local/lib/python3.10/site-packages/opensearchpy/_async/transport.py", line 428, in perform_request
    raise e
  File "/usr/local/lib/python3.10/site-packages/opensearchpy/_async/transport.py", line 390, in perform_request
    status, headers_response, data = await connection.perform_request(
  File "/usr/local/lib/python3.10/site-packages/opensearchpy/_async/http_aiohttp.py", line 329, in perform_request
    raise ConnectionTimeout("TIMEOUT", str(e), e)
opensearchpy.exceptions.ConnectionTimeout: ConnectionTimeout caused by - TimeoutError()
```