run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: "Object of type RelatedNodeInfo is not JSON serializable" in OpenSearch Vector_Index #6609

Closed · nebucaz closed this 1 year ago

nebucaz commented 1 year ago

Bug Description

TypeError: Object of type RelatedNodeInfo is not JSON serializable

Exception has occurred: TypeError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
Object of type RelatedNodeInfo is not JSON serializable
  File "/Users/xxx/opt/anaconda3/envs/gpt39/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
  File "/Users/xxx/opt/anaconda3/envs/gpt39/lib/python3.9/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/Users/xxx/opt/anaconda3/envs/gpt39/lib/python3.9/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Users/xxx/opt/anaconda3/envs/gpt39/lib/python3.9/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/Users/xxx/opt/anaconda3/envs/gpt39/lib/python3.9/site-packages/llama_index/vector_stores/opensearch.py", line 122, in <listcomp>
    bulk = "\n".join([json.dumps(v) for v in bulk_req]) + "\n"
  File "/Users/xxx/opt/anaconda3/envs/gpt39/lib/python3.9/site-packages/llama_index/vector_stores/opensearch.py", line 122, in index_results
    bulk = "\n".join([json.dumps(v) for v in bulk_req]) + "\n"
  File "/Users/xxx/opt/anaconda3/envs/gpt39/lib/python3.9/site-packages/llama_index/vector_stores/opensearch.py", line 226, in add
    self._client.index_results(embedding_results)
  File "/Users/xxx/opt/anaconda3/envs/gpt39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 191, in _add_nodes_to_index
    new_ids = self._vector_store.add(embedding_results)
  File "/Users/xxx/opt/anaconda3/envs/gpt39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 214, in _build_index_from_nodes
    self._add_nodes_to_index(index_struct, nodes)
  File "/Users/xxx/opt/anaconda3/envs/gpt39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 225, in build_index_from_nodes
    return self._build_index_from_nodes(nodes)
  File "/Users/xxx/opt/anaconda3/envs/gpt39/lib/python3.9/site-packages/llama_index/token_counter/token_counter.py", line 78, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
  File "/Users/xxx/opt/anaconda3/envs/gpt39/lib/python3.9/site-packages/llama_index/indices/base.py", line 68, in __init__
    index_struct = self.build_index_from_nodes(nodes)
  File "/Users/xxx/opt/anaconda3/envs/gpt39/lib/python3.9/site-packages/llama_index/indices/vector_store/base.py", line 45, in __init__
    super().__init__(
  File "/Users/xxx/opt/anaconda3/envs/gpt39/lib/python3.9/site-packages/llama_index/indices/base.py", line 96, in from_documents
    return cls(
  File "main.py", line 179, in testLLamaIndex
    index = VectorStoreIndex.from_documents(documents=documents, storage_context=storage_context)
  File "main.py", line 195, in <module>
    testLLamaIndex()
  File "/Users/xxx/opt/anaconda3/envs/gpt39/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/xxx/opt/anaconda3/envs/gpt39/lib/python3.9/runpy.py", line 197, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
TypeError: Object of type RelatedNodeInfo is not JSON serializable
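The root cause is visible in the innermost frames: `opensearch.py` calls `json.dumps` on each bulk entry, and Python's default JSON encoder rejects any object it does not recognize, such as the Pydantic `RelatedNodeInfo` stored under a node's `relationships`. A minimal sketch of the failure mode and one possible workaround follows; the class `FakeRelatedNodeInfo` and the `.dict()` fallback are illustrative assumptions, not llama_index's actual fix:

```python
import json

class FakeRelatedNodeInfo:
    """Stand-in for a Pydantic model such as RelatedNodeInfo."""
    def __init__(self, node_id: str):
        self.node_id = node_id

    def dict(self) -> dict:
        # Pydantic v1 models expose .dict() for plain-dict conversion.
        return {"node_id": self.node_id}

doc = {"relationships": {"SOURCE": FakeRelatedNodeInfo("abc-123")}}

# The default encoder cannot handle arbitrary objects and raises TypeError:
try:
    json.dumps(doc)
except TypeError as exc:
    print(exc)  # Object of type FakeRelatedNodeInfo is not JSON serializable

# A `default=` hook that falls back to .dict() serializes it cleanly:
encoded = json.dumps(doc, default=lambda o: o.dict())
print(encoded)
```

This is why the traceback points at the `json.dumps(v)` list comprehension: any node field holding a model instance instead of plain dicts/strings breaks serialization of the whole bulk request.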

Version

0.6.34.post1

Steps to Reproduce

1) Create a directory `test-doc` with 2 text files (a.txt, b.txt), each containing a single line of the Paul Graham essay
2) Start a Docker container with OpenSearch
3) Run the following Python script to create embeddings of the text in the two files, using OpenSearch as the vector store
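For step 2, one possible way to start a local single-node OpenSearch container; the image tag and the security-plugin setting are assumptions, adjust to your environment:

```shell
# Single-node OpenSearch on localhost:9200, security plugin disabled so
# plain http connections (as in the script below) work.
docker run -d --name opensearch-test \
  -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "DISABLE_SECURITY_PLUGIN=true" \
  opensearchproject/opensearch:latest
```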

    import os

    # Imports assumed for llama_index 0.6.x
    from llama_index import SimpleDirectoryReader, StorageContext, VectorStoreIndex
    from llama_index.vector_stores import OpensearchVectorClient, OpensearchVectorStore

    # OpenSearch connection details from the environment
    muser = os.getenv("OS_MASTER_USERNAME")
    mpass = os.getenv("OS_MASTER_PASSWORD")
    osendpoint = os.getenv("OS_ENDPOINT")
    endpoint = f"http://{muser}:{mpass}@{osendpoint}"
    idx = "gpt-index-demo"

    text_field = "content"
    embedding_field = "embedding"

    documents = SimpleDirectoryReader("examples/test-doc").load_data()
    # 1536 is the dimensionality of text-embedding-ada-002 embeddings
    client = OpensearchVectorClient(endpoint, idx, 1536, embedding_field=embedding_field, text_field=text_field)

    vector_store = OpensearchVectorStore(client)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = VectorStoreIndex.from_documents(documents=documents, storage_context=storage_context)

4) The error is thrown at the last line, occurring in the file 'opensearch.py' on line 121.
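For context, `index_results` in `opensearch.py` builds an OpenSearch `_bulk` payload: newline-delimited JSON alternating action and document lines. A sketch of that construction; the index name comes from the repro script, while the document shape and ids here are assumptions about what llama_index emits:

```python
import json

# Alternating action/document pairs, as the OpenSearch _bulk API expects.
bulk_req = [
    {"index": {"_index": "gpt-index-demo", "_id": "node-1"}},
    {"content": "Before college...", "embedding": [0.1, 0.2, 0.3]},
]

# This mirrors the line that raises in opensearch.py: every entry must be
# serializable by the default encoder, or json.dumps raises TypeError.
bulk = "\n".join(json.dumps(v) for v in bulk_req) + "\n"
print(bulk)
```

If any entry still contains a `RelatedNodeInfo` object (rather than plain dicts and strings), the `json.dumps` call fails exactly as in the traceback above.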

Relevant Logs/Tracebacks

DEBUG:llama_index.readers.file.base:> [SimpleDirectoryReader] Total files added: 2
DEBUG:httpx:load_ssl_context verify=True cert=None trust_env=True http2=False
DEBUG:httpx:load_verify_locations cafile='/Users/neo/opt/anaconda3/envs/gpt39/lib/python3.9/site-packages/certifi/cacert.pem'
DEBUG:httpcore.connection:connect_tcp.started host='localhost' port=9200 local_address=None timeout=5.0 socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore.backends.sync.SyncStream object at 0x15c58c550>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'PUT']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'PUT']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'PUT']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 400, b'Bad Request', [(b'content-type', b'application/json; charset=UTF-8'), (b'content-encoding', b'gzip'), (b'content-length', b'183')])
INFO:httpx:HTTP Request: PUT http://admin:admin@localhost:9200/gpt-index-demo "HTTP/1.1 400 Bad Request"
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'PUT']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:llama_index.node_parser.node_utils:> Adding chunk: Before college the two main things I worked on,...
DEBUG:llama_index.node_parser.node_utils:> Adding chunk: With microcomputers, everything changed. Now yo...
DEBUG:openai:message='Request to OpenAI API' method=post path=https://api.openai.com/v1/embeddings
DEBUG:openai:api_version=None data='{"input": ["Before college the two main things I worked on, outside of school, were writing and programming. I didn\'t write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.", "With microcomputers, everything changed. Now you could have a computer sitting right in front of you, on a desk, that could respond to your keystrokes as it was running instead of just churning through a stack of punch cards and then stopping. [1]"], "model": "text-embedding-ada-002", "encoding_format": "base64"}' message='Post details'
DEBUG:urllib3.util.retry:Converted retries value: 2 -> Retry(total=2, connect=None, read=None, redirect=None, status=None)
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.openai.com:443
DEBUG:urllib3.connectionpool:https://api.openai.com:443 "POST /v1/embeddings HTTP/1.1" 200 None
DEBUG:openai:message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=8702 request_id=76061dd045238dd9a1111b70abb6db10 response_code=200
logan-markewich commented 1 year ago

@nebucaz should be an easy fix, thanks for identifying this! Will patch shortly

logan-markewich commented 1 year ago

@nebucaz if possible, could you try the branch in this PR and see if it works for you? OpenSearch is complicated to set up lol

No worries if you can't test, I'm fairly confident it should work fine

https://github.com/jerryjliu/llama_index/pull/6612