Closed (gisyrus closed this issue 1 year ago)
A UFuncTypeError generally indicates a data type mismatch in the input. The index used to answer the prompt may be incorrect or may contain None values, so the embedding check ends up operating on None. One possible cause is that the data was not loaded into the documents correctly. Check that ElasticSearchReader is loading the right data with the right parameters. The following code works fine for me:
from llama_index import GPTSimpleVectorIndex

# `reader` is the ElasticSearchReader instance created beforehand (not shown here)
# Define the query criteria; match_all returns every item in the example data
query_dict = {'query': {'match_all': {}}}
# Load the data from Elasticsearch into a list of documents
documents = reader.load_data(
    field="id", query=query_dict
)
# Build the index used to answer prompts; it can be exported for reuse
index = GPTSimpleVectorIndex(documents, chunk_size_limit=500)
# Query the index with the prompt
prompt = 'What is the item id which first seen is on 2022-04-12 18:31:56 UTC'
response = index.query(prompt, use_async=True, mode="embedding")
print(response)
If you print one of the loaded documents, you should see a Document object with embedding=None:
Document(text='518944', doc_id='68e6a0dd-d913-42fa-bf1e-01d728903429', embedding=None, doc_hash='8c31a906be5a1910c2c76c09e09145fcf34184470d571f9837ccb8a75cb08052', extra_info={'id': '518944', 'ioc': '111.167.1.44:46171', 'threat_type': 'botnet_cc', 'threat_type_desc': 'Indicator that identifies a botnet command&control server (C&C)', 'ioc_type': 'url', 'ioc_type_desc': 'URL that is used for botnet Command&control (C&C)', 'malware': 'elf.mozi', 'malware_printable': 'Mozi', 'malware_alias': None, 'malware_malpedia': 'https://malpedia.caad.fkie.fraunhofer.de/details/elf.mozi', 'confidence_level': 100, 'first_seen': '2022-04-12 18:31:56 UTC', 'last_seen': None, 'reference': None, 'reporter': 'fish_illuminati', 'tags': ['elf', 'Mozi']})
You should also see a large number of tokens used for embedding when the index is built, for example "Total embedding token usage: 29133 tokens":
2023-04-12 17:09:18,474 P17208T20656 INFO <llama_index.token_counter.token_counter:token_counter.py/wrapper_logic L60> | > [build_index_from_nodes] Total LLM token usage: 0 tokens
2023-04-12 17:09:18,475 P17208T20656 INFO <llama_index.token_counter.token_counter:token_counter.py/wrapper_logic L63> | > [build_index_from_nodes] Total embedding token usage: 29133 tokens
These log lines indicate that you have successfully called the OpenAI Embedding API and generated the embedding index for your data, which is essential for querying. The result of my prompt looks like this:
What is the item id which first seen is on 2022-04-12 18:31:56 UTC
2023-04-12 17:22:35,995 P17208T20656 INFO <llama_index.token_counter.token_counter:token_counter.py/wrapper_logic L60> | > [query] Total LLM token usage: 251 tokens
2023-04-12 17:22:35,996 P17208T20656 INFO <llama_index.token_counter.token_counter:token_counter.py/wrapper_logic L63> | > [query] Total embedding token usage: 21 tokens
The item id is 518945.
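The advice above to check what ElasticSearchReader actually loaded can be sketched roughly as follows. This is only a minimal sketch and not part of LlamaIndex: `find_suspect_documents` is a hypothetical helper, and `FakeDocument` is a stand-in that mimics only the `text` and `extra_info` attributes visible in the printed Document above, so the snippet runs without an Elasticsearch cluster or an API key.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class FakeDocument:
    """Hypothetical stand-in exposing the attributes seen in the printed Document."""
    text: Optional[Any]
    extra_info: Dict[str, Any] = field(default_factory=dict)


def find_suspect_documents(documents: List[Any]) -> List[Tuple[int, str]]:
    """Return (position, reason) pairs for documents whose text looks unusable."""
    problems = []
    for position, doc in enumerate(documents):
        text = getattr(doc, "text", None)
        if text is None:
            problems.append((position, "text is None"))
        elif not isinstance(text, str):
            problems.append((position, f"text is {type(text).__name__}, not str"))
        elif not text.strip():
            problems.append((position, "text is empty"))
    return problems


# Example data: one good document and two that would be flagged
docs = [
    FakeDocument(text="518944", extra_info={"id": "518944"}),
    FakeDocument(text=None),
    FakeDocument(text="   "),
]
for position, reason in find_suspect_documents(docs):
    print(f"document {position}: {reason}")
```

With real data you would pass the list returned by reader.load_data(...) instead of `docs`. If anything is reported, or the list is empty, the field and query parameters given to the reader are the first things to review.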
Hi, @gisyrus. I'm helping the LlamaIndex team manage their backlog and I wanted to let you know that we are marking this issue as stale.
Based on the information provided, it seems that you encountered a UFuncTypeError when querying an index using LlamaIndex version 0.5.3. User bobbyng626 suggested that the error may be caused by an incorrect setting of the data loaded to the document. They provided an example code snippet that worked for them and suggested checking the ElasticSearchReader for correct data loading.
Before we close this issue, we wanted to check with you if this issue is still relevant to the latest version of the LlamaIndex repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to the LlamaIndex project.
Hi,
I am using llama-index version 0.5.3. When I tried to query an index built from documents generated by ElasticSearchReader, I got the following error:
I used the following json as the raw data: example.json.zip
And the generated index: example_model.json.zip
I used GPTSimpleVectorIndex to generate the index.