[Bug]: Failed to load persisted PropertyGraph data with ValueError

run-llama / llama_index

LlamaIndex is a data framework for your LLM applications

MIT License


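```
import os

from llama_index.core import (
    PropertyGraphIndex,
    Settings,
    StorageContext,
    load_index_from_storage,
)
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.llms.azure_openai import AzureOpenAI

os.environ["OPENAI_API_KEY"] = "****"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://****.openai.azure.com"
os.environ["OPENAI_API_VERSION"] = "2024-06-01"

Settings.llm = AzureOpenAI(
    deployment_name="****",
    temperature=0.2,
)
Settings.embed_model = AzureOpenAIEmbedding(
    model="text-embedding-3-small",
    deployment_name="****",
)

# `documents` and `output_folder` are defined elsewhere in the reporter's script.
index = PropertyGraphIndex.from_documents(
    documents,
    show_progress=True,
)

# Persist the index, then try to load it back from the same folder.
index.storage_context.persist(persist_dir=output_folder)
storage_context = StorageContext.from_defaults(persist_dir=output_folder)
index = load_index_from_storage(storage_context=storage_context)
```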

Loading fails with a `ValueError` stating that the node type could not be inferred for the given data. The full error is shown below:

```
ValueError: Could not infer node type for data: {'label': 'text_chunk', 'embedding': [0.015958545729517937, ****], 'properties': {'file_name': 'input.docx', 'file_path': 'data\\input.docx', 'file_type': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'file_size': 269519, 'creation_date': '2024-08-30', 'last_modified_date': '2024-08-30', '_node_content': '{"id_": "9f4c67b8-b055-4caa-9e52-5d8ffa6d3c11", "embedding": null, "metadata": {"file_name": "input.docx", "file_path": "data\\\\input.docx", "file_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document", "file_size": 269519, "creation_date": "2024-08-30", "last_modified_date": "2024-08-30"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "a655e0cd-ba06-47a8-9aa1-5341cc0e7446", "node_type": "4", "metadata": {"file_name": "input.docx", "file_path": "data\\\\input.docx", "file_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document", "file_size": 269519, "creation_date": "2024-08-30", "last_modified_date": "2024-08-30"}, "hash": "b8565428e99f410e8069f23c8b8d5661bb90216ddc5b5b4b3bc594d7e155f404", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "1cf26f1a-2592-4d47-9ecf-93b320d47567", "node_type": "1", "metadata": {}, "hash": "e86e358e00b2bb463ccd26e8565c0afb89eac72f3e75e877772c30cb048c94a4", "class_name": "RelatedNodeInfo"}}, "text": "", "mimetype": "text/plain", "start_char_idx": 0, "end_char_idx": 1345, "text_template": "{metadata_str}\\n\\n{content}", "metadata_template": "{key}: {value}", "metadata_seperator": "\\n", "class_name": "TextNode"}', '_node_type': 'TextNode', 'document_id': 'a655e0cd-ba06-47a8-9aa1-5341cc0e7446', 'doc_id': 'a655e0cd-ba06-47a8-9aa1-5341cc0e7446', 'ref_doc_id': 'a655e0cd-ba06-47a8-9aa1-5341cc0e7446'}}
```
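
The message suggests that the loader tried to decide which node class the persisted dictionary represents and found nothing to go on. The sketch below is not the library's code. It is a hypothetical, standard-library-only illustration that assumes entity nodes are recognised by a `name` key and chunk nodes by a `text` key; the function name `infer_node_kind` is made up for this example. Under that assumption, a dictionary carrying only `label`, `embedding`, and `properties`, like the one in the error, cannot be classified.

```python
from typing import Any, Dict


def infer_node_kind(data: Dict[str, Any]) -> str:
    """Hypothetical key-based inference; NOT the actual llama_index implementation."""
    if "name" in data:
        return "entity"
    if "text" in data:
        return "chunk"
    raise ValueError(f"Could not infer node type for data: {sorted(data)}")


# Same top-level keys as the dictionary in the error (values shortened).
persisted_node = {
    "label": "text_chunk",
    "embedding": [0.015958545729517937],
    "properties": {"_node_type": "TextNode", "file_name": "input.docx"},
}

try:
    infer_node_kind(persisted_node)
    outcome = "inferred"
except ValueError as exc:
    outcome = str(exc)

print(outcome)
```

If the real loader works along similar lines, the empty `"text": ""` visible inside `_node_content` may be worth a closer look, since a top-level text field that is absent from the persisted dictionary would leave nothing to infer from. This is a guess to investigate, not a confirmed cause.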

Hello @frontier-repository! I'm here to assist you with any bugs, questions, or contributions you may have.

I found a similar closed issue that might be relevant to your problem: [Question]: Can not load my index created locally. The suggested solution in that thread was to ensure that when using a vector database, all data is persisted in the vector store alone, and there is no need to call persist or load separately. To connect to an existing index created in a vector store like Chroma, you can use the following code:

```
vector_store = <points to existing ChromaVectorStore>
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)
```

For your specific issue, the ValueError you're encountering indicates that the node type could not be inferred from the persisted data. Here are the specific requirements for the load_index_from_storage function to successfully infer the node type from the persisted data:

Storage Context: Ensure the StorageContext is correctly initialized from the persisted directory.
```
storage_context = StorageContext.from_defaults(persist_dir=output_folder)
```
Object Node Mapping: If an object_node_mapping is not provided, the function attempts to load it from the persisted directory. The function assumes SimpleObjectNodeMapping for simplicity.
```
object_node_mapping = SimpleObjectNodeMapping.from_persist_dir(persist_dir=output_folder)
```
Single Index Constraint: The storage context should contain only a single index. If multiple indices are present, the function will raise a ValueError.
Service Context: Provide the service_context so that the loaded index can operate correctly.
```
new_index = load_index_from_storage(storage_context=storage_context, service_context=mock_service_context)
```
Persistence of Object Node Mapping: The object_node_mapping must be persisted correctly. If the persistence fails, a warning is issued, and the user must reconstruct the same object node mapping to build the ObjectIndex.
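
Before working through the requirements above, it can help to see what was actually written to the persist directory. The following is a minimal sketch using only the standard library; it assumes the directory contains JSON files and makes no assumption about their names. The helper `summarize_persist_dir` is hypothetical and not part of LlamaIndex.

```python
import json
from pathlib import Path
from typing import Dict, List


def summarize_persist_dir(persist_dir: str) -> Dict[str, List[str]]:
    """Map each JSON file in persist_dir to its sorted top-level keys."""
    summary: Dict[str, List[str]] = {}
    for path in sorted(Path(persist_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            payload = json.load(handle)
        # Only dict payloads have keys; report other payload types by name.
        if isinstance(payload, dict):
            summary[path.name] = sorted(payload)
        else:
            summary[path.name] = [f"<{type(payload).__name__}>"]
    return summary
```

Calling `summarize_persist_dir(output_folder)` and printing the result gives a quick view of which stores were persisted and how each file is laid out, which is useful context when reporting which file the loader fails on.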

Given these requirements, please ensure that your persisted data meets these conditions. If the issue persists, you might want to check the structure of your node data in the persisted file. The expected structure of the node data for the PropertyGraph feature is defined by the props_schema parameter in the get_store function:

```
props_schema: str = "`file_path` STRING, `file_name` STRING, `file_type` STRING, `file_size` INT, `creation_date` STRING, `last_modified_date` STRING, `_node_content` STRING, `_node_type` STRING, `document_id` STRING, `doc_id` STRING, `ref_doc_id` STRING, `triplet_source_id` STRING"
```

Ensure that each node in your persisted data has the following properties:

file_path
file_name
file_type
file_size
creation_date
last_modified_date
_node_content
_node_type
document_id
doc_id
ref_doc_id
triplet_source_id
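
The property list above can be checked mechanically against a persisted node. This is a small sketch under stated assumptions: the node is a plain dictionary with a `properties` key, as in the error message, and `missing_properties` is a hypothetical helper rather than a LlamaIndex function.

```python
from typing import Any, Dict, List

# Property names taken from the list above.
EXPECTED_PROPERTIES = [
    "file_path",
    "file_name",
    "file_type",
    "file_size",
    "creation_date",
    "last_modified_date",
    "_node_content",
    "_node_type",
    "document_id",
    "doc_id",
    "ref_doc_id",
    "triplet_source_id",
]


def missing_properties(node: Dict[str, Any]) -> List[str]:
    """Return the expected property names that the node does not carry."""
    props = node.get("properties", {})
    return [key for key in EXPECTED_PROPERTIES if key not in props]


# Property keys copied from the node in the error message (values omitted).
traceback_node = {
    "label": "text_chunk",
    "properties": dict.fromkeys(
        [
            "file_name",
            "file_path",
            "file_type",
            "file_size",
            "creation_date",
            "last_modified_date",
            "_node_content",
            "_node_type",
            "document_id",
            "doc_id",
            "ref_doc_id",
        ]
    ),
}

print(missing_properties(traceback_node))  # prints ['triplet_source_id']
```

For the node in the error message, the only listed property that is absent is `triplet_source_id`. Whether that absence is related to the `ValueError` is not established here.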

Additionally, the persisted file may include other metadata and relationships for each node, such as:

embedding
doc_hash
extra_info
node_info
relationships
__type__

By ensuring these conditions and structures, you should be able to load the persisted PropertyGraph data without encountering the ValueError ^[1]^[2]^[3]^[4].


[Bug]: Failed to load persisted PropertyGraph data with ValueError #15798