run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: I added `source` STRING to DEFAULT_PROPS_SCHEMA and passed props_schema=DEFAULT_PROPS_SCHEMA to PropertyGraphIndex.from_documents, but it does not work and reports "SemanticError: Unknown column `source' in schema" #16275

Open abc-w opened 1 month ago

abc-w commented 1 month ago

Question

I added `source` STRING to DEFAULT_PROPS_SCHEMA and passed props_schema=DEFAULT_PROPS_SCHEMA to PropertyGraphIndex.from_documents, but it does not work. It fails with "SemanticError: Unknown column `source' in schema", as follows:

from llama_index.core.indices.property_graph import PropertyGraphIndex
from llama_index.core.storage.storage_context import StorageContext
from llama_index.llms.openai import OpenAI

DEFAULT_PROPS_SCHEMA = "file_path STRING, file_name STRING, file_type STRING, file_size INT, creation_date STRING, last_modified_date STRING, source STRING, page INT, _node_content STRING, _node_type STRING, document_id STRING, doc_id STRING, ref_doc_id STRING, triplet_source_id STRING"

index = PropertyGraphIndex.from_documents(
    llama_index_docs,
    property_graph_store=graph_store,
    vector_store=vec_store,
    show_progress=True,
    embed_kg_nodes=False,
    props_schema=DEFAULT_PROPS_SCHEMA,
)

Exception: ('NebulaGraph query failed:', "SemanticError: Unknown column `source' in schema", 'Statement:', 'INSERT VERTEX `Props__` (`source`,`page`,`_node_content`,`_node_type`,`document_id`,`doc_id`,`ref_doc_id`) VALUES "1d7d1875-bf0b-4ee7-acd1-00dc1e47ce76":($kv_0,$kv_1,$kv_2,$kv_3,$kv_4,$kv_5,$kv_6);', 'Params:', {'kv_0': '/data/pingchuan/pingchuan/raggraph/2401.18059v1.pdf', 'kv_1': 0, 'kv_2': '{"id_": "1d7d1875-bf0b-4ee7-acd1-00dc1e47ce76", "embedding": null, "metadata": {"source": "/data/pingchuan/pingchuan/raggraph/2401.18059v1.pdf", "page": 0}, "excluded_embed_metadata_keys": [], "excluded_llm_metadata_keys": [], "relationships": {"1": {"node_id": "8173a9d5-552f-4103-8469-f26e8050869d", "node_type": "4", "metadata": {"source": "/data/pingchuan/pingchuan/raggraph/2401.18059v1.pdf", "page": 0}, "hash": "e29705ed8d12c0bf347232ef0c396733796c40724b16db42cec6544516c183cc", "class_name": "RelatedNodeInfo"}}, "text": "", "mimetype": "text/plain", "start_char_idx": 0, "end_char_idx": 496, "text_template": "{metadata_str}\n\n{content}", "metadata_template": "{key}: {value}", "metadata_seperator": "\n", "class_name": "TextNode"}', 'kv_3': 'TextNode', 'kv_4': '8173a9d5-552f-4103-8469-f26e8050869d', 'kv_5': '8173a9d5-552f-4103-8469-f26e8050869d', 'kv_6': '8173a9d5-552f-4103-8469-f26e8050869d'})

dosubot[bot] commented 1 month ago

The error "SemanticError: Unknown column source in schema" occurs because the source column is not defined in the DEFAULT_PROPS_SCHEMA. The DEFAULT_PROPS_SCHEMA is a hard-coded schema that includes specific properties, but it does not include a source property by default.

Here is the DEFAULT_PROPS_SCHEMA as defined in the document:

DEFAULT_PROPS_SCHEMA = "`file_path` STRING, `file_name` STRING, `file_type` STRING, `file_size` INT, `creation_date` STRING, `last_modified_date` STRING, `_node_content` STRING, `_node_type` STRING, `document_id` STRING, `doc_id` STRING, `ref_doc_id` STRING, `triplet_source_id` STRING"

To resolve this error, you need to add the source property to the DEFAULT_PROPS_SCHEMA:

DEFAULT_PROPS_SCHEMA = "`file_path` STRING, `file_name` STRING, `file_type` STRING, `file_size` INT, `creation_date` STRING, `last_modified_date` STRING, `_node_content` STRING, `_node_type` STRING, `document_id` STRING, `doc_id` STRING, `ref_doc_id` STRING, `triplet_source_id` STRING, `source` STRING"

By including source in the schema, the PropertyGraphIndex.from_documents method will recognize the source column, and the "SemanticError: Unknown column source in schema" error should be resolved [1].
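Before inserting, it can help to check which node metadata keys a given schema string does not cover. The following is a minimal sketch, assuming the schema is a comma-separated list of `name` TYPE entries as in the strings above. The helpers `parse_props_schema` and `missing_columns` are hypothetical names for illustration and are not part of LlamaIndex. The sketch only inspects the string and does not query NebulaGraph, so it cannot tell which columns the tag in the database actually has.

```python
import re

# Schema strings copied from this thread.
DEFAULT_SCHEMA = (
    "`file_path` STRING, `file_name` STRING, `file_type` STRING, "
    "`file_size` INT, `creation_date` STRING, `last_modified_date` STRING, "
    "`_node_content` STRING, `_node_type` STRING, `document_id` STRING, "
    "`doc_id` STRING, `ref_doc_id` STRING, `triplet_source_id` STRING"
)
SUGGESTED_SCHEMA = DEFAULT_SCHEMA + ", `source` STRING"


def parse_props_schema(schema: str) -> dict[str, str]:
    """Parse '`name` TYPE, ...' (backticks optional) into {name: TYPE}."""
    columns: dict[str, str] = {}
    for entry in schema.split(","):
        entry = entry.strip()
        if not entry:
            continue
        match = re.fullmatch(r"`?(\w+)`?\s+(\w+)", entry)
        if match is None:
            raise ValueError(f"Cannot parse schema entry: {entry!r}")
        name, type_name = match.groups()
        columns[name] = type_name.upper()
    return columns


def missing_columns(schema: str, properties: dict) -> list[str]:
    """Return the property keys that have no column in the schema string."""
    known = parse_props_schema(schema)
    return [key for key in properties if key not in known]


# Metadata keys taken from the failing INSERT statement in the question.
node_metadata = {"source": "2401.18059v1.pdf", "page": 0}

print(missing_columns(DEFAULT_SCHEMA, node_metadata))    # ['source', 'page']
print(missing_columns(SUGGESTED_SCHEMA, node_metadata))  # ['page']
```

With the metadata from the question, the suggested schema still leaves `page` uncovered, so a schema for these documents would also need a `page` INT entry, as in the string the question already uses.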

abc-w commented 1 month ago

Yes, I did this, but it still does not work.