neo4j-labs / llm-graph-builder

Neo4j graph construction from unstructured data using LLMs
https://neo4j.com/labs/genai-ecosystem/llm-graph-builder/
Apache License 2.0

Issue with chunking + GDS library #856

Open prisciliapangg opened 1 week ago

prisciliapangg commented 1 week ago

message': 'Failed To Process File:xx.pdf or LLM Unable To Parse Content ', 'error_message': 'Chunks are not created for xx.pdf. Please re-upload file and try.', 'file_name': 'xx.pdf', 'status': 'Failed', 'db_url': 'neo4j+s://xx.databases.neo4j.io:7687', 'failed_count': 1, 'source_type': 'local file', 'source_url': None, 'wiki_query': None, 'logging_time': '2024-11-12 02:42:18 UTC'} Traceback (most recent call last):

and

Failed to create GDS driver: The Graph Data Science library is not correctly installed on the Neo4j server.
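One way to check whether the server actually exposes the Graph Data Science plugin is to call `gds.version()` directly. The snippet below is a minimal sketch, not part of llm-graph-builder: `gds_version` and `run_query` are hypothetical names, and the commented wiring assumes the official `neo4j` Python driver (5.x) with placeholder host and credentials.

```python
from typing import Callable, Optional


def gds_version(run_query: Callable[[str], list]) -> Optional[str]:
    """Return the GDS version string, or None if the call fails.

    `run_query` is a hypothetical callable that executes one Cypher
    statement and returns a list of dict-like rows.
    """
    try:
        rows = run_query("RETURN gds.version() AS version")
    except Exception:
        # A server without the plugin rejects the call. Note that this
        # broad catch also hides connection errors, so treat None as
        # "could not confirm GDS", not as proof that it is missing.
        return None
    return rows[0]["version"] if rows else None


# Possible wiring with the official `neo4j` driver (placeholders only):
#
#   from neo4j import GraphDatabase
#   driver = GraphDatabase.driver("neo4j+s://<host>:7687",
#                                 auth=("neo4j", "<password>"))
#   def run_query(cypher):
#       records, _, _ = driver.execute_query(cypher)
#       return [r.data() for r in records]
#   print(gds_version(run_query) or "GDS not available")
```

If the call returns nothing, the community step of post-processing cannot run against that database, independently of the chunking error.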

aashipandya commented 1 week ago

Have you tried re-uploading the file and generating the graph again?

If you still get the error, please share the full error trace and the PDF file if possible.

prisciliapangg commented 1 week ago

This is the document that I am working on: government-data-security-policies.pdf

[INFO]{'api_name': 'extract', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'userName': 'neo4j', 'database': 'neo4j', 'source_url': None, 'aws_access_key_id': None, 'model': 'openai-gpt-4o', 'gcs_bucket_name': None, 'gcs_bucket_folder': None, 'source_type': 'local file', 'gcs_blob_filename': None, 'file_name': 'government-data-security-policies.pdf', 'gcs_project_id': None, 'wiki_query': None, 'allowedNodes': '', 'allowedRelationship': '', 'language': None, 'retry_condition': '', 'logging_time': '2024-11-12 02:42:17 UTC'}
2024-11-12 10:42:17,761 - File path:/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/merged_files/government-data-security-policies.pdf
2024-11-12 10:42:17,761 - Process file name :government-data-security-policies.pdf
2024-11-12 10:42:17,959 - Time taken database connection: 0.20 seconds
2024-11-12 10:42:18,184 - Deleted File Path: /Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/merged_files/government-data-security-policies.pdf and Deleted File Name : government-data-security-policies.pdf
2024-11-12 10:42:18,184 - file government-data-security-policies.pdf deleted successfully
[ERROR]{'message': 'Failed To Process File:government-data-security-policies.pdf or LLM Unable To Parse Content ', 'error_message': 'Chunks are not created for government-data-security-policies.pdf. Please re-upload file and try.', 'file_name': 'government-data-security-policies.pdf', 'status': 'Failed', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'failed_count': 1, 'source_type': 'local file', 'source_url': None, 'wiki_query': None, 'logging_time': '2024-11-12 02:42:18 UTC'}
2024-11-12 10:42:18,184 - File Failed in extraction: {'message': 'Failed To Process File:government-data-security-policies.pdf or LLM Unable To Parse Content ', 'error_message': 'Chunks are not created for government-data-security-policies.pdf. Please re-upload file and try.', 'file_name': 'government-data-security-policies.pdf', 'status': 'Failed', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'failed_count': 1, 'source_type': 'local file', 'source_url': None, 'wiki_query': None, 'logging_time': '2024-11-12 02:42:18 UTC'}
Traceback (most recent call last):
  File "/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/score.py", line 193, in extract_knowledge_graph_from_file
    uri_latency, result = await extract_graph_from_file_local_file(uri, userName, password, database, model, merged_file_path, file_name, allowedNodes, allowedRelationship, retry_condition)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/src/main.py", line 226, in extract_graph_from_file_local_file
    return await processing_source(uri, userName, password, database, model, fileName, [], allowedNodes, allowedRelationship, True, merged_file_path, retry_condition)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/src/main.py", line 308, in processing_source
    total_chunks, chunkId_chunkDoc_list = get_chunkId_chunkDoc_list(graph, file_name, pages, retry_condition)
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/src/main.py", line 525, in get_chunkId_chunkDoc_list
    raise Exception(f"Chunks are not created for {file_name}. Please re-upload file and try.")
Exception: Chunks are not created for government-data-security-policies.pdf. Please re-upload file and try.
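One detail in the request log may be worth checking: `extract` was called with `'retry_condition': ''`, and the exception is raised inside `get_chunkId_chunkDoc_list`, which receives that value. The thread does not show the function body, so what follows is only a hypothesis. If the code separates a first run from a retry with an `is None` test, an empty string would be routed down the retry path, which would expect chunks that already exist. The sketch below uses hypothetical names (`is_first_run_strict`, `is_first_run_lenient`, and the placeholder value `some_retry_mode`) and is not the project's code; it only shows how the two checks diverge.

```python
from typing import Optional


def is_first_run_strict(retry_condition: Optional[str]) -> bool:
    # Treats only None as "no retry requested".
    return retry_condition is None


def is_first_run_lenient(retry_condition: Optional[str]) -> bool:
    # Treats None, "" and whitespace-only strings as "no retry requested".
    return retry_condition is None or retry_condition.strip() == ""


# "some_retry_mode" is a placeholder, not a value taken from the project.
for value in (None, "", "some_retry_mode"):
    print(repr(value), is_first_run_strict(value), is_first_run_lenient(value))
```

For the empty string the strict check reports a retry while the lenient check reports a first run. Comparing the local backend version with the one the maintainers run would show whether such a difference exists.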

aashipandya commented 1 week ago

The file processes successfully on our end.

[Two screenshots attached]

Try selecting and deleting this file in the UI, then upload it again.

prisciliapangg commented 1 week ago

I tried that and the problem still remains:

INFO: 127.0.0.1:63194 - "POST /post_processing HTTP/1.1" 200 OK
[INFO]{'api_name': 'delete_document_and_entities', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'userName': 'neo4j', 'database': 'neo4j', 'filenames': '["government-data-security-policies.pdf"]', 'deleteEntities': 'true', 'source_types': '["local file"]', 'logging_time': '2024-11-12 12:25:13 UTC'}
2024-11-12 20:25:14,105 - Deleted File Path: /Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/merged_files/government-data-security-policies.pdf and Deleted File Name : government-data-security-policies.pdf
2024-11-12 20:25:14,419 - Deleting 1 documents = '['government-data-security-policies.pdf']' from '['local file']' from database
[INFO]{'api_name': 'delete_document_and_entities', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'logging_time': '2024-11-12 12:25:14 UTC', 'elapsed_api_time': '0.76'}
INFO: 127.0.0.1:63194 - "POST /delete_document_and_entities HTTP/1.1" 200 OK
[INFO]{'api_name': 'upload', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'userName': 'neo4j', 'database': 'neo4j', 'chunkNumber': '1', 'totalChunks': '1', 'original_file_name': 'government-data-security-policies.pdf', 'model': 'openai-gpt-4o', 'logging_time': '2024-11-12 12:25:18 UTC'}
2024-11-12 20:25:19,174 - gcs file cache: False
2024-11-12 20:25:19,174 - Chunk File Path: /Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/chunks/government-data-security-policies.pdf_part_1
2024-11-12 20:25:19,175 - Merged File Path: /Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/merged_files
2024-11-12 20:25:19,176 - Chunk File Path While Merging Parts:/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/chunks/government-data-security-policies.pdf_part_1
2024-11-12 20:25:19,176 - Chunks merged successfully and return file size
2024-11-12 20:25:19,176 - File merged successfully
2024-11-12 20:25:19,176 - creating source node if does not exist
[INFO]{'api_name': 'upload', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'logging_time': '2024-11-12 12:25:19 UTC', 'elapsed_api_time': '0.52'}
INFO: 127.0.0.1:63194 - "POST /upload HTTP/1.1" 200 OK
[INFO]{'api_name': 'extract', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'userName': 'neo4j', 'database': 'neo4j', 'source_url': None, 'aws_access_key_id': None, 'model': 'openai-gpt-4o', 'gcs_bucket_name': None, 'gcs_bucket_folder': None, 'source_type': 'local file', 'gcs_blob_filename': None, 'file_name': 'government-data-security-policies.pdf', 'gcs_project_id': None, 'wiki_query': None, 'allowedNodes': '', 'allowedRelationship': '', 'language': None, 'retry_condition': '', 'logging_time': '2024-11-12 12:25:21 UTC'}
2024-11-12 20:25:21,432 - File path:/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/merged_files/government-data-security-policies.pdf
2024-11-12 20:25:21,432 - Process file name :government-data-security-policies.pdf
2024-11-12 20:25:21,742 - Time taken database connection: 0.31 seconds
2024-11-12 20:25:21,898 - Deleted File Path: /Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/merged_files/government-data-security-policies.pdf and Deleted File Name : government-data-security-policies.pdf
2024-11-12 20:25:21,899 - file government-data-security-policies.pdf deleted successfully
[ERROR]{'message': 'Failed To Process File:government-data-security-policies.pdf or LLM Unable To Parse Content ', 'error_message': 'Chunks are not created for government-data-security-policies.pdf. Please re-upload file and try.', 'file_name': 'government-data-security-policies.pdf', 'status': 'Failed', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'failed_count': 1, 'source_type': 'local file', 'source_url': None, 'wiki_query': None, 'logging_time': '2024-11-12 12:25:21 UTC'}
2024-11-12 20:25:21,899 - File Failed in extraction: {'message': 'Failed To Process File:government-data-security-policies.pdf or LLM Unable To Parse Content ', 'error_message': 'Chunks are not created for government-data-security-policies.pdf. Please re-upload file and try.', 'file_name': 'government-data-security-policies.pdf', 'status': 'Failed', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'failed_count': 1, 'source_type': 'local file', 'source_url': None, 'wiki_query': None, 'logging_time': '2024-11-12 12:25:21 UTC'}
Traceback (most recent call last):
  File "/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/score.py", line 193, in extract_knowledge_graph_from_file
    uri_latency, result = await extract_graph_from_file_local_file(uri, userName, password, database, model, merged_file_path, file_name, allowedNodes, allowedRelationship, retry_condition)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/src/main.py", line 226, in extract_graph_from_file_local_file
    return await processing_source(uri, userName, password, database, model, fileName, [], allowedNodes, allowedRelationship, True, merged_file_path, retry_condition)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/src/main.py", line 308, in processing_source
    total_chunks, chunkId_chunkDoc_list = get_chunkId_chunkDoc_list(graph, file_name, pages, retry_condition)
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/src/main.py", line 525, in get_chunkId_chunkDoc_list
    raise Exception(f"Chunks are not created for {file_name}. Please re-upload file and try.")
Exception: Chunks are not created for government-data-security-policies.pdf. Please re-upload file and try.
INFO: 127.0.0.1:63194 - "POST /extract HTTP/1.1" 200 OK
INFO: 127.0.0.1:63711 - "GET /update_extract_status/government-data-security-policies.pdf?url=neo4j+s://915323c5.databases.neo4j.io:7687&userName=neo4j&password=cjY1cTZsSDUwVVA3cWdFaGxObzVIVmVULUVoRS1JRVI4dEdXTHRhSlJPbw==&database=neo4j HTTP/1.1" 200 OK
[INFO]{'api_name': 'post_processing', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'userName': 'neo4j', 'database': 'neo4j', 'tasks': '["materialize_text_chunk_similarities","enable_hybrid_search_and_fulltext_search_in_bloom","materialize_entity_similarities","enable_communities"]', 'logging_time': '2024-11-12 12:25:22 UTC'}
2024-11-12 20:25:23,381 - update KNN graph
2024-11-12 20:25:23,615 - SSE Client disconnected
2024-11-12 20:25:23,615 - Updated KNN Graph
2024-11-12 20:25:23,615 - Starting the process of creating full-text indexes.
2024-11-12 20:25:23,948 - Database connectivity verified.
2024-11-12 20:25:23,948 - Creating a full-text index for type 'entities'.
2024-11-12 20:25:23,967 - Dropped existing index (if any) in 0.02 seconds.
2024-11-12 20:25:23,997 - Full text index is not created as labels are empty
2024-11-12 20:25:23,997 - Process completed in 0.05 seconds.
2024-11-12 20:25:23,997 - Full-text index for type 'entities' created successfully.
2024-11-12 20:25:23,997 - Creating a full-text index for type 'hybrid'.
2024-11-12 20:25:24,021 - Dropped existing index (if any) in 0.02 seconds.
2024-11-12 20:25:24,050 - Created full-text index in 0.03 seconds.
2024-11-12 20:25:24,053 - Process completed in 0.06 seconds.
2024-11-12 20:25:24,053 - Full-text index for type 'hybrid' created successfully.
2024-11-12 20:25:24,053 - Creating a vector index for type 'vector'.
2024-11-12 20:25:24,053 - Starting the process to create vector index.
2024-11-12 20:25:24,069 - Dropped existing index (if any) in 0.02 seconds.
2024-11-12 20:25:24,094 - Created vector index in 0.02 seconds.
2024-11-12 20:25:24,097 - Vector index for chunk created successfully.
2024-11-12 20:25:24,097 - Driver closed successfully.
2024-11-12 20:25:24,098 - Full-text and vector index creation process completed.
2024-11-12 20:25:24,098 - Full Text index created
2024-11-12 20:25:24,174 - Entity Embeddings created
2024-11-12 20:25:24,571 - Failed to create GDS driver: The Graph Data Science library is not correctly installed on the Neo4j server. Please refer to https://neo4j.com/docs/graph-data-science/current/installation/.

2024-11-12 20:25:24,571 - Failed to create communities: The Graph Data Science library is not correctly installed on the Neo4j server. Please refer to https://neo4j.com/docs/graph-data-science/current/installation/.

2024-11-12 20:25:24,571 - created communities
[DEFAULT]{'api_name': 'post_processing/create_communities', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'logging_time': '2024-11-12 12:25:24 UTC'}
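The `post_processing` request in the log lists four tasks, and only the community step reports a GDS failure; the KNN, index, and embedding steps complete. If the target database does not offer GDS, one possible workaround is to leave `enable_communities` out of the task list. The helper below is a hypothetical sketch (`select_tasks` is not part of the project). The task names are copied from the log, and the assumption that only `enable_communities` needs GDS is inferred from that log, not from the code.

```python
import json

# Task names copied from the post_processing log entry.
REQUESTED = (
    '["materialize_text_chunk_similarities",'
    '"enable_hybrid_search_and_fulltext_search_in_bloom",'
    '"materialize_entity_similarities","enable_communities"]'
)

# Assumption: this is the only task that depends on the GDS plugin.
GDS_TASKS = {"enable_communities"}


def select_tasks(tasks_json: str, gds_available: bool) -> list[str]:
    """Drop GDS-dependent tasks when the server has no GDS plugin."""
    tasks = json.loads(tasks_json)
    if gds_available:
        return tasks
    return [t for t in tasks if t not in GDS_TASKS]


print(select_tasks(REQUESTED, gds_available=False))
```

The GDS message is separate from the chunking failure: it appears after extraction has already failed, so resolving it would not by itself make the file process.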