neo4j-labs / llm-graph-builder

Neo4j graph construction from unstructured data using LLMs
https://neo4j.com/labs/genai-ecosystem/llm-graph-builder/
Apache License 2.0
2.25k stars, 351 forks

The application cannot be used after it is successfully started #786

Open BishenCheng opened 2 weeks ago

BishenCheng commented 2 weeks ago

The application cannot be used after it starts successfully. I used Ollama and tried to analyze a PDF file. After I selected the file, the backend log showed that the connection to Ollama succeeded, and then this error occurred. I tried to modify it, but it still did not work. The full backend log is here:

2024-10-06 20:44:04,629 - file OreillyGraphDatabases.pdf deleted successfully
{'message': 'Failed To Process File:OreillyGraphDatabases.pdf or LLM Unable To Parse Content ', 'error_message': 'string indices must be integers', 'file_name': 'OreillyGraphDatabases.pdf', 'status': 'Failed', 'db_url': 'neo4j+s://90e8b6c0.databases.neo4j.io:7687', 'failed_count': 1, 'source_type': 'local file', 'source_url': None, 'wiki_query': None, 'logging_time': '2024-10-06 12:44:04 UTC'}
2024-10-06 20:44:04,629 - File Failed in extraction: {'message': 'Failed To Process File:OreillyGraphDatabases.pdf or LLM Unable To Parse Content ', 'error_message': 'string indices must be integers', 'file_name': 'OreillyGraphDatabases.pdf', 'status': 'Failed', 'db_url': 'neo4j+s://90e8b6c0.databases.neo4j.io:7687', 'failed_count': 1, 'source_type': 'local file', 'source_url': None, 'wiki_query': None, 'logging_time': '2024-10-06 12:44:04 UTC'}
Traceback (most recent call last):
  File "D:\GraphRAG Project\llm-graph-builder-main\backend\score.py", line 170, in extract_knowledge_graph_from_file
    result = await asyncio.to_thread(
  File "C:\Users\KATVR\AppData\Local\Programs\Python\Python39\lib\asyncio\threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "C:\Users\KATVR\AppData\Local\Programs\Python\Python39\lib\concurrent\futures\thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "D:\GraphRAG Project\llm-graph-builder-main\backend\src\main.py", line 197, in extract_graph_from_file_local_file
    return processing_source(graph, model, file_name, pages, allowedNodes, allowedRelationship, True, merged_file_path, uri)
  File "D:\GraphRAG Project\llm-graph-builder-main\backend\src\main.py", line 312, in processing_source
    node_count,rel_count = processing_chunks(selected_chunks,graph,file_name,model,allowedNodes,allowedRelationship,node_count, rel_count)
  File "D:\GraphRAG Project\llm-graph-builder-main\backend\src\main.py", line 369, in processing_chunks
    graph_documents = generate_graphDocuments(model, graph, chunkId_chunkDoc_list, allowedNodes, allowedRelationship)
  File "D:\GraphRAG Project\llm-graph-builder-main\backend\src\generate_graphDocuments_from_llm.py", line 32, in generate_graphDocuments
    graph_documents = get_graph_from_OpenAI(model, graph, chunkId_chunkDoc_list, allowedNodes, allowedRelationship)
  File "D:\GraphRAG Project\llm-graph-builder-main\backend\src\openAI_llm.py", line 34, in get_graph_from_OpenAI
    return get_graph_document_list(llm, combined_chunk_document_list, allowedNodes, allowedRelationship,
  File "D:\GraphRAG Project\llm-graph-builder-main\backend\src\llm.py", line 212, in get_graph_document_list
    graph_document = future.result()
  File "C:\Users\KATVR\AppData\Local\Programs\Python\Python39\lib\concurrent\futures\_base.py", line 433, in result
    return self.__get_result()
  File "C:\Users\KATVR\AppData\Local\Programs\Python\Python39\lib\concurrent\futures\_base.py", line 389, in __get_result
    raise self._exception
  File "C:\Users\KATVR\AppData\Local\Programs\Python\Python39\lib\concurrent\futures\thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "D:\GraphRAG Project\llm-graph-builder-main\backend\src\graph_transformers\llm.py", line 658, in convert_to_graph_documents
    return [self.process_response(document) for document in documents]
  File "D:\GraphRAG Project\llm-graph-builder-main\backend\src\graph_transformers\llm.py", line 658, in <listcomp>
    return [self.process_response(document) for document in documents]
  File "D:\GraphRAG Project\llm-graph-builder-main\backend\src\graph_transformers\llm.py", line 610, in process_response
    nodes_set.add((rel["head"], rel["head_type"]))
TypeError: string indices must be integers
2024-10-06 20:44:04,866 - closing connection for extract api
INFO: 127.0.0.1:58753 - "POST /extract HTTP/1.1" 200 OK
2024-10-06 20:44:08,308 - update KNN graph
2024-10-06 20:44:08,308 - update KNN graph
2024-10-06 20:44:09,553 - Request disconnected
2024-10-06 20:44:09,553 - Request disconnected
{'api_name': 'post_processing/update_similarity_graph', 'db_url': 'neo4j+s://90e8b6c0.databases.neo4j.io:7687', 'logging_time': '2024-10-06 12:44:09 UTC'}
{'api_name': 'post_processing/update_similarity_graph', 'db_url': 'neo4j+s://90e8b6c0.databases.neo4j.io:7687', 'logging_time': '2024-10-06 12:44:09 UTC'}
2024-10-06 20:44:09,555 - Updated KNN Graph
2024-10-06 20:44:09,555 - Updated KNN Graph
2024-10-06 20:44:09,555 - Starting the process of creating a full-text index.
2024-10-06 20:44:11,045 - Database connectivity verified.
2024-10-06 20:44:09,555 - Starting the process of creating a full-text index.
2024-10-06 20:44:11,045 - Database connectivity verified.
2024-10-06 20:44:11,147 - Dropped existing index (if any) in 0.10 seconds.
2024-10-06 20:44:11,147 - Dropped existing index (if any) in 0.10 seconds.
2024-10-06 20:44:11,152 - Received notification from DBMS server: {severity: INFORMATION} {code: Neo.ClientNotification.Schema.IndexOrConstraintDoesNotExist} {category: SCHEMA} {title: DROP INDEX entities IF EXISTS has no effect.} {description: entities does not exist.} {position: None} for quer
2024-10-06 20:44:11,152 - Received notification from DBMS server: {severity: INFORMATION} {code: Neo.ClientNotification.Schema.IndexOrConstraintDoesNotExist} {category: SCHEMA} {title: DROP INDEX entities IF EXISTS has no effect.} {description: entities does not exist.} {position: None} for query: 'DROP INDEX entities IF EXISTS;'
2024-10-06 20:44:11,636 - Fetched labels in 0.49 seconds.
2024-10-06 20:44:12,119 - Failed to create full-text index: {code: Neo.ClientError.Statement.SyntaxError} {message: Invalid input ')': expected an identifier (line 1, column 39 (offset: 38))
2024-10-06 20:44:12,119 - Failed to create full-text index: {code: Neo.ClientError.Statement.SyntaxError} {message: Invalid input ')': expected an identifier (line 1, column 39 (offset: 38))
"CREATE FULLTEXT INDEX entities FOR (n:) ON EACH [n.id, n.description];"
                                       ^}
"CREATE FULLTEXT INDEX entities FOR (n:) ON EACH [n.id, n.description];"
                                       ^}
2024-10-06 20:44:12,120 - Driver closed.
2024-10-06 20:44:12,120 - Driver closed.
2024-10-06 20:44:12,120 - Process completed in 2.57 seconds.
2024-10-06 20:44:12,120 - Process completed in 2.57 seconds.
{'api_name': 'post_processing/create_fulltext_index', 'db_url': 'neo4j+s://90e8b6c0.databases.neo4j.io:7687', 'logging_time': '2024-10-06 12:44:12 UTC'}
2024-10-06 20:44:12,120 - Full Text index created
2024-10-06 20:44:12,344 - Failed to write data to connection ResolvedIPv4Address(('34.126.161.242', 7687)) (ResolvedIPv4Address(('34.126.161.242', 7687)))
2024-10-06 20:44:12,345 - Failed to write data to connection IPv4Address(('90e8b6c0.databases.neo4j.io', 7687)) (ResolvedIPv4Address(('34.126.161.242', 7687)))
2024-10-06 20:44:12,349 - closing connection for post_processing api
INFO: 127.0.0.1:58753 - "POST /post_processing HTTP/1.1" 200 OK
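
For reference, the TypeError in the traceback is what Python raises when a string is indexed with a string key, so `rel` in `process_response` was presumably a string rather than a dict. One way that could happen (an assumption, the log does not show the raw model output) is the model returning a single JSON object or a list of plain strings instead of a list of relationship objects. A minimal defensive sketch follows; `collect_nodes` is a hypothetical helper, not the project's code, and the `tail` / `tail_type` keys are assumed by analogy with the `head` / `head_type` keys seen in the traceback.

```python
def collect_nodes(parsed):
    """Collect (id, type) pairs, skipping entries that are not well-formed dicts.

    Hypothetical helper for illustration; key names other than "head" and
    "head_type" are assumptions.
    """
    required = ("head", "head_type", "tail", "tail_type")
    # A lone dict would otherwise be iterated key by key, yielding strings.
    if isinstance(parsed, dict):
        parsed = [parsed]
    # Anything that is not a list (None, a raw string, ...) yields no nodes.
    if not isinstance(parsed, list):
        return set()
    nodes = set()
    for rel in parsed:
        if not isinstance(rel, dict) or not all(key in rel for key in required):
            continue  # malformed entry: skip it instead of raising TypeError
        nodes.add((rel["head"], rel["head_type"]))
        nodes.add((rel["tail"], rel["tail_type"]))
    return nodes
```

With a guard like this, a chunk whose model output is malformed would contribute no nodes instead of failing the whole file.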

BishenCheng commented 2 weeks ago

(image attachment)

yl950218 commented 1 week ago

I encountered the same problem. Why does the program delete the loaded PDF file?

jexp commented 1 week ago

@vasanthasaikalluri can we fix the full-text index creation so that it does not try to create a full-text index when there are no labels?

In the future, once __Entity__ labels are guaranteed, we can switch the full-text index to that label, but until then we need this check.
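
A minimal sketch of such a guard, assuming the index statement is assembled from the fetched label list: `build_fulltext_index_query` is a hypothetical helper, not the project's actual function, and the index name and properties are taken from the failing statement in the log above. It returns None when no labels exist, so the caller can skip the CREATE call instead of sending `FOR (n:)` to the database.

```python
def build_fulltext_index_query(labels, index_name="entities"):
    """Return a CREATE FULLTEXT INDEX statement, or None when there are no labels.

    Hypothetical helper for illustration only.
    """
    # Drop None and blank entries so an empty result cannot produce "(n:)".
    cleaned = [label for label in labels if label and label.strip()]
    if not cleaned:
        return None
    # Backtick-quote each label and double any embedded backtick.
    label_expr = "|".join("`" + label.replace("`", "``") + "`" for label in cleaned)
    return (
        f"CREATE FULLTEXT INDEX {index_name} "
        f"FOR (n:{label_expr}) ON EACH [n.id, n.description]"
    )


query = build_fulltext_index_query([])
if query is None:
    print("No labels found; skipping full-text index creation.")
```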