neo4j-labs / llm-graph-builder

Neo4j graph construction from unstructured data using LLMs
https://neo4j.com/labs/genai-ecosystem/llm-graph-builder/
Apache License 2.0

RE: Running llm-graph-builder / Documentation / Debugging #842

Open stevexm opened 3 weeks ago

stevexm commented 3 weeks ago

Hi, I am excited about trying out Neo4j's LLM Graph Builder, but I need help and/or documentation on running it. I am using a clone of this repo from October 20.

I have set up a free-tier Neo4j Aura account and have connected to a running instance.

I have the LLM Graph Builder running via Docker and the Frontend connects to the Aura instance: neo4j+s://878a3532.databases.neo4j.io:7687

I have everything configured to use OpenAI and Diffbot. API keys are entered etc.

I have successfully uploaded various types of sources: Wikipedia pages, YouTube videos, and Stack Overflow (SO) pages.

I started testing with the example SO schema using seven uploaded SO pages. But no matter what I do, whenever I try generating a graph the processing fails almost immediately after it begins. This happens with every LLM model and every data set I use, whether or not I set a Graph Schema.

Very frustrating, as I must be missing something basic. Everyone else seems to be able to generate graphs by simply uploading files and then hitting "Generate Graph".

I cannot share the text of the alerts because the toast popups show for about a second and then disappear. The Docker logs contain many warnings/errors like:

2024-11-02 12:33:37 backend   | 2024-11-02 19:33:37,536 - Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.UnknownRelationshipTypeWarning} {category: UNRECOGNIZED} {title: The provided relationship type is not in the database.} {description: One of the relationship types in your query is not available in the database, make sure you didn't misspell it or that the label is available when you run this statement in your application (the missing relationship type is: IN_COMMUNITY)} {position: line: 45, column: 47, offset: 1162} 

2024-11-02 12:33:37 backend   | 2024-11-02 19:33:37,536 - Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.AggregationSkippedNull} {category: UNRECOGNIZED} {title: The query contains an aggregation function that skips null values.} {description: null value eliminated in set function.} 

2024-11-02 12:33:37 backend   | 2024-11-02 19:33:37,536 - Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.UnknownRelationshipTypeWarning} {category: UNRECOGNIZED} {title: The provided relationship type is not in the database.} {description: One of the relationship types in your query is not available in the database, make sure you didn't misspell it or that the label is available when you run this statement in your application (the missing relationship type is: HAS_ENTITY)} 

2024-11-02 12:33:37 backend   | 2024-11-02 19:33:37,536 - Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} 

Any advice to share, documentation that I can read, examples I can run without fail?

Thanks in advance, with much appreciation.

Steve

kartikpersistent commented 2 weeks ago

Hi @stevexm, can you share the error message that you are getting while processing the file?

stevexm commented 1 week ago

Hi Kartikpersistent: thank you for your help and many apologies for my delay in replying. Time has gotten away from me on this project.

This is the error message that comes up in the UI when I try generating a graph with a Wikipedia source: https://en.wikipedia.org/wiki/Matthew_Sands

Failed To Process File: Matthew_Sands or LLM Unable to Parse Content 'NoneType' object has no attribute split

The same is true for any Wikipedia pages I try generating a graph for, and for YouTube sources like: https://www.youtube.com/watch?v=KmoCnTuhMEg

More specifically, here is a portion of what Docker reports:

2024-11-14 11:53:24 backend | 2024-11-14 19:53:24,159 - File Failed in extraction: {'message': 'Failed To Process File:KmoCnTuhMEg or LLM Unable To Parse Content ', 'error_message': "'NoneType' object has no attribute 'split'", 'file_name': 'KmoCnTuhMEg', 'status': 'Failed', 'db_url': 'neo4j+s://878a3532.databases.neo4j.io:7687', 'failed_count': 1, 'source_type': 'youtube', 'source_url': 'https://www.youtube.com/watch?v=KmoCnTuhMEg', 'wiki_query': None, 'logging_time': '2024-11-14 19:53:24 UTC'}
2024-11-14 11:53:24 backend | Traceback (most recent call last):
2024-11-14 11:53:24 backend | File "/code/score.py", line 202, in extract_knowledge_graph_from_file
2024-11-14 11:53:24 backend | uri_latency, result = await extract_graph_from_file_youtube(uri, userName, password, database, model, source_url, file_name, allowedNodes, allowedRelationship, retry_condition)
2024-11-14 11:53:24 backend | File "/code/src/main.py", line 258, in extract_graph_from_file_youtube
2024-11-14 11:53:24 backend | return await processing_source(uri, userName, password, database, model, file_name, pages, allowedNodes, allowedRelationship)
2024-11-14 11:53:24 backend | File "/code/src/main.py", line 370, in processing_source
2024-11-14 11:53:24 backend | node_count,rel_count,latency_processed_chunk = await processing_chunks(selected_chunks,graph,uri, userName, password, database,file_name,model,allowedNodes,allowedRelationship,node_count, rel_count)
2024-11-14 11:53:24 backend | File "/code/src/main.py", line 461, in processing_chunks
2024-11-14 11:53:24 backend | graph_documents = await get_graph_from_llm(model, chunkId_chunkDoc_list, allowedNodes, allowedRelationship)
2024-11-14 11:53:24 backend | File "/code/src/llm.py", line 193, in get_graph_from_llm
2024-11-14 11:53:24 backend | llm, model_name = get_llm(model)
2024-11-14 11:53:24 backend | File "/code/src/llm.py", line 48, in get_llm
2024-11-14 11:53:24 backend | model_name, api_key = env_value.split(",")
2024-11-14 11:53:24 backend | AttributeError: 'NoneType' object has no attribute 'split'
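
For reference, the last frame of the traceback (env_value.split(",") in get_llm) suggests that a per-model configuration value was looked up and came back as None, which would be consistent with the failure happening for every source type. The following is only a minimal sketch of that failure mode and a guarded lookup, not the project's actual code: the helper name read_model_config, the LLM_MODEL_CONFIG_ prefix, and the "model_name,api_key" value format are assumptions inferred from the traceback line, and the key used in the usage note is a placeholder.

```python
import os


def read_model_config(model, env=None):
    """Hypothetical helper: look up a 'model_name,api_key' setting for a model.

    The variable prefix below is an assumption for illustration; check the
    repo's example env file for the names the backend actually expects.
    """
    env = os.environ if env is None else env
    env_key = "LLM_MODEL_CONFIG_" + model  # assumed naming scheme
    env_value = env.get(env_key)
    if env_value is None:
        # Without this guard, env_value.split(",") raises the AttributeError
        # seen in the traceback above.
        raise KeyError(env_key + " is not set; expected 'model_name,api_key'")
    model_name, api_key = env_value.split(",", 1)
    return model_name.strip(), api_key.strip()
```

Calling read_model_config("openai_gpt_4o", {"LLM_MODEL_CONFIG_openai_gpt_4o": "gpt-4o,sk-placeholder"}) returns the two parts, while a missing variable raises a KeyError that names the variable instead of failing later on None.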

Side note: everything works just fine when using the Neo4j online demo.

Again my apologies for taking so long to get back to you.

Kind regards, Steve