neo4j-labs / llm-graph-builder

Neo4j graph construction from unstructured data using LLMs
https://neo4j.com/labs/genai-ecosystem/llm-graph-builder/
Apache License 2.0
2.54k stars, 404 forks

[Bug] Re-process feature #878

Open msenechal opened 1 week ago

msenechal commented 1 week ago

The re-processing feature does not seem to work: files get stuck in the "Reprocess" status.

Steps to reproduce:

  1. Clear the schema configuration in Graph Enhancement.
  2. Load any document.
  3. Go to Graph Enhancement, pull the schema, and make some modifications.
  4. Click Reprocess. The file stays in "Reprocess" status forever.

Backend logs: it looks like processing gets stuck here:

[INFO]{'api_name': 'retry_processing', 'db_url': 'neo4j+s://a77ed0fa.databases.neo4j.io:7687', 'userName': 'neo4j', 'database': 'neo4j', 'file_name': 'ms.pdf', 'retry_condition': 'delete_entities_and_start_from_beginning', 'logging_time': '2024-11-14 12:17:48 UTC'}
2024-11-14 12:17:49,039 - <src.entities.source_node.sourceNode object at 0x39effed10>
Base Param value 1 : {'props': {'fileName': 'ms.pdf', 'status': 'Reprocess', 'nodeCount': 0, 'relationshipCount': 0, 'is_cancelled': False, 'processed_chunk': 0, 'retry_condition': 'delete_entities_and_start_from_beginning'}}
2024-11-14 12:17:49,039 - Update source node properties
INFO: 127.0.0.1:50138 - "POST /retry_processing HTTP/1.1" 200 OK

Nothing is logged after this last line and nothing happens; the entities are not regenerated.

karanchellani commented 2 days ago

Hi @msenechal, this seems to be an intermittent issue. Can you please check with some more examples?

msenechal commented 2 days ago

Hi, it doesn't look intermittent on my side; it always happens, on every document type (see the attached screenshot). I tried different PDFs, Wikipedia pages, URLs, YouTube videos, etc. I ran one re-process at a time, waiting in between, and as the logs show, nothing happens after "Update source node properties":


Logs from re-processing 4 files, with a delay between each re-process:

backend | 2024-11-21 11:09:07,768 - <src.entities.source_node.sourceNode object at 0xfffeffe38c40>
backend | 2024-11-21 11:09:07,769 - Update source node properties
backend | 2024-11-21 11:09:25,025 - <src.entities.source_node.sourceNode object at 0xfffeffc1e6b0>
backend | 2024-11-21 11:09:25,025 - Update source node properties
backend | 2024-11-21 11:13:30,225 - <src.entities.source_node.sourceNode object at 0xfffefb3882e0>
backend | 2024-11-21 11:13:30,225 - Update source node properties
backend | 2024-11-21 12:13:49,858 - <src.entities.source_node.sourceNode object at 0xfffeffe3a200>
backend | 2024-11-21 12:13:49,860 - Update source node properties

karanchellani commented 1 day ago

"Reprocess" is a state; once it is set, you need to click the Generate Graph button to process the files. Maybe we need to change the label to "Ready to Reprocess" to avoid confusion.

msenechal commented 1 day ago

Ahhhh, yeah, let's either change it to "Ready to Reprocess" or run the processing automatically, because right now it is not intuitive for users to click Generate when they have already clicked Reprocess.

kartikpersistent commented 1 day ago

@jexp What would be better: changing the state to "Ready to Reprocess", or reprocessing immediately on clicking Save?

jexp commented 20 hours ago

Change the status name, as you might want to change the model and schema and then reprocess multiple files.
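The two-step flow described in this thread (Reprocess only marks a file; Generate Graph actually starts extraction) can be illustrated with a minimal Python sketch. It is a simplified in-memory model, not the project's code: `FileStatus`, `retry_processing()` and `generate_graph()` are hypothetical names (the second is only named after the `retry_processing` API seen in the logs), and every status except "Ready to Reprocess" is a placeholder. The source-node property names are the ones shown in the logged `props`.

```python
from enum import Enum


class FileStatus(str, Enum):
    # Hypothetical status set. "Ready to Reprocess" is the label proposed
    # in this thread in place of "Reprocess"; the others are placeholders.
    NEW = "New"
    READY_TO_REPROCESS = "Ready to Reprocess"
    PROCESSING = "Processing"
    COMPLETED = "Completed"


def retry_processing(source_node: dict) -> dict:
    """Step 1 (hypothetical): mark a file for re-processing only.

    Like the logged request, this just updates the source node's
    properties and returns; it does not start entity extraction.
    """
    source_node.update(
        status=FileStatus.READY_TO_REPROCESS,
        nodeCount=0,
        relationshipCount=0,
        is_cancelled=False,
        processed_chunk=0,
        retry_condition="delete_entities_and_start_from_beginning",
    )
    return source_node


def generate_graph(source_nodes: list[dict]) -> list[dict]:
    """Step 2 (hypothetical): start processing every file that is waiting."""
    started = []
    for node in source_nodes:
        if node["status"] in (FileStatus.NEW, FileStatus.READY_TO_REPROCESS):
            node["status"] = FileStatus.PROCESSING
            started.append(node)
    return started


if __name__ == "__main__":
    files = [
        {"fileName": "ms.pdf", "status": FileStatus.COMPLETED},
        {"fileName": "other.pdf", "status": FileStatus.COMPLETED},
    ]
    retry_processing(files[0])
    print(files[0]["status"].value)  # Ready to Reprocess
    started = generate_graph(files)
    print([f["fileName"] for f in started])  # ['ms.pdf']
```

Keeping "mark" and "start" separate is what lets a user flag several files, change the model or schema, and then start them all with one Generate Graph click, which is the reason given for renaming the status rather than reprocessing immediately.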