Error: Binder exception: Read after update is not supported. Try query with multiple statements.

sapalli2989 commented 1 year ago

(Follow-up of https://github.com/kuzudb/kuzu/issues/2300)

Contrived example:

kuzu> MATCH (t:T {id: 1}) DELETE t WITH 2 as i MATCH (t2:T {id: i}) DELETE t2;
Error: Binder exception: Read after update is not supported. Try query with multiple statements.

This query works as is with Neo4j.

Not sure about the error message - isn't the purpose of WITH to separate query parts, so there isn't any read operation directly following after update/delete? Would be curious to hear, if this is this going to be supported.

To add some context: In my case there are multiple optional, somehow dependent nodes or relations, which are to be deleted, if they exist. I used multiple MATCH ... DELETE constructs for this purpose (also to omit error in #2300). These deletes are connected by WITH to keep logic in one query, as there are dependent parts - I don't want to use multiple statements here. A corresponding query structure with multiple WITH and MERGE (instead of DELETE) can be found here (page bottom) for reference.

andyfengHKU commented 1 year ago

Hi @sapalli2989,

For isn't the purpose of WITH to separate query parts, so there isn't any read operation directly following after update/delete?

My understanding of WITH is to pipe the result from previous query part to the next query part. In your case, MATCH (t2:T {id: i}) DELETE t2 will always be executed regardless of whether MATCH (t:T {id: 1}) DELETE t deletes a t or not. So I think the query is equivalent to MATCH (t:T {id: 1}) DELETE t; + MATCH (t2:T {id: 2}) DELETE t2;. On the other hand, if you have some cases like MATCH (t:T) DELETE t WITH t.id as ID MATCH (t2) WHERE t2.ID > ID DELETE t2 then the two query parts are indeed dependent and you cannot write them as two statements.

A corresponding query structure with multiple WITH and MERGE (instead of DELETE) can be found here (page bottom) for reference.

MERGE(y:age{name:"LessThanFifty"});//Create an Age group node
MERGE(o:age{name:"MoreThanFifty"});//Create an Age group node
MATCH(m:Movie{title:'A Few Good Men'}) //Match the few Good Men Movie
WITH m
    MATCH (p1:Person)-[:ACTED_IN]->(m) WHERE p1.born <= 1972
    MERGE(p1)-[:OF_AGE]->(o) 
WITH m
    MATCH (p2:Person)-[:ACTED_IN]->(m) WHERE p2.born > 1972
    MERGE(p2)-[:OF_AGE]->(y)

In this example, the last two query parts are actually dependent because if MATCH (p1:Person)-[:ACTED_IN]->(m) WHERE p1.born <= 1972 finds no tuple, MATCH (p2:Person)-[:ACTED_IN]->(m) WHERE p2.born > 1972 will get zero input and thus not being executed.

Would be curious to hear, if this is this going to be supported.

The answer is yes. We didn't support read after update for simplicity because we SQL standard also doesn't allow read after update in one statement. So we thought this is not an important use case. Since now there is user demand, we certainly can implement this. Though different from bug-fix, this is a relatively big feature and we should take a few weeks to implement and test.

Let me know if you have further questions.

sapalli2989 commented 1 year ago

Hi @andyfengHKU,

On the other hand, if you have some cases like MATCH (t:T) DELETE t WITH t.id as ID MATCH (t2) WHERE t2.ID > ID DELETE t2 then the two query parts are indeed dependent and you cannot write them as two statements.

Yes, exactly. Your example and the one from Neo4j docs are better than mine, as they express those dependencies I have between query parts. The dependent pars are why I cannot easily split it up in multiple statements.

Let me give you an example, which tries to reflect my case a bit more: Imagine a strict hierarchical relation named :AFTER for nodes T as part of a bigger graph. I think its actually helpful to have the model of a linked list of programming in mind here (but this list does not need to be strictly continuous). Now we want to select a specific node and move it AFTER some target node. In addition affected parts need to be reconnected, so no "holes" are created.

Minimal example: Extract t2 and move it after t6, so there is t2-[:AFTER]->t6.

We need to drop e1,e2, e4 and "reconnect" t1 with t3, t5 with t2, lastly t2 with t6, which is our actual goal.

This is a quite error-prone operation without graph features. But Cypher seems to be able to solve this operation very elegantly. I used following query in Neo4j successfully - it doesn't work with Kuzu yet:

WITH "t2" as selected_id, "t6" as move_after_id
MATCH (s:T {id: selected_id})
OPTIONAL MATCH (a:T {id: move_after_id})
OPTIONAL MATCH (t1:T)-[e1:AFTER]->(s)
OPTIONAL MATCH (s)-[e2:AFTER]->(t3:T)
OPTIONAL MATCH (t5:T)-[e4:AFTER]->(a)
DELETE e1,e2,e4
CREATE (s)-[:AFTER]->(a)
WITH t1,t3,t5,s MATCH(t1) MATCH(t3) CREATE (t1)-[:AFTER]->(t3)
WITH t5,s MATCH(t5) CREATE (t5)-[:AFTER]->(s);

(still in my early learning phase with Cypher, there might be a better solution)

OPTIONAL MATCH ... DELETE wasn't possible due to #2293. Tried to workaround via #2300 and ultimately landed here 🙂
Potentially not-existent nodes need to be wrapped with WITH...MATCH for CREATE operations as described in linked Neo4j article (not using any internal flags alernatives)
OPTIONAL MATCH is needed, because some of the relations might not exist. For example t2 could be the very start of a linked list, so there is no relation (t1)-[:AFTER]->(t2). But we want to continue here, just ignoring non-existent relations as noop.
a might be null from query input perspective, hence OPTIONAL MATCH is used here as well.

If I feed query with above sample data:

CREATE (t1:T {id: "t1"})-[e1:AFTER {id: "e1"}]->(t2:T {id: "t2"})-[e2:AFTER {id: "e2"}]->(t3:T {id: "t3"}), (t4:T {id: "t4"})-[e3:AFTER {id: "e3"}]->(t5:T {id: "t5"})-[e4:AFTER {id: "e4"}]->(t6:T {id: "t6"})-[e5:AFTER {id: "e5"}]->(t7:T {id: "t7"})

, the result is as expected - not tested thoroughly though:

The answer is yes.

Not 42? 😂 Joking aside, happy to hear this is on the roadmap! From my own learning experience, I mostly tested queries with Neo4j web explorer first and then tried to execute it with Kuzu CLI. Assuming many folks do this, it probably makes sense for Neo4j -> Kuzu to be "copy and paste" query-wise.

So we thought this is not an important use case. Since now there is user demand, we certainly can implement this.

Hope to have given a suitable example case that actually makes sense. Thanks much for your time!

andyfengHKU commented 1 year ago

@sapalli2989 The example you gave make a lot of sense to me. I'll assign this issue to one of my colleagues to make sure Kùzu can run the example properly. I don't have an ETA for the fix now but certainly we will make it available in our November release.

From my own learning experience, I mostly tested queries with Neo4j web explorer first and then tried to execute it with Kuzu CLI.

FYI. We are releasing a Web explorer this week as well. Will announce in the slack when releasing.

it probably makes sense for Neo4j -> Kuzu to be "copy and paste" query-wise

Yes copy-paste is our ultimate goal. Currently there are still a few features that are missing but we should cover the majority by the end of this year. Though Kùzu will always ask for DDL because we believe this is the best practice for query performance and data integrity.

sapalli2989 commented 1 year ago

Wonderful. Thank you, sir!

sapalli2989 commented 11 months ago

Hi @andyfengHKU. Is there already any development branch or patch, where I could help with testing?

andyfengHKU commented 11 months ago

Hi @sapalli2989, we actually need a bit more time to handle read after update due to some edge cases. Though for your use case, I have added a test case that does not require read after update.

-CASE 2303
-STATEMENT CREATE NODE TABLE T (id STRING, PRIMARY KEY(id));
---- ok
-STATEMENT CREATE REL TABLE AFTER (FROM T TO T, id STRING);
---- ok
-STATEMENT CREATE (t1:T {id: "t1"})-[e1:AFTER {id: "e1"}]->(t2:T {id: "t2"})-[e2:AFTER {id: "e2"}]->(t3:T {id: "t3"}),
        (t4:T {id: "t4"})-[e3:AFTER {id: "e3"}]->(t5:T {id: "t5"})-[e4:AFTER {id: "e4"}]->(t6:T {id: "t6"})-[e5:AFTER {id: "e5"}]->(t7:T {id: "t7"})
---- ok
-STATEMENT MATCH p = (a:T {id:"t4"})-[e*]->(b:T {id:"t7"}) RETURN COUNT(*);
---- 1
1
-STATEMENT MATCH (s:T {id: "t2"}), (a:T {id: "t6"}),
           (t1:T)-[e1:AFTER]->(s),
           (s)-[e2:AFTER]->(t3:T),
           (t5:T)-[e4:AFTER]->(a)
           DELETE e1,e2,e4
           CREATE (s)-[e7:AFTER]->(a)
           CREATE (t1)-[e8:AFTER]->(t3)
           CREATE (t5)-[e9:AFTER]->(s);
---- ok
-STATEMENT MATCH (a)-[e]->(b) RETURN COUNT(*);
---- 1
5
-STATEMENT MATCH p = (a:T {id:"t4"})-[e*]->(b:T {id:"t7"}) RETURN COUNT(*);
---- 1
1

Seems that you just need multiple update rather than read after update. Let me know if the above solution works.

I'll also give a bit more details regarding read after update. We are solving an edge case where update creates dependency between query parts. E.g. MATCH (a:Person) SET a.age = 21 WITH * MATCH (b:Person) RETURN b.age Normally we assume MATCH (a:Person) and MATCH (n:Person) can be executed in arbitrary order. But with update, MATCH (a:Person) needs to be executed first so that MATCH (b:Person) can read the updated value.

sapalli2989 commented 11 months ago

we actually need a bit more time to handle read after update due to some edge cases.

No problem, take your time.

Though for your use case, I have added a test case that does not require read after update.

Let me know if the above solution works.

OK, let's modify the example slightly.

Example 2

Given original start data

, I'd now like to select t1 (instead of t2) and move it :AFTER t6 like before. Result should look like:

Your query wouldn't do anything here, because the whole query is aborted with MATCH, if one condition isn't fulfilled. Here condition (t1:T)-[e1:AFTER]->(s) isn't fulfilled, as t1 is right at the start and there isn't any node before.

Assuming, the actual location of selected node is unknown, I'd like to keep query as generic as possible - so that we don't need to care about its location! For now let's revert to OPTIONAL MATCH:

MATCH (s:T {id: "t1"}), (a:T {id: "t6"})
OPTIONAL MATCH (t0:T)-[e0:AFTER]->(s)
OPTIONAL MATCH (s)-[e1:AFTER]->(t2:T)
OPTIONAL MATCH (t5:T)-[e4:AFTER]->(a)
DELETE e0,e1,e4
CREATE (s)-[:AFTER]->(a)
WITH t0,t2,t5,s MATCH(t0) MATCH(t2) CREATE (t0)-[:AFTER]->(t2)
WITH t5,s MATCH(t5) CREATE (t5)-[:AFTER]->(s);

Neo4j is almost there:

But relation t5-[:AFTER]->t1 is still missing. By looking at:

WITH t0,t2,t5,s MATCH(t0) MATCH(t2) CREATE (t0)-[:AFTER]->(t2) // (1)
WITH t5,s MATCH(t5) CREATE (t5)-[:AFTER]->(s); // (2)

, MATCH again seems to abort the query eagerly in (1), as t0 does not exist. So WITH ... in (2) is not executed anymore.

What would be ideal

I guess the ideal query can use OPTIONAL MATCH in combination with CREATE, as follows:

MATCH (s:T {id: "t1"}), (a:T {id: "t6"})
OPTIONAL MATCH (t0:T)-[e0:AFTER]->(s)
OPTIONAL MATCH (s)-[e1:AFTER]->(t2:T)
OPTIONAL MATCH (t5:T)-[e4:AFTER]->(a)
DELETE e0,e1,e4
CREATE (s)-[:AFTER]->(a), (t0)-[:AFTER]->(t2), (t5)-[:AFTER]->(s);

most declarative - doesn't bother about technical interplay between OPTIONAL MATCH, CREATE and WITH
good readability
even eliminates read after update construct

(Below is my previous idea, but I find it less than ideal)

Previous idea

```cypher MATCH (s:T {id: "t1"}), (a:T {id: "t6"}) OPTIONAL MATCH (t0:T)-[e0:AFTER]->(s) OPTIONAL MATCH (s)-[e1:AFTER]->(t2:T) OPTIONAL MATCH (t5:T)-[e4:AFTER]->(a) DELETE e0,e1,e4 CREATE (s)-[:AFTER]->(a) WITH t0,t2,t5,s OPTIONAL MATCH(t0) OPTIONAL MATCH(t2) CREATE (t0)-[:AFTER]->(t2) WITH t5,s OPTIONAL MATCH(t5) CREATE (t5)-[:AFTER]->(s); ```

Neo4j currently doesn't support OPTIONAL MATCH ... CREATE, as reported in above Neo4j article and errors with

Failed to create relationship UNNAMED1, node t0 is missing. If you prefer to simply ignore rows where a relationship node is missing, set 'dbms.cypher.lenient_create_relationship = true' in neo4j.conf

For me, being able to query exactly these complex graph cases (imagine model of linked list again) would be one of the main advantages compared to SQL. Is there a chance to get this supported in Kuzu?

Sorry, if this feature request should go in yet another direction. Sometimes ideas need time to grow.

I'll also give a bit more details regarding read after update.

Thanks! I stay tuned :-)

andyfengHKU commented 11 months ago

I see your point. Though I think you can achieve what you want with just OPTIONAL MATCH + UPDATE. I suggest the following

MATCH (s:T {id: <input>}), (a:T {id: <target>})
           OPTIONAL MATCH (t1:T)-[e1:AFTER]->(s)
           OPTIONAL MATCH (s)-[e2:AFTER]->(t3:T)
           OPTIONAL MATCH (t5:T)-[e4:AFTER]->(a)
           DELETE e1,e2,e4
           CREATE (s)-[e7:AFTER]->(a)
           CREATE (t1)-[e8:AFTER]->(t3)
           CREATE (t5)-[e9:AFTER]->(s);

Optional match will guarantee query not abort during match state. If certain pattern is not matched, e.g. (t1:T)-[e1:AFTER]->(s), then t1 and e1 will be NULL, and DELETE e1 & CREATE (t1)-[e8:AFTER]->(t3) will automatically be ignored in Kùzu.

I'll attach the full statements I ran in KuzuExplorer.

-STATEMENT CREATE NODE TABLE T (id STRING, PRIMARY KEY(id));

-STATEMENT CREATE REL TABLE AFTER (FROM T TO T, id STRING);

-STATEMENT CREATE (t1:T {id: "t1"})-[e1:AFTER {id: "e1"}]->(t2:T {id: "t2"})-[e2:AFTER {id: "e2"}]->(t3:T {id: "t3"}),
        (t4:T {id: "t4"})-[e3:AFTER {id: "e3"}]->(t5:T {id: "t5"})-[e4:AFTER {id: "e4"}]->(t6:T {id: "t6"})-[e5:AFTER {id: "e5"}]->(t7:T {id: "t7"})

-STATEMENT MATCH (s:T {id: "t1"}), (a:T {id: "t6"})
           OPTIONAL MATCH (t1:T)-[e1:AFTER]->(s)
           OPTIONAL MATCH (s)-[e2:AFTER]->(t3:T)
           OPTIONAL MATCH (t5:T)-[e4:AFTER]->(a)
           DELETE e1,e2,e4
           CREATE (s)-[e7:AFTER]->(a)
           CREATE (t1)-[e8:AFTER]->(t3)
           CREATE (t5)-[e9:AFTER]->(s);

Before updating

After updating

sapalli2989 commented 11 months ago

Ah great. My bad, I had the assumption, above query wouldn't run in Kuzu yet. But you're right, I used latest Kuzu DB + explorer and indeed this works - better than Neo4j!

andyfengHKU commented 7 months ago

Read after update should now be supported in #3126

kuzudb / kuzu

Error: Binder exception: Read after update is not supported. Try query with multiple statements. #2303

Example 2

What would be ideal