kuzudb / kuzu

Embeddable property graph database management system built for query speed and scalability. Implements Cypher.
https://kuzudb.com/
MIT License
1.28k stars 90 forks source link

Bug: Suspicious results for MATCH query when replacing one (n1) to (n2) #3606

Open joyemang33 opened 3 months ago

joyemang33 commented 3 months ago

Kùzu version

v0.4.3.23

What happened?

First, I execute the following Cypher query on my graph:

MATCH (n1)-[]->(n1)-[]->(n1)-[]->(n1) RETURN COUNT (*)
=> 216

Then, I execute the second query in the same graph:

MATCH (n1)-[]->(n1)-[]->(n2)-[]->(n1) RETURN COUNT (*)
=> 72

I am not sure whether the first result greater than the second one is expected in Kuzu, because the second query can strictly match more patterns.

It would be highly appreciated if you could tell me if I misunderstood this case or if it was caused by a potential bug in Kuzu.

Best regards, Qiuyang

Are there known steps to reproduce?

The graph data can be imported by the statements in the attached JSON file G2_48.json

andyfengHKU commented 3 months ago

I think this is a bug because because the second query can strictly match more patterns this statement should hold.

WWW0030 commented 1 week ago

I was able to reproduce the graph data as the follow:

image

Based on the table and information, MATCH (n1)-[]->(n1)-[]->(n1)-[]->(n1) RETURN COUNT (*) should return 216 (returns 216), and MATCH (n1)-[]->(n1)-[]->(n2)-[]->(n1) RETURN COUNT (*) should return 216 as well (returns 72 instead), while something like MATCH (n1)-[]->(n2)-[]->(n2)-[]->(n2) RETURN COUNT (*) should return (12 + 6) * 36 = 648 (returns 648).

WWW0030 commented 1 week ago

I also created a simpler series of statements which reproduces the bug:

CREATE NODE TABLE L0(id INT64, p17 BOOLEAN, p18 INT64, p16 DOUBLE, p1 STRING, p13 DOUBLE, p3 BOOLEAN, p15 INT64, p4 INT64, p8 STRING, p9 DOUBLE, p12 BOOLEAN, p14 STRING, p6 DOUBLE, p11 BOOLEAN, p0 BOOLEAN, p19 STRING, p5 BOOLEAN, p2 DOUBLE, p7 STRING, p10 BOOLEAN, PRIMARY KEY(id)); 
CREATE NODE TABLE L1(id INT64, p19 STRING, p8 STRING, p0 BOOLEAN, p18 INT64, p11 BOOLEAN, p14 STRING, p13 DOUBLE, p12 BOOLEAN, p6 DOUBLE, p17 BOOLEAN, p15 INT64, p7 STRING, p1 STRING, p9 DOUBLE, p3 BOOLEAN, p5 BOOLEAN, p4 INT64, p10 BOOLEAN, p2 DOUBLE, p16 DOUBLE, PRIMARY KEY(id)); 
CREATE NODE TABLE L2(id INT64, p15 INT64, p14 STRING, p8 STRING, p0 BOOLEAN, p11 BOOLEAN, p10 BOOLEAN, p6 DOUBLE, p18 INT64, p16 DOUBLE, p17 BOOLEAN, p4 INT64, p1 STRING, p13 DOUBLE, p3 BOOLEAN, p19 STRING, p2 DOUBLE, p5 BOOLEAN, p9 DOUBLE, p7 STRING, p12 BOOLEAN, PRIMARY KEY(id)); 
CREATE NODE TABLE L3(id INT64, p6 DOUBLE, p0 BOOLEAN, p18 INT64, p5 BOOLEAN, p19 STRING, p10 BOOLEAN, p7 STRING, p14 STRING, p1 STRING, p15 INT64, p11 BOOLEAN, p12 BOOLEAN, p13 DOUBLE, p17 BOOLEAN, p8 STRING, p16 DOUBLE, p4 INT64, p9 DOUBLE, p3 BOOLEAN, p2 DOUBLE, PRIMARY KEY(id)); 
CREATE NODE TABLE L4(id INT64, p6 DOUBLE, p16 DOUBLE, p14 STRING, p2 DOUBLE, p10 BOOLEAN, p15 INT64, p19 STRING, p7 STRING, p3 BOOLEAN, p17 BOOLEAN, p11 BOOLEAN, p5 BOOLEAN, p12 BOOLEAN, p0 BOOLEAN, p18 INT64, p13 DOUBLE, p4 INT64, p9 DOUBLE, p1 STRING, p8 STRING, PRIMARY KEY(id)); 
CREATE NODE TABLE L5(id INT64, p15 INT64, p7 STRING, p16 DOUBLE, p3 BOOLEAN, p19 STRING, p8 STRING, p12 BOOLEAN, p17 BOOLEAN, p1 STRING, p14 STRING, p18 INT64, p6 DOUBLE, p5 BOOLEAN, p10 BOOLEAN, p4 INT64, p9 DOUBLE, p2 DOUBLE, p13 DOUBLE, p11 BOOLEAN, p0 BOOLEAN, PRIMARY KEY(id)); 
CREATE REL TABLE T0(FROM L0 TO L1, id INT64, p8 STRING, p13 DOUBLE, p5 BOOLEAN, p6 DOUBLE, p1 STRING, p2 DOUBLE, p17 BOOLEAN, p16 DOUBLE, p12 BOOLEAN, p14 STRING, p11 BOOLEAN, p0 BOOLEAN, p15 INT64, p10 BOOLEAN, p3 BOOLEAN, p19 STRING, p4 INT64, p9 DOUBLE, p18 INT64, p7 STRING); 
CREATE REL TABLE T1(FROM L2 TO L2, id INT64, p8 STRING, p3 BOOLEAN, p10 BOOLEAN, p16 DOUBLE, p14 STRING, p6 DOUBLE, p17 BOOLEAN, p5 BOOLEAN, p12 BOOLEAN, p9 DOUBLE, p15 INT64, p11 BOOLEAN, p7 STRING, p2 DOUBLE, p1 STRING, p4 INT64, p0 BOOLEAN, p18 INT64, p13 DOUBLE, p19 STRING); 
CREATE REL TABLE T2(FROM L4 TO L2, id INT64, p3 BOOLEAN, p14 STRING, p8 STRING, p16 DOUBLE, p18 INT64, p0 BOOLEAN, p6 DOUBLE, p10 BOOLEAN, p12 BOOLEAN, p1 STRING, p2 DOUBLE, p19 STRING, p4 INT64, p7 STRING, p15 INT64, p5 BOOLEAN, p13 DOUBLE, p9 DOUBLE, p17 BOOLEAN, p11 BOOLEAN); 
CREATE (n:L0 {id: 10, p17: false, p18: -7906308374803075246, p1: 'ccPR', p13: 357000129307453056.00000000000000000000, p3: false, p15: -6334730546550389227, p4: -4055421316127392181, p8: 'LMX1suX', p9: 3336528673857534976.00000000000000000000, p12: true, p14: 'l94fGr8deiO1tR56a', p6: -111786379686324176.00000000000000000000, p11: true, p0: true, p19: '4SX1rRW8bBOkbe', p2: -1600188621876337664.00000000000000000000, p7: 'IXfau'}); 
CREATE (n:L1 {id: 11, p19: 'dH', p8: 'w19rt2ufapUPG9LAhM', p0: true, p18: 6527323655487326399, p11: false, p14: 'oL', p13: -1143273553065266944.00000000000000000000, p12: false, p6: 796389362106618240.00000000000000000000, p17: true, p15: 1249721529597652798, p7: 'OS', p9: -1338185676365138688.00000000000000000000, p3: false, p5: true, p4: -5283601510149328759, p10: true, p2: -4507689838714535424.00000000000000000000});
CREATE (n:L2 {id: 12, p15: -3284489604748570514, p14: 'R9Noom949VLCexe9', p8: 'LcFMmpKnsC18dT6uIA0', p0: true, p11: true, p10: true, p6: 4232142829406583808.00000000000000000000, p18: -7655610578591177293, p16: -519401605201346112.00000000000000000000, p17: false, p4: -3855345257879899428, p1: 'E1pfR9c5CO7BtB5u1Y', p13: 21302263097162988.00000000000000000000, p3: true, p19: 'R2mCIGtVVerrKLVyW', p2: -332300198858014848.00000000000000000000, p5: false, p9: 174055941322998720.00000000000000000000, p7: 'qod6jFpPmTDHaKTusz'});
CREATE (n:L3 {id: 13, p6: 5378927472326991872.00000000000000000000, p0: false, p18: 3826962247094195770, p5: false, p19: 'I', p10: false, p7: 'g19uDJ', p1: 'mon3', p15: -5981658932880929426, p11: true, p12: true, p13: 6827577077638582272.00000000000000000000, p17: true, p8: '3CY2T2runmR', p4: 8524604185197609695, p9: 359532162360563776.00000000000000000000, p3: false, p2: -3406726372510934016.00000000000000000000});
CREATE (n:L4 {id: 14, p6: -6003342396494129152.00000000000000000000, p16: -1021096577605977984.00000000000000000000, p10: false, p15: 6074880128481361865, p19: 'wobh0Krddtfwbf3', p7: 'WowFtiTvF', p3: true, p17: true, p11: false, p5: false, p12: true, p0: false, p18: -3372460967252465365, p13: -11599909715008370.00000000000000000000, p4: 6239365637405931443, p9: 7620142360858557440.00000000000000000000, p1: 
'LNnT01w', p8: 'X2'});
CREATE (n:L5 {id: 15, p15: -3994369521755909457, p7: 't8oda', p16: -236634120412630080.00000000000000000000, p3: false, p19: 'AG', p8: 'DAFE7sdDGbD8vhqQH', p12: false, p17: true, p1: 'V3BDucT4i', p14: 'yPuxBxEtuo', p18: 1891059394068499568, p6: 6198571586760286208.00000000000000000000, p5: false, p4: 1215743937978874146, p9: 4871604017189517312.00000000000000000000, p2: -6082182406115405824.00000000000000000000, p13: 2773487287234283520.00000000000000000000, p11: false, p0: true});
CREATE (n:L5 {id: 16, p15: 7055800334824621148, p7: '5ndLKQX8jqdV2b7USNY', p3: false, p19: 'BgY', p8: '9fY', p12: true, p1: 'X7B', p14: '5eoIi6EzdcsBO2d', p18: -37642713849275466, p6: 1210900897110801408.00000000000000000000, p5: true, p10: false, p4: -6354219474726974553, p9: 639111763859520896.00000000000000000000, p2: -2142883497954721280.00000000000000000000, p13: 1379833992324653824.00000000000000000000, p11: false, p0: true});
MATCH (n1:L0), (n2:L1) WHERE n1.id = 10 AND n2.id = 11 CREATE (n1)-[:T0 {id: 0, p8: 'jFRSimYE4motd0e4', p5: false, p6: 1387438671629514240.00000000000000000000, p2: -727958988912680064.00000000000000000000, p17: true, p16: 251166713934973952.00000000000000000000, p12: true, p14: 'tGjuob', p11: false, p0: true, p15: 6032031522877552042, p10: true, p3: false, p19: 'QZcoOTH', p4: -2008774714910842394, p9: 4052401538011552768.00000000000000000000, p7: '5scBwt3s1o0gAkH'}]->(n2);
MATCH (n1:L0), (n2:L1) WHERE n1.id = 10 AND n2.id = 11 CREATE (n1)-[:T0 {id: 1, p8: 'GDcSB0nU', p6: -351347371240361472.00000000000000000000, p1: '2JiMIZ7', p2: -2800069025037270016.00000000000000000000, p17: true, p12: true, p0: false, p15: -4851614060680031929, p10: false, p3: false, p19: 'sF4rHI4bpyf', p4: 622156947368471992, p9: -3990226918044611072.00000000000000000000, p18: -118445743623134792, p7: '9bVyOadc5rkD'}]->(n2);
MATCH (n1:L4), (n2:L2) WHERE n1.id = 14 AND n2.id = 12 CREATE (n1)-[:T2 {id: 2, p3: true, p14: '6HATvtWVq1QMaLvYP', p8: 'dezXaGPqxW2Vo7XaVfX', p16: 3208283177025581568.00000000000000000000, p18: -2624097523748122769, p0: false, p6: -6238003496532492288.00000000000000000000, p12: true, p1: 'mdwthmmQGskEr9JqGY', p2: 2047638638598354432.00000000000000000000, p19: 'ndKL1K', p4: 3304533803175091228, p7: '6QDbygEzjGsJw', p15: -5019018623495314951, p5: false, p13: 7368717080766465024.00000000000000000000, p9: -152267419293110272.00000000000000000000, p17: true, p11: false}]->(n2);
MATCH (n1:L4), (n2:L2) WHERE n1.id = 14 AND n2.id = 12 CREATE (n1)-[:T2 {id: 3, p3: true, p14: 'zw9q', p8: 'DjaOcUtN', p16: 767726902231108992.00000000000000000000, p18: -6761000123351076207, p6: -2415694011228350976.00000000000000000000, p10: true, p12: false, p1: 'hj5Dafhbs63ujY', p2: 4080876571191409664.00000000000000000000, p19: 'IimPH270vL4kcyjTA', p4: -3493595745692455879, p7: 'qYCKcNUnxWBTvSW', p15: 
3664047550336499297, p5: false, p13: -2116649499565373184.00000000000000000000, p9: -796625582679278592.00000000000000000000, p17: false, p11: true}]->(n2);
MATCH (n1:L0), (n2:L1) WHERE n1.id = 10 AND n2.id = 11 CREATE (n1)-[:T0 {id: 6, p8: 'oWReK4PYCdFihrLr', p13: 62367962020201424.00000000000000000000, p5: false, p6: 2170777252990127104.00000000000000000000, p1: 'OVGpV5oCsrkxCienDC', p2: -5284076884946644992.00000000000000000000, p17: false, p16: -3028315053010751488.00000000000000000000, p12: false, p14: 'Osx', p11: true, p0: false, p15: 8634313899614864833, p10: false, p3: true, p19: 'm', p4: -6475364547955550643, p18: 2401396242517336446, p7: 'muqy'}]->(n2);
MATCH (n1:L0), (n2:L1) WHERE n1.id = 10 AND n2.id = 11 CREATE (n1)-[:T0 {id: 8, p8: 'yTGRhtYeM', p13: -873131337479728640.00000000000000000000, p5: false, p6: 5598622762466491392.00000000000000000000, p2: 
-7287094118882638848.00000000000000000000, p17: true, p16: 3078430688149420032.00000000000000000000, p12: true, p14: 'Qy2WBzTRlVDiLKj', p11: true, p0: true, p15: 3477261852382467520, p3: false, p19: 'UuDx', p4: 5932704016068324815, p9: 758296128732868352.00000000000000000000, p18: 133672304433936889, p7: 'yBB'}]->(n2);
MATCH (n1:L0), (n2:L1) WHERE n1.id = 10 AND n2.id = 11 CREATE (n1)-[:T0 {id: 15, p8: 'm2y08JF', p13: 2763417047233650688.00000000000000000000, p5: false, p6: 2172183433732910592.00000000000000000000, p1: 'zYgixq7A', p2: -1655394528907990272.00000000000000000000, p12: false, p14: 'vjYFmrDtd32Elp', p11: true, p0: false, p15: -5093884331651770909, p10: true, p3: true, p19: 'nwdFWcxd', p4: -7438088034793850326, p9: -4700280485415375872.00000000000000000000, p18: 8064040458157445675, p7: 'YhHHNnwhdda'}]->(n2);
MATCH (n1:L2), (n2:L2) WHERE n1.id = 12 AND n2.id = 12 CREATE (n1)-[:T1 {id: 25, p8: 'buMxfZ9MSfbmobRAU8l', p3: true, p10: true, p16: 982131087424177536.00000000000000000000, p14: 'wtSY3JBQbPsl462ntpqo', p6: -6348671673141737472.00000000000000000000, p17: true, p5: true, p12: false, p9: -4000288236113589760.00000000000000000000, p15: -4250441112784407411, p11: true, p7: 'dU2QsmTjlu2hcCVq3', p1: 'fjN6Thd6BmKERwF', p4: -1188025281174783241, p0: true, p13: 2079483968357136640.00000000000000000000, p19: 'j4zwHA1m'}]->(n2);
MATCH (n1:L2), (n2:L2) WHERE n1.id = 12 AND n2.id = 12 CREATE (n1)-[:T1 {id: 26, p8: 'dw', p3: true, p10: false, p16: 363786722211371840.00000000000000000000, p14: 'LulsthLWmV8LOhxkU', p6: 2971014607172158976.00000000000000000000, p17: false, p5: true, p12: false, p9: -1474506739412387584.00000000000000000000, p11: false, p7: '3TDLwcPowYrSyCSFA', p2: -1896704972508497152.00000000000000000000, p1: 'kSs', p4: 3457126280670188220, p0: true, p18: 4355809508667912548, p13: 1143640732173138560.00000000000000000000, p19: '1STMEsJAybt57F6'}]->(n2);
MATCH (n1:L2), (n2:L2) WHERE n1.id = 12 AND n2.id = 12 CREATE (n1)-[:T1 {id: 27, p8: 'RelQKDosf', p3: false, p10: false, p16: -4109178632448380928.00000000000000000000, p14: 'Toi9AZqX7KHNfh5', p6: 2139485808331814144.00000000000000000000, p17: true, p12: false, p11: true, p7: 'JQKuuDK6VK', p2: 383660953956050240.00000000000000000000, p1: '85sBeOHyAnpFu', p4: -6157661538861182023, p0: false, p13: -443237879444976704.00000000000000000000, p19: 'M2gOxeLuLL44Z'}]->(n2);

The resulting graph data with rel connections should look like: image

MATCH (n1)-[]->(n1)-[]->(n1)-[]->(n1) RETURN COUNT (*) should return 27, and MATCH (n1)-[]->(n1)-[]->(n2)-[]->(n1) RETURN COUNT (*) should return 27, but returns 15.

WWW0030 commented 1 week ago

It looks like the bugged MATCH is matching statements which are incorrect, and skipping over some of the correct statements. Here is a compiled return of what the match statements are finding:

The bugged output
(12L2) -[25T1]-> (12L2) -[26T1]-> (14L4) -[3T2]-> (12L2)
(12L2) -[26T1]-> (12L2) -[26T1]-> (14L4) -[3T2]-> (12L2)
(12L2) -[27T1]-> (12L2) -[26T1]-> (14L4) -[3T2]-> (12L2)
(12L2) -[25T1]-> (12L2) -[25T1]-> (14L4) -[2T2]-> (12L2)
(12L2) -[26T1]-> (12L2) -[25T1]-> (14L4) -[2T2]-> (12L2)
(12L2) -[27T1]-> (12L2) -[25T1]-> (14L4) -[2T2]-> (12L2)
(12L2) -[25T1]-> (12L2) -[27T1]-> (12L2) -[27T1]-> (12L2)
(12L2) -[26T1]-> (12L2) -[27T1]-> (12L2) -[27T1]-> (12L2)
(12L2) -[27T1]-> (12L2) -[27T1]-> (12L2) -[27T1]-> (12L2)
(12L2) -[25T1]-> (12L2) -[26T1]-> (12L2) -[26T1]-> (12L2)
(12L2) -[26T1]-> (12L2) -[26T1]-> (12L2) -[26T1]-> (12L2)
(12L2) -[27T1]-> (12L2) -[26T1]-> (12L2) -[26T1]-> (12L2)
(12L2) -[25T1]-> (12L2) -[25T1]-> (12L2) -[25T1]-> (12L2)
(12L2) -[26T1]-> (12L2) -[25T1]-> (12L2) -[25T1]-> (12L2)
(12L2) -[27T1]-> (12L2) -[25T1]-> (12L2) -[25T1]-> (12L2)

The expected output
(12L2) -[25T1]-> (12L2) -[25T1]-> (12L2) -[25T1]-> (12L2)
(12L2) -[25T1]-> (12L2) -[26T1]-> (12L2) -[25T1]-> (12L2)
(12L2) -[25T1]-> (12L2) -[27T1]-> (12L2) -[25T1]-> (12L2)
(12L2) -[25T1]-> (12L2) -[25T1]-> (12L2) -[26T1]-> (12L2)
(12L2) -[25T1]-> (12L2) -[26T1]-> (12L2) -[26T1]-> (12L2)
(12L2) -[25T1]-> (12L2) -[27T1]-> (12L2) -[26T1]-> (12L2)
(12L2) -[25T1]-> (12L2) -[25T1]-> (12L2) -[27T1]-> (12L2)
(12L2) -[25T1]-> (12L2) -[26T1]-> (12L2) -[27T1]-> (12L2)
(12L2) -[25T1]-> (12L2) -[27T1]-> (12L2) -[27T1]-> (12L2)
(12L2) -[26T1]-> (12L2) -[25T1]-> (12L2) -[25T1]-> (12L2)
(12L2) -[26T1]-> (12L2) -[26T1]-> (12L2) -[25T1]-> (12L2)
(12L2) -[26T1]-> (12L2) -[27T1]-> (12L2) -[25T1]-> (12L2)
(12L2) -[26T1]-> (12L2) -[25T1]-> (12L2) -[26T1]-> (12L2)
(12L2) -[26T1]-> (12L2) -[26T1]-> (12L2) -[26T1]-> (12L2)
(12L2) -[26T1]-> (12L2) -[27T1]-> (12L2) -[26T1]-> (12L2)
(12L2) -[26T1]-> (12L2) -[25T1]-> (12L2) -[27T1]-> (12L2)
(12L2) -[26T1]-> (12L2) -[26T1]-> (12L2) -[27T1]-> (12L2)
(12L2) -[26T1]-> (12L2) -[27T1]-> (12L2) -[27T1]-> (12L2)
(12L2) -[27T1]-> (12L2) -[25T1]-> (12L2) -[25T1]-> (12L2)
(12L2) -[27T1]-> (12L2) -[26T1]-> (12L2) -[25T1]-> (12L2)
(12L2) -[27T1]-> (12L2) -[27T1]-> (12L2) -[25T1]-> (12L2)
(12L2) -[27T1]-> (12L2) -[25T1]-> (12L2) -[26T1]-> (12L2)
(12L2) -[27T1]-> (12L2) -[26T1]-> (12L2) -[26T1]-> (12L2)
(12L2) -[27T1]-> (12L2) -[27T1]-> (12L2) -[26T1]-> (12L2)
(12L2) -[27T1]-> (12L2) -[25T1]-> (12L2) -[27T1]-> (12L2)
(12L2) -[27T1]-> (12L2) -[26T1]-> (12L2) -[27T1]-> (12L2)
(12L2) -[27T1]-> (12L2) -[27T1]-> (12L2) -[27T1]-> (12L2)

The bugged MATCH statement is matching n2 to L4, which is invalid.

I also wrote a quick python script which is used to generate the above outputs, just save the JSON files in the same directory as the python script to run:

import json
bugged_json_file_path = 'bugged.JSON'
expected_json_file_path = 'expected.JSON'

with open(bugged_json_file_path, 'r') as bugged_file:
    bugged_data = json.load(bugged_file)

with open(expected_json_file_path, 'r') as expected_file:
    expected_data = json.load(expected_file)

bugged_rows = bugged_data["rows"]
print("The following are output from statement:")
print("MATCH (n1)-[n3]->(n1)-[n4]->(n2)-[n5]->(n1) RETURN n1, n2, n3, n4, n5")
print()
print("The bugged output:")
for row in bugged_rows:
    print(f"({row['n1']['id']}{row['n1']['_label']}) -[{row['n3']['id']}{row['n3']['_label']}]-> ({row['n1']['id']}{row['n1']['_label']}) -[{row['n4']['id']}{row['n4']['_label']}]-> ({row['n2']['id']}{row['n2']['_label']}) -[{row['n5']['id']}{row['n5']['_label']}]-> ({row['n1']['id']}{row['n1']['_label']})")

print()
print("The expected output:")
expected_rows = expected_data["rows"]
for row in expected_rows:
    print(f"({row['n1']['id']}{row['n1']['_label']}) -[{row['n2']['id']}{row['n2']['_label']}]-> ({row['n1']['id']}{row['n1']['_label']}) -[{row['n3']['id']}{row['n3']['_label']}]-> ({row['n1']['id']}{row['n1']['_label']}) -[{row['n4']['id']}{row['n4']['_label']}]-> ({row['n1']['id']}{row['n1']['_label']})")
ray6080 commented 1 week ago

Took an initial look at the plan, the error should be due to Intersect. Will continue on this later.