kuzudb / kuzu

Embeddable property graph database management system built for query speed and scalability. Implements Cypher.
https://kuzudb.com/
MIT License

Floating point exception (core dumped) caused by OPTIONAL MATCH #3382

Open joyemang33 opened 3 weeks ago

joyemang33 commented 3 weeks ago

Hi there! When I test Kuzu with a query containing OPTIONAL MATCH, it raises a "Floating point exception (core dumped)" and crashes my Python client.

Here is a reduced test case that reproduces the issue:

MATCH (n1)<-[]-(n1)-[]->(n2) OPTIONAL MATCH (n2)-[]-(n2)-[]->(n1)-[]->(n1), (n2)-[]-(n1)-[]-(n1)-[]-(n1) RETURN COUNT (*)

The graph data can be found here.

Thank you very much for the time to investigate it!

Best regards, Qiuyang

prrao87 commented 3 weeks ago

Hi @joyemang33, thanks for posting. I'm not able to reproduce the issue in v0.1.0 of Kùzu (which I assume is the version of Kùzu that you used, based on the storage layer). Could you try recreating the graph in our latest release (0.3.2) of Kùzu and see if the issue persists? Also, what OS did you run this on, and how much memory does the machine have? For future reference, when posting potential bugs, the Kùzu version number and OS would help us narrow down the issue faster 😅
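For anyone filing a similar report, the details requested above (package version, OS, memory) can be gathered with a few lines of standard-library Python. This is a minimal sketch, not an official Kùzu tool: the function names `environment_report` and `total_memory_gib` are made up for illustration, and the memory lookup relies on `os.sysconf`, which is only assumed to be available on Linux and similar systems.

```python
import os
import platform
import sys
from importlib import metadata


def total_memory_gib():
    """Return physical memory in GiB, or None where os.sysconf cannot report it."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None
    return round(page_size * page_count / 2**30, 1)


def environment_report(package="kuzu"):
    """Collect the details a maintainer typically asks for in a bug report."""
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        package_version = "not installed"
    return {
        "package": package,
        "package_version": package_version,
        "python": sys.version.split()[0],
        "os": platform.platform(),
        "machine": platform.machine(),
        "memory_gib": total_memory_gib(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Note that this reports the version of the installed Python package only. It says nothing about which version created an existing database directory, which is a separate question.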

prrao87 commented 3 weeks ago

Tried on Kùzu 0.1.0 and it runs for me (macOS):

import kuzu

db = kuzu.Database("G")
conn = kuzu.Connection(db)

res = conn.execute("MATCH (n1)<-[]-(n1)-[]->(n2) OPTIONAL MATCH (n2)-[]-(n2)-[]->(n1)-[]->(n1), (n2)-[]-(n1)-[]-(n1)-[]-(n1) RETURN COUNT (*)")
print(res.get_as_df())
   COUNT_STAR()
0       6250005

What are the memory resources available on your machine? I wonder if it's too low for this size of graph.

prrao87 commented 3 weeks ago

I recommend rebuilding the graph with the latest stable release (0.3.2) and letting us know if the issue persists:

pip install -U kuzu

joyemang33 commented 3 weeks ago

> I recommend rebuilding the graph with the latest stable release (0.3.2) and letting us know if the issue persists:
>
> pip install -U kuzu

Thanks for your reply! That's a little strange. My kuzu version is 0.3.2 and my OS is Ubuntu 20.04.6 LTS with 512 GB of memory. I tried the following code again, and the problem still occurs:

import kuzu

db = kuzu.Database("G")
conn = kuzu.Connection(db)
res = conn.execute("MATCH (n1)<-[]-(n1)-[]->(n2) OPTIONAL MATCH (n2)-[]-(n2)-[]->(n1)-[]->(n1), (n2)-[]-(n1)-[]-(n1)-[]-(n1) RETURN COUNT (*)")

Sorry for the confusion. Could you check it again on Linux?

Best regards, Qiuyang
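Since the crash kills the Python client itself, one way to keep a test harness alive while reproducing it is to run the query in a child interpreter and inspect how the child ended. The sketch below uses only the standard library. The helper name `run_isolated` is hypothetical, and the mapping from a negative return code to a signal name applies to POSIX systems such as Linux and macOS.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=600.0):
    """Run `code` in a fresh Python interpreter and describe how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, a return code of -N means the child was killed by signal N.
        try:
            name = signal.Signals(-proc.returncode).name
        except ValueError:
            name = f"signal {-proc.returncode}"
        return f"killed by {name}"
    return f"exited with status {proc.returncode}"


# The reproduction script from this thread, executed in the child process.
REPRO = '''
import kuzu
db = kuzu.Database("G")
conn = kuzu.Connection(db)
conn.execute("MATCH (n1)<-[]-(n1)-[]->(n2) OPTIONAL MATCH (n2)-[]-(n2)-[]->(n1)-[]->(n1), (n2)-[]-(n1)-[]-(n1)-[]-(n1) RETURN COUNT (*)")
'''

if __name__ == "__main__":
    print(run_isolated(REPRO))
```

With this approach a floating point exception in the child would surface as a "killed by SIGFPE" result in the parent rather than taking the parent down with it.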

prrao87 commented 2 weeks ago

Interesting, I tried querying this graph on macOS in v0.3.2 and it complained about the storage layer being out of date. Will ask @acquamarin to take a look on Linux. Thanks!

acquamarin commented 2 weeks ago

I could reproduce the bug on version 0.10.0 on Linux. Can you share the database with storage version 0.3.2?

prrao87 commented 2 weeks ago

@acquamarin did you mean v0.1.0? I think what we need to also clarify here is that the original DB was with the old storage layer, meant for v0.1.0.

@joyemang33 could you clarify what you mean when you're saying "my Kùzu version is 0.3.2"? If you try to open the database you shared, doesn't it give you a storage incompatibility error?

I think the best way would be for you to re-ingest the data into a new database after installing v0.3.2 and share that database with us. Thanks!

acquamarin commented 2 weeks ago

Yes, I just confirmed that the database shared by the user was created by Kùzu v0.1.0 with an older storage version. To verify whether the bug still exists on the newest master, we need the user to recreate the database using v0.3.2.
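Recreating the database under v0.3.2 amounts to creating the tables again and re-importing the data into a fresh directory. The thread does not show the graph's schema or its source files, so everything specific in the sketch below is an assumption made for illustration: the table names `Person` and `Knows`, the single `id INT64` primary-key column, the CSV file names, and the helper names `rebuild_statements` and `rebuild`. Only `kuzu.Database`, `kuzu.Connection`, and `conn.execute` are taken from the code posted above.

```python
def rebuild_statements(node_tables, rel_tables):
    """Build the statements needed to recreate a graph from CSV files.

    node_tables maps a node label to its CSV path (hypothetical layout: one
    INT64 `id` column). rel_tables maps a relationship type to a tuple of
    (source label, target label, CSV path).
    """
    statements = []
    for label in node_tables:
        statements.append(f"CREATE NODE TABLE {label}(id INT64, PRIMARY KEY (id))")
    for rel_type, (src, dst, _csv_path) in rel_tables.items():
        statements.append(f"CREATE REL TABLE {rel_type}(FROM {src} TO {dst})")
    # Load nodes before relationships, since relationships refer to node keys.
    for label, csv_path in node_tables.items():
        statements.append(f'COPY {label} FROM "{csv_path}"')
    for rel_type, (_src, _dst, csv_path) in rel_tables.items():
        statements.append(f'COPY {rel_type} FROM "{csv_path}"')
    return statements


def rebuild(db_path, statements):
    """Run the statements against a new database directory."""
    import kuzu  # imported lazily so the statement builder works without Kùzu

    db = kuzu.Database(db_path)
    conn = kuzu.Connection(db)
    for statement in statements:
        conn.execute(statement)


if __name__ == "__main__":
    for statement in rebuild_statements(
        {"Person": "person.csv"},
        {"Knows": ("Person", "Person", "knows.csv")},
    ):
        print(statement)
```

Pointing `rebuild` at a new, empty path (rather than the existing "G" directory) keeps the old database intact for comparison.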

joyemang33 commented 2 weeks ago

Sorry for any inconvenience caused! With the help of @andyfengHKU, I verified that the database version is 0.1.0 even though my Kùzu version is 0.3.2. It's a little strange, but I will recreate the database with 0.3.2 as soon as possible. Thanks a lot for all your help!

joyemang33 commented 2 weeks ago

In 0.3.2, I found a case that causes a segmentation fault. I'm not sure whether it is the same bug as the 0.1.0 case:

image

It will return:

image

The Cypher query is:

MATCH (n1:L3)<-[]-(n1)-[:T1|:T2]-(n3:L3)<-[]-(n1), (n1:L9)-[]->(n1)-[:T1]-(n2)
MATCH (n2)-[]-(n2)-[]-(n4)-[]->(n3:L2) RETURN COUNT (*)

The graph data is here.

Thanks again for your time to help me investigate these cases!

acquamarin commented 2 weeks ago

I just confirmed that this query does cause a seg fault:

MATCH (n1:L3)<-[]-(n1)-[:T1|:T2]-(n3:L3)<-[]-(n1), (n1:L9)-[]->(n1)-[:T1]-(n2)
MATCH (n2)-[]-(n2)-[]-(n4)-[]->(n3:L2) RETURN COUNT (*)

@andyfengHKU Can you take a look?

prrao87 commented 2 weeks ago

Yup, that query does indeed cause a segfault: Kùzu 0.3.2, tested on macOS as well as Linux.

[1]    90329 segmentation fault  python test.py

However, the earlier query that was posted (for the 0.1.0 graph) worked and returned a count of zero (I presume that's because the data changed).

   COUNT_STAR()
0             0

joyemang33 commented 2 weeks ago

> Yup, that query does indeed cause a segfault: Kùzu 0.3.2, tested on macOS as well as Linux.
>
> [1]    90329 segmentation fault  python test.py
>
> However, the earlier query that was posted (for the 0.1.0 graph), worked and returned a count of zero (I presume that's because the data changed)
>
>    COUNT_STAR()
> 0             0

Yes, I guess this is because the data changed. I do not have a backup of the data-insertion statements, so I am re-running the whole test program to reproduce the same crash case.

Thanks again for your help!

Best regards, Qiuyang
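One way to avoid losing the data-insertion statements in future test runs is to log every statement to disk before it is executed, so the full sequence can be replayed after a crash. The sketch below is a generic wrapper, not part of Kùzu: `LoggingConnection` and `replay` are hypothetical names, and the wrapper only assumes that the wrapped object has an `execute(query)` method, as `kuzu.Connection` does in the code posted earlier in this thread.

```python
import json
import os


class LoggingConnection:
    """Wrap an object with execute(query) and record each query before running it."""

    def __init__(self, conn, log_path):
        self._conn = conn
        self._log = open(log_path, "a", encoding="utf-8")

    def execute(self, query):
        # One JSON string per line keeps multi-line queries intact.
        self._log.write(json.dumps(query) + "\n")
        # Flush and sync first so the entry survives a hard crash in execute().
        self._log.flush()
        os.fsync(self._log.fileno())
        return self._conn.execute(query)

    def close(self):
        self._log.close()


def replay(conn, log_path):
    """Re-run every logged query in order and return how many were executed."""
    count = 0
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            if line.strip():
                conn.execute(json.loads(line))
                count += 1
    return count
```

Because the log entry is written before the statement runs, the last line of the log after a crash is the statement that was executing when the process died, and the lines before it rebuild the data that triggered it.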