kuzudb / kuzu

Embeddable property graph database management system built for query speed and scalability. Implements Cypher.
https://kuzudb.com/
MIT License

Kùzu to NetworkX fails when results of an #4284

Closed by prrao87 4 hours ago

prrao87 commented 4 hours ago

Kùzu version

No response

What operating system are you using?

No response

What happened?

Based on the additional issue reported in #3640.

When trying to isolate a subgraph with a MATCH clause followed by an OPTIONAL MATCH clause, we found that if the OPTIONAL MATCH returns a null result, converting the query result to NetworkX fails. Supporting this use case makes sense: query results can legitimately contain null values, and the get_as_networkx() converter should ignore them.

Are there known steps to reproduce?

Run the following code:

import shutil
import kuzu
import networkx

shutil.rmtree("test_db", ignore_errors=True)
db = kuzu.Database()
conn = kuzu.Connection(db)

conn.execute("CREATE NODE TABLE Person (name STRING, age INT64, PRIMARY KEY (name))")
conn.execute("CREATE NODE TABLE Product (name STRING, price DOUBLE, PRIMARY KEY (name))")
conn.execute("CREATE REL TABLE Purchased (FROM Person TO Product, quantity INT64)")

conn.execute("CREATE (p:Person {name: 'Alice', age: 30})")
conn.execute("CREATE (p:Person {name: 'Bob', age: 25})")

conn.execute("CREATE (p:Product {name: 'Laptop', price: 1000.0})")
conn.execute("CREATE (p:Product {name: 'Phone', price: 500.0})")

conn.execute("MATCH (p:Person {name: 'Alice'}), (prod:Product {name: 'Laptop'}) MERGE (p)-[:PURCHASED]->(prod)")
conn.execute("MATCH (p:Person {name: 'Alice'}), (prod:Product {name: 'Phone'}) MERGE (p)-[:PURCHASED]->(prod)")
conn.execute("MATCH (p:Person {name: 'Bob'}), (prod:Product {name: 'Phone'}) MERGE (p)-[:PURCHASED]->(prod)")

# Run the following query
result = conn.execute("""
    MATCH (a)-[p1:Purchased]->(pr1:Product {name: 'Phone'})
    OPTIONAL MATCH (a)-[p2:Purchased]->(pr2:Product {name: 'Tablet'})
    RETURN *
""")

G = result.get_as_networkx()
print(G.nodes)
print(G.edges)

Because the query's OPTIONAL MATCH clause looks for a product ('Tablet') that does not exist, the query result contains null values for p2 and pr2, and the conversion throws the following error:

Traceback (most recent call last):
  File "/Users/prrao/code/kuzu/test.py", line 31, in <module>
    G = result.get_as_networkx()
        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prrao/code/kuzu/tools/python_api/build/kuzu/query_result.py", line 283, in get_as_networkx
    _id = row[i]["_id"]
          ~~~~~~^^^^^^^
TypeError: 'NoneType' object is not subscriptable
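For illustration, here is a minimal sketch of the null handling requested above. It is not Kùzu's converter code: the function name `collect_graph_items`, the `primary_keys` argument, and the dict shapes of the result cells (`_id` with `table`/`offset`, `_label`, `_src`, `_dst`) are assumptions made for the example, and the `Label_primarykey` node naming is inferred from the output posted in this thread. It uses only plain Python structures, so it runs without kuzu or networkx installed.

```python
def collect_graph_items(rows, primary_keys):
    """Split result rows into node names and edges, skipping null cells.

    rows: list of rows, each a list of dicts (nodes/rels) or None.
    primary_keys: hypothetical mapping of node label -> primary key property.
    """
    id_to_name = {}  # (table, offset) -> node name
    rels = []

    for row in rows:
        for cell in row:
            # OPTIONAL MATCH can leave a column empty; ignore it
            # instead of subscripting None.
            if cell is None:
                continue
            if "_src" in cell and "_dst" in cell:
                rels.append(cell)  # relationship cell
            elif "_id" in cell:
                key = (cell["_id"]["table"], cell["_id"]["offset"])
                label = cell["_label"]
                name = f"{label}_{cell[primary_keys[label]]}"
                id_to_name.setdefault(key, name)  # keep first occurrence

    edges = []
    for rel in rels:
        src = (rel["_src"]["table"], rel["_src"]["offset"])
        dst = (rel["_dst"]["table"], rel["_dst"]["offset"])
        # Only keep edges whose endpoints were both returned.
        if src in id_to_name and dst in id_to_name:
            edge = (id_to_name[src], id_to_name[dst])
            if edge not in edges:
                edges.append(edge)

    return list(id_to_name.values()), edges


# Hand-built rows shaped like the result of the query above:
# columns a, p1, pr1, p2, pr2, where p2 and pr2 are null.
alice = {"_id": {"table": 0, "offset": 0}, "_label": "Person", "name": "Alice"}
bob = {"_id": {"table": 0, "offset": 1}, "_label": "Person", "name": "Bob"}
phone = {"_id": {"table": 1, "offset": 1}, "_label": "Product", "name": "Phone"}
alice_phone = {"_src": alice["_id"], "_dst": phone["_id"], "_label": "Purchased"}
bob_phone = {"_src": bob["_id"], "_dst": phone["_id"], "_label": "Purchased"}

sample_rows = [
    [alice, alice_phone, phone, None, None],
    [bob, bob_phone, phone, None, None],
]

nodes, edges = collect_graph_items(
    sample_rows, {"Person": "name", "Product": "name"}
)
print(nodes)
print(edges)
```

The only change relative to a naive loop is the `if cell is None: continue` guard, which is the behavior this issue asks get_as_networkx() to adopt. The sketch returns plain tuples, so the edges carry no multigraph key.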

prrao87 commented 4 hours ago

Tried the code above based on #4282, and it works as intended 👌🏽.

['Person_Alice', 'Product_Phone', 'Person_Bob']
[('Person_Alice', 'Product_Phone', 0), ('Person_Bob', 'Product_Phone', 0)]