kuzudb / kuzu

Embeddable property graph database management system built for query speed and scalability. Implements Cypher.
https://kuzudb.com/
MIT License
1.2k stars, 85 forks

Bug: Issue with Kuzu to Networkx #3640

Closed Kamalika17 closed 1 week ago

Kamalika17 commented 1 month ago

What happened?

Hi,

I am using the Kuzu-to-NetworkX functionality:

res = conn.execute('MATCH (u:User)-[r:Rating]->(m:Movie) RETURN u, r, m LIMIT 250')
G = res.get_as_networkx(directed=False)

When I use this on the graph I created, I noticed that it does not support multiple edges between the same pair of nodes (say, A and B are connected by two relationships, one with category X and one with category Y). For example, it removes (A)-[rel: a2b {category:Y}]->(B) and keeps only (A)-[rel: a2b {category:X}]->(B).

Is there a way to resolve this?

Thanks in advance!

prrao87 commented 1 month ago

Hi @Kamalika17, could you please upload a zip file of the Kùzu database that replicates this issue? We'll take a look and get back to you. Thanks!

Kamalika17 commented 1 month ago

Hi @prrao87 ,

Test_folder.zip

I have uploaded a "Test_folder" containing my code in a Jupyter notebook and an example of the issue I mentioned. The query result differs from the NetworkX graph object: the edge "Adam to Waterloo" should appear twice, with different properties, but one of the two edges is dropped when the result is added to the NetworkX graph object.

Please let me know how to resolve this.

Regards, Kamalika

Kamalika17 commented 1 month ago

Hi @prrao87

I checked the code for the QueryResult.get_as_networkx Python API. The graph object it creates is not a multigraph:

nx_graph = nx.DiGraph() if directed else nx.Graph()

A multigraph requires the NetworkX object to be declared as nx.MultiGraph() or nx.MultiDiGraph() instead.
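To illustrate the difference in plain NetworkX (independent of Kuzu), here is a minimal sketch that adds the same two parallel edges to an nx.Graph and to an nx.MultiGraph. The node names and the "category" property mirror the example in this thread; the edge list itself is made up for illustration and is not taken from the attached database.

```python
import networkx as nx

# Two parallel edges between the same pair of nodes, differing only in a property.
edges = [
    ("Adam", "Waterloo", {"category": "X"}),
    ("Adam", "Waterloo", {"category": "Y"}),
]

# A simple graph stores at most one edge per node pair, so the second
# add only updates the attributes of the existing edge.
simple = nx.Graph()
simple.add_edges_from(edges)

# A multigraph keeps each parallel edge under its own key.
multi = nx.MultiGraph()
multi.add_edges_from(edges)

simple_edge_count = simple.number_of_edges()
multi_edge_count = multi.number_of_edges()
multi_categories = sorted(d["category"] for _, _, d in multi.edges(data=True))

print(simple_edge_count, multi_edge_count, multi_categories)
```

With nx.Graph only one edge survives, while nx.MultiGraph retains both along with their separate properties.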

prrao87 commented 1 month ago

Hi @Kamalika17, apologies for the delayed response. Yes, indeed, the reason you mentioned seems to be why multiple edges aren't being stored. We will look into this some more and make the necessary updates to the API. Thanks for reporting!

prrao87 commented 1 month ago

Hi @Kamalika17, just reaching out again to get a better understanding of your requirements for MultiGraph. Because it uses a list of dicts of dicts of dicts, it does add complexity to the transformation from the underlying Kùzu query result to this format. What graph algorithms would you be using such a NetworkX graph with? Would you be using weighted edges or nodes? This might require a bit more tweaking and testing to ensure that the underlying graph in Kuzu is properly represented, and that we are able to read the data back from NetworkX to Kuzu downstream. Any additional information on your use case would help. Thanks!

kamalikaray commented 1 month ago

Hi @prrao87

We haven't thought about which algorithms to apply to this graph yet. However, multigraphs represent interactions in biological systems and social networks more accurately than simpler graph structures (e.g., two people can be connected in more than one way, which helps to infer communities or information diffusion). In our graph, edges have different properties, and we plan to assign edge weights based on the number of edges between the same two entities. Moreover, it is a bit misleading if Kuzu as a graph database supports multigraphs but this is not reflected when a result is converted to a NetworkX object. As I mentioned, the pandas DataFrame gives the correct result, which the NetworkX object does not show.
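As a possible interim approach to the weighting idea described above, the sketch below builds an nx.MultiDiGraph by hand and then collapses it into a weighted nx.DiGraph whose "weight" is the number of parallel edges. It assumes the rows have already been extracted (for example, from the pandas output) into (source, target, properties) tuples. That row shape, the helper names build_multigraph and collapse_to_weighted, and the sample data are all hypothetical and not part of the Kuzu API.

```python
import networkx as nx


def build_multigraph(rows):
    """Build a directed multigraph from (source, target, properties) tuples.

    The tuple shape is an assumption made for this sketch.
    """
    graph = nx.MultiDiGraph()
    for src, dst, props in rows:
        graph.add_edge(src, dst, **props)
    return graph


def collapse_to_weighted(multi):
    """Collapse parallel edges into one edge whose weight is their count."""
    weighted = nx.DiGraph()
    weighted.add_nodes_from(multi.nodes(data=True))
    # edges() on a multigraph yields one (u, v) pair per parallel edge.
    for u, v in multi.edges():
        if weighted.has_edge(u, v):
            weighted[u][v]["weight"] += 1
        else:
            weighted.add_edge(u, v, weight=1)
    return weighted


# Made-up sample rows; "Noura" is an illustrative name not found in the thread.
rows = [
    ("Adam", "Waterloo", {"category": "X"}),
    ("Adam", "Waterloo", {"category": "Y"}),
    ("Noura", "Waterloo", {"category": "X"}),
]

multi = build_multigraph(rows)
weighted = collapse_to_weighted(multi)
print(multi.number_of_edges(), dict(weighted["Adam"]["Waterloo"]))
```

Keeping the multigraph as the source of truth and deriving the weighted graph from it preserves the per-edge properties, which would be lost if only the collapsed graph were kept.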

prrao87 commented 1 month ago

@Kamalika17 understood. We have been discussing making MultiGraph the default option for the get_as_networkx method to avoid this issue. We'll prioritize accordingly and make the fix. Again, thanks a lot for reporting!