neo4j / graph-data-science-client

A Python client for the Neo4j Graph Data Science (GDS) library
https://neo4j.com/product/graph-data-science/
Apache License 2.0

gds.graph.construct replace existing projection #655

Open · Mintactus opened this issue 1 month ago

Mintactus commented 1 month ago

For testing purposes, a replace parameter for gds.graph.construct would be really useful. It would default to false; when set to true, any existing projection with the same name would be replaced.
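Until such a parameter exists, the requested behaviour can be approximated on the client side. The following is a minimal sketch, not part of the graphdatascience API: `construct_with_replace` and its `replace` flag are hypothetical names. It assumes a connected `GraphDataScience` instance `gds` and relies only on the `gds.graph.construct(name, nodes, rels)` and `gds.graph.drop(name, failIfMissing=False)` calls that appear in this thread.

```python
def construct_with_replace(gds, graph_name, nodes, relationships, replace=False):
    """Hypothetical helper emulating a `replace` option for gds.graph.construct.

    replace=False (the default): a plain construct call, so an existing
    projection with the same name is left alone and the server is expected
    to reject the call.
    replace=True: any projection named `graph_name` is dropped first;
    failIfMissing=False makes that drop a no-op when none exists.
    """
    if replace:
        gds.graph.drop(graph_name, failIfMissing=False)
    return gds.graph.construct(graph_name, nodes, relationships)
```

Usage would then be `G = construct_with_replace(gds, 'g', nodes, rels, replace=True)`, which can be re-run any number of times against the same catalog.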

Mats-SX commented 1 month ago

Hello @Mintactus and thank you for this feature request!

While we understand how this parameter could make sense in some use cases, we want to point out that it is very easy to drop a projected graph. There are multiple ways to do this with minimal Python code:

Use the graph as a context manager:

with gds.graph.construct('g', nodes, rels) as G:
    result = gds.wcc.stream(G)

# G is dropped when the context ends

Use the drop() method on the graph object:

G = gds.graph.construct('g', nodes, rels)
result = gds.wcc.stream(G)
G.drop()

Use the gds.graph.drop() endpoint as a post-cleanup:

G = gds.graph.construct('g', nodes, rels)
result = gds.wcc.stream(G)
gds.graph.drop(G)

Use the gds.graph.drop() endpoint as a pre-cleanup:

gds.graph.drop('g', failIfMissing=False)
G = gds.graph.construct('g', nodes, rels)
result = gds.wcc.stream(G)

I hope these alternatives will cover your use cases without the need for an additional parameter.

All the best,
Mats

Mintactus commented 1 month ago

Since I'm building the graph from a DataFrame and running the same ETL process multiple times, I can't use these options: the graph object doesn't exist yet (unless I retrieve it using get), but the projection does, because of the previous iteration.

It looks like this:

create_markov_chain_nodes()
create_markov_chain_relationships()
drop_the_existing_projection()
project_markov_chain_graph()  # <- Just do it! Don't care about existing data
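The workflow above can be sketched as follows. This is an illustration under assumptions: the two builder functions stand in for the user's own `create_markov_chain_*` functions, the states and transition probabilities are made-up sample values, `GRAPH_NAME` and `run_iteration` are hypothetical names, and the column names follow the `nodeId` / `labels` / `sourceNodeId` / `targetNodeId` / `relationshipType` convention used by `gds.graph.construct`. Because the drop passes `failIfMissing=False`, the projection step can run on every iteration regardless of what the previous run left behind.

```python
import pandas as pd

GRAPH_NAME = "markov_chain"  # hypothetical projection name


def create_markov_chain_nodes():
    # One row per state; gds.graph.construct expects a "nodeId" column.
    return pd.DataFrame({"nodeId": [0, 1, 2], "labels": ["State"] * 3})


def create_markov_chain_relationships():
    # One row per transition; the transition probability is a relationship property.
    return pd.DataFrame(
        {
            "sourceNodeId": [0, 0, 1, 2],
            "targetNodeId": [1, 2, 2, 0],
            "relationshipType": ["NEXT"] * 4,
            "probability": [0.5, 0.5, 1.0, 1.0],
        }
    )


def project_markov_chain_graph(gds, nodes, relationships):
    # Pre-cleanup by name: no graph object is needed, and it is a no-op
    # when no projection with this name exists yet.
    gds.graph.drop(GRAPH_NAME, failIfMissing=False)
    return gds.graph.construct(GRAPH_NAME, nodes, relationships)


def run_iteration(gds):
    # One ETL pass; safe to repeat because the projection step is idempotent.
    nodes = create_markov_chain_nodes()
    relationships = create_markov_chain_relationships()
    return project_markov_chain_graph(gds, nodes, relationships)
```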

In general, I'd say idempotent workflows are really valuable for large-scale testing.

Mats-SX commented 1 month ago

As I showed above, you don't need a graph object to drop a graph; the graph name alone is enough. If you do not know the graph name (for example, because it is randomly generated), you can drop all graphs by calling gds.graph.list() and then gds.graph.drop() on each listed name.
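The list-then-drop approach might look like the sketch below. It assumes that `gds.graph.list()` returns a table with a `graphName` column and that all listed graphs live in the database the client is connected to; it reuses the `gds.graph.drop(name, failIfMissing=False)` call from above. `drop_graphs` and its `prefix` filter are hypothetical conveniences for dropping only the projections a test run created, rather than every graph in the catalog.

```python
def drop_graphs(gds, prefix=""):
    """Hypothetical helper: drop every projected graph whose name starts with `prefix`.

    With the default empty prefix, every graph returned by gds.graph.list()
    is dropped. Returns the list of names that were dropped.
    """
    names = [name for name in gds.graph.list()["graphName"] if name.startswith(prefix)]
    for name in names:
        # failIfMissing=False tolerates a graph disappearing between list and drop.
        gds.graph.drop(name, failIfMissing=False)
    return names
```

Giving test projections a common name prefix keeps the cleanup targeted, so unrelated graphs in a shared catalog survive.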

As far as I understand, the workflow you show, with drop_the_existing_projection(), is already idempotent. Am I missing something?

Mats