neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/

gds.degree.mutate overwrite/replace #314

Open Mintactus opened 5 months ago

Mintactus commented 5 months ago

An overwrite/replace parameter would be great to have; you need it when testing multiple times in a row. When set to True, the existing degree centrality property is replaced by the new one.

adamnsch commented 5 months ago

Hi @Mintactus,

Thank you for the feature request.

Would it work for you to call gds.graph.nodeProperties.drop between your calls to gds.degree.mutate? Or is there a reason why it needs to be a single procedure call?

Regards, Adam

Mintactus commented 5 months ago

Thanks Adam. I'm working on an ETL project that transforms tabular business data for marketing analysis into Arrow tables for GDS. The process looks like this, and I rerun it completely multiple times within minutes.

    etl = ETL()
    etl.create_markov_chain_nodes()
    etl.create_markov_chain_relationships()
    etl.drop_graph_projection('arrow')       # <- It does work, but does not make sense for a consecutive testing workflow
    etl.project_markov_chain_graph()         # <- Another ticket I submitted (just create it, and replace any existing projection)
    etl.run_degree_centrality()              # <- This ticket (just run it! It doesn't matter if it has already been run, but yes, I agree that not erasing an existing property should be the default)
    etl.drop_database('arrows')              # <- It does work, but does not make sense for a consecutive testing workflow
    etl.create_database_from_projection()    # <- Another ticket I submitted (just create it, and replace any existing database)
    etl.map_ids_to_names()                   # <- Another ticket I submitted (and mostly approved by the community)

The core idea behind all of these is to make GDS more idempotent in general, a bit like OR REPLACE in Cypher. A parameter like replace = False, off by default for data safety, would be THE short and sweet solution.

So I can intuitively and quickly have something like this:

    etl = ETL()
    etl.create_markov_chain_nodes()
    etl.create_markov_chain_relationships()
    etl.project_markov_chain_graph()
    etl.run_degree_centrality()

Much more readable and straightforward :)

adamnsch commented 5 months ago

Thank you for the explanation. So I gather that you are at least not blocked. That's good.

We will consider adding replace/overwrite functionality to our algorithms.

In the meantime, you could perhaps wrap the "drop" calls together with their corresponding create/mutate calls in new methods to achieve the workflow you desire?
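A minimal sketch of such a wrapper, in plain Python. The helper name create_or_replace and its three callables (exists, drop, create) are hypothetical and not part of GDS; the dictionary below only stands in for a projected graph's node properties so the example runs without a database. The replace flag mirrors the replace = False default proposed earlier in this thread.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def create_or_replace(
    exists: Callable[[], bool],
    drop: Callable[[], object],
    create: Callable[[], T],
    replace: bool = False,
) -> T:
    """Run `create`, dropping any existing artifact first when `replace` is True.

    `exists`, `drop` and `create` are hypothetical hooks supplied by the caller.
    """
    if exists():
        if not replace:
            # Safe default: never silently erase existing data.
            raise RuntimeError("artifact already exists; pass replace=True to overwrite it")
        drop()
    return create()


# In-memory stand-in for a projection's node properties (illustration only).
properties = {}


def run_degree_centrality(replace: bool = False) -> str:
    return create_or_replace(
        exists=lambda: "degree" in properties,
        drop=lambda: properties.pop("degree"),
        create=lambda: properties.setdefault("degree", "computed"),
        replace=replace,
    )


run_degree_centrality()              # first run: nothing exists, so nothing is dropped
run_degree_centrality(replace=True)  # rerun: drops the old property, then recreates it
```

In a real ETL class, drop would presumably wrap the call to gds.graph.nodeProperties.drop and create the call to gds.degree.mutate; the exact client signatures and the way to check for an existing property depend on the GDS version in use and are not shown here.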

Thank you, Adam