neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/

mutateNodeProperties mutateRelationship #155

Open apappascs opened 2 years ago

apappascs commented 2 years ago

The current implementation lets the user store information in the in-memory graph only when using the mutate mode of graph algorithms.

The following procedures would give this option to the user directly: gds.graph.mutateNodeProperties and gds.graph.mutateRelationship.

For example, after mutating in_degree and out_degree, I need to mutate a combined in_out_degree property into the in-memory graph.

P.S. mutateRelationshipType accepts only a new relationship type; it does not accept a new property on an existing relationship type.

call gds.nodeSimilarity.mutate('countries', 
{similarityCutoff: 0.5, topK: 20,
mutateRelationshipType: 'SIMILARITY',
mutateProperty: 'node_similarity'})

If the above query is run twice (or another algorithm has already mutated the relationship type 'SIMILARITY'), it throws an exception:

Neo.ClientError.Procedure.ProcedureCallFailed
Relationship type `SIMILARITY` already exists in the in-memory graph

Whereas if we run the query with relationshipWeightProperty, we get a different error:

call gds.nodeSimilarity.mutate('countries', 
{similarityCutoff: 0.5, topK: 20,
mutateRelationshipType: 'SIMILARITY',
mutateProperty: 'node_similarity',
relationshipWeightProperty: 'total'
})

Neo.ClientError.Procedure.ProcedureCallFailed
Relationship weight property `total` not found in relationship types ['SIMILARITY']. Properties existing on all relationship types: []

soerenreichardt commented 2 years ago

Hello @alexpappasc, thanks for the feature request. Am I understanding you correctly that you are looking for procedures to add new node properties and relationship types to the in-memory graph? I wonder where the information, for example for the new node property, should come from if not from an algorithm.

The use case you mention, the in- and out-degree of a graph, can be achieved with the degree centrality algorithm. Via the orientation parameter you can control whether to compute the in-, out- or undirected degree.

For your second example, it is correct that you cannot add a new property to an existing relationship type. This is because relationship properties are tightly coupled to the topology of the graph induced by a relationship type. Changing this is a larger endeavour which is currently not planned as part of our roadmap. If you could share your use case for this feature, we could discuss internally whether we want to consider it.

Cheers
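A minimal sketch of the degree centrality suggestion above, for readers following along. It is not taken from the thread: it assumes the in-memory graph `countries` from the original post was projected with natural orientation, and the property names `out_degree`, `in_degree` and `in_out_degree` are illustrative.

```cypher
// Out-degree: NATURAL is the default orientation.
CALL gds.degree.mutate('countries', {
  mutateProperty: 'out_degree'
})
YIELD nodePropertiesWritten;

// In-degree: count relationships in the reverse direction.
CALL gds.degree.mutate('countries', {
  orientation: 'REVERSE',
  mutateProperty: 'in_degree'
})
YIELD nodePropertiesWritten;

// Undirected degree: counts incoming and outgoing relationships together.
CALL gds.degree.mutate('countries', {
  orientation: 'UNDIRECTED',
  mutateProperty: 'in_out_degree'
})
YIELD nodePropertiesWritten;
```

For the unweighted case, the undirected degree should correspond to the in_out_degree = in_degree + out_degree value asked for, without needing a separate procedure. It does not cover arbitrary user-defined scores.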

apappascs commented 2 years ago

Thank you for the update @soerenreichardt. That's correct, I am looking for procedures to add new node properties and relationship types to the in-memory graph. Let me demonstrate an example use case.

  1. calculate scores from multiple centrality algorithms
  2. mutate a user-defined score, which can take values from the in-memory graph or any value from the Cypher query
  3. calculate similarity or run other algorithms using this user-defined property

In order to do that, we have to:

  1. create in memory graph
  2. calculate the centrality score with many algorithms
  3. store to the graph
  4. calculate the user defined score
  5. store it to the graph
  6. create another in memory graph
  7. do similarity, ml etc.
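As a rough sketch, steps 3 to 6 of this workaround might look like the following. It assumes `in_degree` and `out_degree` have already been mutated into the `countries` graph, uses the GDS 1.x procedure names that match the rest of this thread (later releases renamed some of them, e.g. gds.graph.create became gds.graph.project), and the graph name `countries_scored` is made up for illustration.

```cypher
// Step 3: write the mutated centrality scores back to the database.
CALL gds.graph.writeNodeProperties('countries', ['in_degree', 'out_degree'])
YIELD propertiesWritten;

// Steps 4-5: compute the user-defined score in plain Cypher and store it on the nodes.
MATCH (n)
WHERE n.in_degree IS NOT NULL AND n.out_degree IS NOT NULL
SET n.in_out_degree = n.in_degree + n.out_degree;

// Step 6: project a second in-memory graph that carries the new property.
// The '*' projections are an assumption; the original labels and types are not given in the thread.
CALL gds.graph.create('countries_scored', '*', '*', {
  nodeProperties: ['in_out_degree']
})
YIELD graphName, nodeCount, relationshipCount;
```

The round trip through the database and the second projection are the overhead that the requested procedures would remove.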

If those procedures were exposed to the user, then:

  1. create in memory graph
  2. calculate the centrality score with many algorithms
  3. calculate the user defined score
  4. do similarity, ml etc.

For example, mutate a user-defined score, which can take values from the in-memory graph or any value from the Cypher query, and store it in the in-memory graph, e.g. in_out_degree = in_degree + out_degree:

 CALL gds.graph.streamNodeProperties('countries', ['pagerank','degree'])
    YIELD nodeId, nodeProperty, propertyValue
    WITH nodeId, sum(CASE
          WHEN (nodeProperty= "degree" OR nodeProperty= "degree_REVERSE") THEN propertyValue
          END) as in_out_degree
    CALL gds.graph.mutateNodeProperties(nodeId,'in_out_degree',in_out_degree)
    RETURN nodeId,in_out_degree;

This won't be efficient for many nodes, but it's just an example.

The feature request is to expose functionality that already exists inside the algorithms, where mutating node properties and relationships is already possible.

Regarding the second example, the error message should be fixed then. relationshipWeightProperty refers to a property that already exists in the in-memory graph, not to the mutateRelationshipType. Weighted and unweighted algorithms should produce the same error message in this case, right?

soerenreichardt commented 2 years ago

Hej @alexpappasc, thanks for clarifying your examples. I think that is a reasonable request which we will discuss and process internally. It might take a while though for us to get priority and implement this. I can keep you updated about the status here.