neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/

Cannot run GDS algorithms on a Cypher projection because undirected relationships cannot be set #257

Closed sybest1259 closed 1 year ago

sybest1259 commented 1 year ago

Problem: I am projecting a graph with gds.graph.project.cypher for a specific subgraph and then running GDS algorithms on it. Some algorithms (e.g. Triangle Count) need a subgraph with undirected relationships, but gds.graph.project.cypher cannot create a projection with undirected relationships. How can I solve this problem?

Is there any configuration to set undirected relationships in gds.graph.project.cypher, or will this setting be available in the future?

Thanks

knutwalker commented 1 year ago

Hi @sybest1259, we have no configuration available or planned for Cypher projection to load undirected graphs. However, you can rewrite your projection as a Cypher aggregation and that one allows you to specify relationship types which will be loaded as undirected and can be used in algorithms such as triangle count: https://neo4j.com/docs/graph-data-science/2.3/management-ops/projections/graph-project-cypher-aggregation/#graph-project-cypher-aggregation-syntax-configuration

sybest1259 commented 1 year ago

Thanks @knutwalker, I used Cypher aggregation as follows:

MATCH path=(c1:company{entName:"XXX"})-[r1]-(c2:company)-[r2]-()
WITH [r IN relationships(path) | [startNode(r), endNode(r), type(r)]] AS rels
UNWIND rels AS rels1
WITH DISTINCT rels1[0] AS source, rels1[1] AS target, rels1[2] AS relType
WITH gds.alpha.graph.project(
  'g1',
  source,
  target,
  { sourceNodeLabels: labels(source), targetNodeLabels: labels(target) },
  { relationshipType: relType, properties: {cost: 1} },
  { undirectedRelationshipTypes: ['*'] }
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

I have two questions:

  1. The number of relationships in this projection is higher than the actual number. Is there an error in the statement above? It seems that undirectedRelationshipTypes: ['*'] doubles the number of edges.
  2. How can I set a different property value, such as cost, for different relationship types? I followed the documentation and set relationshipProperties in relationshipConfig, but I got: Failed to invoke function gds.alpha.graph.project: Caused by: java.lang.IllegalArgumentException: Unexpected configuration key: relationshipProperties
knutwalker commented 1 year ago

@sybest1259

  1. With undirectedRelationshipTypes, GDS turns those relationships into undirected relationships; that is, we duplicate each relationship internally and store it on both nodes. Since your query already matches with an undirected pattern, you get another doubling from the Cypher query. You would need to use a directed pattern like (c1:company{entName:"XXX"})-[r1]->(c2:company)-[r2]->() (just guessing on the arrow heads). The documentation is lacking in that regard; we are already tracking an improvement that covers this peculiarity.
  2. Your exception says that you used relationshipProperties, but the key is just properties, as in your example query.
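
A minimal sketch of the directed variant from point 1, reusing the names from the query above. The arrow direction is an assumption (as noted, it is a guess) and has to match how the relationships are actually stored; the one-hop pattern is a simplification for illustration only.

```cypher
// Sketch only: match each relationship once, in its stored direction,
// and let GDS create the reverse copy via undirectedRelationshipTypes.
MATCH (source:company {entName: "XXX"})-[r]->(target:company)
WITH gds.alpha.graph.project(
  'g1',
  source,
  target,
  { sourceNodeLabels: labels(source), targetNodeLabels: labels(target) },
  { relationshipType: type(r), properties: { cost: 1 } },
  { undirectedRelationshipTypes: ['*'] }
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
```

Because GDS stores an undirected relationship on both nodes, the reported relationshipCount should still be about twice the number of matched relationships, but without the extra duplication from the undirected match.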
sybest1259 commented 1 year ago

This attribute is declared in the documentation. Do you mean that there is no such setting and that I cannot set a different property value for different relationship types? [Screenshot 2023-03-23 17:51:57]

knutwalker commented 1 year ago

Oh, that is a bug in the documentation, sorry about that. You can set the property with the properties key in the fifth parameter, which is called relationshipConfig. It's described further down in the documentation: https://neo4j.com/docs/graph-data-science/2.3/management-ops/projections/graph-project-cypher-aggregation/#cypher-aggregation-relationship-properties

You can provide different properties per type with a little bit of Cypher, for example:

WITH {"type1": "prop1", "type2": "prop2"} AS typeToProperty
MATCH …
WITH gds.alpha.graph.project(
    "g",
    source,
    target,
    { … },
    {
        properties: { typeToProperty[type(r)]: r[typeToProperty[type(r)]] }
    },
    { … }
) AS g
sybest1259 commented 1 year ago

Thanks @knutwalker, I tried what you suggested, but I got an error: [Screenshot 2023-03-29 15:02:53] Is it a syntax error?

What I actually want is to set a default value for the cost property of the LIKE relationship in a Cypher aggregation projection, like this configuration in a native projection: LIKE:{orientation:'NATURAL',properties:{cost:{property:'cost',defaultValue:1.0}}} How can I do that?

FlorentinD commented 1 year ago

Hello @sybest1259, your first error is a syntax error. Keys in Cypher maps are always strings, so WITH {LIKE: "cost"} should work there. To get default values, you can use the coalesce function (https://neo4j.com/docs/cypher-manual/5/functions/scalar/#functions-coalesce). Building on Paul's suggestion, I would then suggest properties: {typeToProperty[type(r)]: coalesce(r[typeToProperty[type(r)]], 1.0) }. You could also use a map to specify a default value per property.

I can also recommend the Cypher cheat sheet (https://neo4j.com/docs/cypher-cheat-sheet/current/).
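
The coalesce fallback can be tried in isolation, without GDS or any stored data. The sketch below uses hypothetical in-memory rows in place of relationships; the KNOWS type and its default of 0.5 are made up for illustration.

```cypher
// Hypothetical default cost per relationship type.
WITH {LIKE: 1.0, KNOWS: 0.5} AS defaultCost
// In-memory stand-ins for relationships; two of them have no cost.
UNWIND [
  {relType: 'LIKE', cost: 2.0},
  {relType: 'LIKE'},
  {relType: 'KNOWS'}
] AS rel
// coalesce returns its first non-null argument, so a missing cost
// falls back to the default looked up by type.
RETURN rel.relType AS relType, coalesce(rel.cost, defaultCost[rel.relType]) AS cost
```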

sybest1259 commented 1 year ago

Thanks @FlorentinD, I used Cypher aggregation as follows:

MATCH path=(p1:person{num:"12"})-[r1]-()--(p2:person)--()--()
WITH DISTINCT [r IN relationships(path) | [startNode(r), endNode(r), type(r)]] AS rels, [r IN relationships(path) | r] AS rr
UNWIND rels AS rels1
UNWIND rr AS rr1
WITH DISTINCT rels1[0] AS source, rels1[1] AS target, rels1[2] AS relType, {LIKE: "cost"} AS typeToProperty, rr1
WITH gds.alpha.graph.project(
  'g1',
  source,
  target,
  { sourceNodeLabels: labels(source), targetNodeLabels: labels(target) },
  { relationshipType: relType, properties: {LIKE: coalesce(rr1[typeToProperty[type(rr1)]], 1.0) } },
  { undirectedRelationshipTypes: ['*'] }
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

Then I got the result below:

graph | nodes | rels
"g1"  | 1906  | 74718

But I found that the cost property was not set. I ran this Cypher query:

CALL gds.graph.relationshipProperty.stream('g1', 'cost') YIELD sourceNodeId, targetNodeId, propertyValue AS cost RETURN cost

Error: Failed to invoke procedure gds.graph.relationshipProperty.stream: Caused by: java.lang.IllegalArgumentException: Expecting at least one relationship projection to contain property key(s) ['cost'].

sybest1259 commented 1 year ago

properties: {LIKE: coalesce(rr1[typeToProperty[type(rr1)]], 1.0) } actually creates a property named LIKE for every relationship type; it does not create a cost property for LIKE relationships.

I want to set the cost property with different default values for different relationship types in the projection, e.g. for LIKE and UNLIKE relationships, default cost values of 1.0 and 0.5 respectively.

FlorentinD commented 1 year ago

I see. What you are looking for is apoc.map.fromValues(values [Any]) (https://neo4j.com/docs/apoc/5/overview/apoc.map/apoc.map.fromValues/), which you can then use together with [typeToProperty[type(rr1)], coalesce(rr1[typeToProperty[type(rr1)]], 1.0)]

sybest1259 commented 1 year ago

Thanks @FlorentinD, could you give me a complete example?

FlorentinD commented 1 year ago

Adjusting your example, I would try the following:

MATCH path=(p1:person{num:"12"})-[r1]-()--(p2:person)--()--()
WITH distinct [r in relationships(path) |[startNode(r), endNode(r), type(r)]] as rels,[r in relationships(path) |r] as rr
UNWIND rels as rels1
UNWIND rr as rr1
WITH distinct rels1[0] as source, rels1[1] as target, rels1[2] as relType,{LIKE: "cost"} AS typeToProperty,rr1
WITH gds.alpha.graph.project(
'g1',
source,
target,
{
sourceNodeLabels: labels(source),
targetNodeLabels: labels(target)
},
{
relationshipType: relType,
properties: apoc.map.fromValues([typeToProperty[type(rr1)], coalesce(rr1[typeToProperty[type(rr1)]], 1.0)])
},
{
undirectedRelationshipTypes: ['*']
}
) AS g
RETURN g.graphName AS graph , g.nodeCount AS nodes, g.relationshipCount AS rels
sybest1259 commented 1 year ago

Thanks @FlorentinD, I tried the following:

match path=(p1:person{num:"12"})-[r1]-()--(p2:person)--()--()
with [r in relationships(path) |[startNode(r), endNode(r), r]] as rels
UNWIND rels as rels1
with distinct rels1 as rels2
with rels2[0] as source, rels2[1] as target, type(rels2[2]) as relType,{LIKE: 1.0} AS typeToProperty,rels2[2] as rr1
WITH gds.alpha.graph.project(
  'g1',
  source,
  target,
  {
    sourceNodeLabels: labels(source),
    targetNodeLabels: labels(target)
  },
  {
    relationshipType: relType,
    properties: rr1{cost:typeToProperty[type(rr1)]}
  },
  {
    undirectedRelationshipTypes: ['*']
  }
) AS g
RETURN g.graphName AS graph , g.nodeCount AS nodes, g.relationshipCount AS rels

It seems like it's working.
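
For the per-type defaults asked about earlier (LIKE 1.0, UNLIKE 0.5), one possible variant keeps the property key fixed as cost and only looks up the default value by relationship type, so no dynamic map key is needed. This is an untested sketch: the one-hop directed pattern is a simplification, the final fallback of 1.0 for other types is an assumption, and it assumes that a stored cost value, where present, is numeric.

```cypher
// Sketch only: per-type default for the cost property.
MATCH (source:person {num: "12"})-[r]->(target)
WITH source, target, r, {LIKE: 1.0, UNLIKE: 0.5} AS defaultCost
WITH gds.alpha.graph.project(
  'g1',
  source,
  target,
  { sourceNodeLabels: labels(source), targetNodeLabels: labels(target) },
  {
    relationshipType: type(r),
    // Stored cost if present, else the default for this type,
    // else 1.0 (assumed fallback for types not listed in defaultCost).
    properties: { cost: coalesce(r.cost, defaultCost[type(r)], 1.0) }
  },
  { undirectedRelationshipTypes: ['*'] }
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
```

The projected values could then be checked with the gds.graph.relationshipProperty.stream call used earlier in this thread.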