neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/

gds.graph.list returns error #240

Closed tomasonjo closed 1 year ago

tomasonjo commented 1 year ago

GDS version: 2.2.6 Neo4j version: 5.3.0 Operating system: Ubuntu 20.04

There are probably some redundant steps in the reproduction; however, these are the exact steps from my book, and they produce an error when you list the GDS graphs after the projection.

Import graph

CREATE CONSTRAINT IF NOT EXISTS FOR (u:User) REQUIRE u.id IS UNIQUE;
CREATE CONSTRAINT IF NOT EXISTS FOR (p:Tweet) REQUIRE p.id IS UNIQUE;
LOAD CSV WITH HEADERS FROM "https://bit.ly/39JYakC" AS row
MERGE (u:User {id:row.id})
ON CREATE SET u.name = row.name,
              u.username = row.username,
              u.registeredAt = datetime(row.createdAt);
LOAD CSV WITH HEADERS FROM "https://bit.ly/3n08lEL" AS row
CALL {
    WITH row
    MATCH (s:User {id:row.source})
    MATCH (t:User {id:row.target})
    MERGE (s)-[:FOLLOWS]->(t)
} IN TRANSACTIONS;
LOAD CSV WITH HEADERS FROM "https://bit.ly/3y3ODyc" AS row
CALL {
    WITH row
    MATCH (a:User{id:row.author})
    MERGE (p:Tweet{id:row.id})
    ON CREATE SET p.text = row.text,
                p.createdAt = datetime(row.createdAt)
    MERGE (a)-[:PUBLISH]->(p)
} IN TRANSACTIONS;
LOAD CSV WITH HEADERS FROM "https://bit.ly/3QyDrRl" AS row
MATCH (source:Tweet {id:row.source})
MATCH (target:Tweet {id:row.target})
MERGE (source)-[:RETWEETS]->(target);

Calculate features

MATCH (u:User)
OPTIONAL MATCH (u)-[:PUBLISH]->(tweet)
WHERE NOT EXISTS { (tweet)-[:RETWEETS]->() }
WITH u, count(tweet) AS tweetCount
OPTIONAL MATCH (u)-[:PUBLISH]->(retweet)
WHERE EXISTS { (retweet)-[:RETWEETS]->() }
WITH u, tweetCount, count(retweet) AS retweetCount
WITH u, tweetCount,
  CASE WHEN tweetCount + retweetCount = 0 THEN 0
    ELSE toFloat(retweetCount) / (tweetCount + retweetCount)
      END AS retweetRatio
SET u.tweetCount = tweetCount,
    u.retweetRatio = retweetRatio;
MATCH (u:User)
OPTIONAL MATCH (u)-[:PUBLISH]-(retweet)-[:RETWEETS]->(tweet)
WITH u, toInteger(duration.between(
  tweet.createdAt, retweet.createdAt).minutes) AS retweetDelay
WITH u, avg(retweetDelay) AS averageRetweetDelay
SET u.timeToRetweet = coalesce(averageRetweetDelay, 372);
MATCH (u:User)
WITH u,
     count{ (u)<-[:FOLLOWS]-() } AS inDegree,
     count{ (u)-[:FOLLOWS]->() } AS outDegree,
     count{ (u)-[:FOLLOWS]->()-[:FOLLOWS]->(u) } AS friendCount
SET u.inDegree = inDegree,
    u.outDegree = outDegree,
    u.friendCount = friendCount;
MATCH (u:User)
OPTIONAL MATCH p=(u)-[:FOLLOWS]->()-[:FOLLOWS]->()-[:FOLLOWS]->(u)
WITH u, count(p) AS graphlet5
SET u.graphlet5 = graphlet5;
MATCH (u:User)
OPTIONAL MATCH p=(u)-[:FOLLOWS]->()-[:FOLLOWS]->()<-[:FOLLOWS]-(u)
WITH u, count(p) AS graphlet8
SET u.graphlet8 = graphlet8;
MATCH (u:User)
OPTIONAL MATCH (u)-[:FOLLOWS]->(other1)-[:FOLLOWS]->(other2)-[:FOLLOWS]->(u),
               (u)<-[:FOLLOWS]-(other1)<-[:FOLLOWS]-(other2)<-[:FOLLOWS]-(u)
WHERE id(other1) < id(other2)
WITH u, count(other1) AS graphlet11
SET u.graphlet11 = graphlet11;

Project graph

CALL gds.graph.project('knnExample','User', 'FOLLOWS',
 {nodeProperties:['tweetCount', 'retweetRatio', 'timeToRetweet', 'inDegree',
  'outDegree', 'friendCount', 'graphlet5', 'graphlet8', 'graphlet11']});

Finally, list the graphs:

CALL gds.graph.list();

This produces an error:

Failed to invoke procedure gds.graph.list: Caused by: java.lang.UnsupportedOperationException: can't get field offset on a hidden class: private final org.neo4j.io.pagecache.impl.muninn.MuninnPagedFile org.neo4j.io.pagecache.impl.muninn.MuninnPagedFile$$Lambda$1711/0x0000000801933bb0.arg$1

DarthMax commented 1 year ago

Hej @tomasonjo, could you tell me which JVM you are running? We have observed similar issues with the Zulu JVM.

tomasonjo commented 1 year ago

This was run on Neo4j Desktop. If I understand the docs correctly, Desktop includes a bundled Java runtime:

Neo4j Desktop comes with a free Developer License of Neo4j Enterprise Edition. The Java Runtime is also bundled.

Anyway, gds.debug.sysInfo returns:

key value
"buildJdk" "11.0.15+10 (Eclipse Adoptium)"
"buildJavaVersion" "11.0.15"
"vmName" "OpenJDK 64-Bit Server VM"
"vmVersion" "17.0.5+8-LTS"
"vmCompiler" "HotSpot 64-Bit Tiered Compilers"
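
For anyone who wants to check the same values on their own installation, the rows above can be retrieved with a query along these lines. This is a minimal sketch: it assumes the key and value columns shown in the output above, and the list of keys is just the subset quoted in this comment.

```cypher
// Show only the JVM-related rows of the GDS system info.
// The key names are copied from the output quoted above.
CALL gds.debug.sysInfo()
YIELD key, value
WHERE key IN ['buildJdk', 'buildJavaVersion', 'vmName', 'vmVersion', 'vmCompiler']
RETURN key, value;
```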

DarthMax commented 1 year ago

It does look like the same issue. It should be fixed in the upcoming 2.2.7 version or in the 2.3 release. Until then, you could try calling gds.graph.list and yielding only the fields that you need; specifically, do not yield the memoryUsage field.
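
A minimal sketch of that workaround, using the knnExample graph from the projection above. The yielded fields (graphName, nodeCount, relationshipCount) are only an illustrative selection; other fields can be added as needed, as long as memoryUsage is not among them. The related sizeInBytes field is also left out here as a precaution, which is an assumption and not something confirmed in this thread.

```cypher
// List the projected graph without requesting memoryUsage,
// the field reported above as triggering the error.
CALL gds.graph.list('knnExample')
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount;
```

Omitting the graph name argument lists every graph in the catalog with the same restricted set of columns.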