neo4j-graph-analytics / ml-models

Machine Learning Procedures and Functions for Neo4j
https://github.com/neo4j-graph-analytics/ml-models/releases/tag/1.0.0
Apache License 2.0

length of resulting embedding #7

Open tomasonjo opened 5 years ago

tomasonjo commented 5 years ago

Using the same data as in https://github.com/neo4j-graph-analytics/ml-models/issues/6. Data: https://snap.stanford.edu/data/p2p-Gnutella09.html

import:

LOAD CSV FROM "file:///p2p-Gnutella09.txt" AS row FIELDTERMINATOR ' '
WITH row SKIP 4
MERGE (h1:Host {id: row[0]})
MERGE (h2:Host {id: row[1]})
MERGE (h1)-[:CONNECTION]->(h2)

Run the algorithm with 2 iterations and the pagerank node property as the only node feature:

CALL embedding.deepgl("Host", "CONNECTION", {
  nodeFeatures: ['pagerank'],
  iterations: 2
})

The length of the resulting embedding array for each node is 451, according to:

MATCH (n:Host)
WITH n LIMIT 1
RETURN size(n.embedding)

That seems like a lot for only 2 iterations. As a result, calculating cosine similarity for 8k nodes can take 5+ minutes on a 16 GB machine, and the whole process on a big graph would be quite slow for now.
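
To put a rough number on why the embedding length matters here, below is a minimal Python sketch (not part of ml-models) that counts the work in an all-pairs cosine similarity. It assumes exactly 8000 nodes as a round stand-in for "8k" and that every unordered pair of nodes is compared once. The helper names `cosine_similarity` and `pairwise_cost` are hypothetical and only used for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors score 0
    return dot / (norm_a * norm_b)


def pairwise_cost(num_nodes, embedding_length):
    """Return (unordered node pairs, multiply-adds for the dot products alone)."""
    pairs = num_nodes * (num_nodes - 1) // 2
    return pairs, pairs * embedding_length


# Assumption: 8000 nodes stands in for "8k"; 451 is the reported embedding length.
pairs, mult_adds = pairwise_cost(8000, 451)
print(f"{pairs:,} pairs, {mult_adds:,} multiply-adds")
```

Under those assumptions this gives about 32 million pairs and about 14.4 billion multiply-adds for the dot products alone, before any norms are computed or results are written back. The cost grows linearly with embedding length and quadratically with node count, so a shorter embedding would cut the similarity step proportionally.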