neo4j-contrib / neo4j-apoc-procedures

Awesome Procedures On Cypher for Neo4j - codenamed "apoc"
https://neo4j.com/labs/apoc
Apache License 2.0

Extracting random subgraph from large graph #683

Open srikanthBezawada opened 6 years ago

srikanthBezawada commented 6 years ago

Hi, I'm a Neo4j beginner. I have a large graph with millions of nodes and edges imported into a Neo4j database. I would like to query for a small random subgraph (~100 nodes) and export it to a CSV file. Can I achieve this using APOC? Any help is appreciated. Thanks in advance!

graphadvantage commented 6 years ago

You can select start nodes at random using rand():

MATCH (m:Movie)
WHERE rand() < 0.10
MATCH  (m)<-[:ACTED_IN]-(p)
RETURN *

You can use the path expansion procedures in APOC if you need more control (depth, filters, etc.).
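As a rough sketch of the path expansion route, the query below picks a single random start node and expands outward from it with apoc.path.subgraphAll. The relationship type, the two-hop depth, and the limit of 100 are illustrative assumptions, not values from this thread; adjust them to your own model.

```cypher
// Pick one start node at random (stops scanning at the first node that passes the filter).
MATCH (m:Movie)
WHERE rand() < 0.10
WITH m LIMIT 1
// Expand from that node; all config values here are illustrative assumptions.
CALL apoc.path.subgraphAll(m, {
  relationshipFilter: 'ACTED_IN', // follow ACTED_IN in either direction
  maxLevel: 2,                    // go at most two hops from the start node
  limit: 100                      // cap on how many results the expansion collects
})
YIELD nodes, relationships
RETURN nodes, relationships
```

This keeps the subgraph connected, since everything returned is reachable from the one start node, whereas the rand() filter alone samples start nodes independently of each other.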

As a final step you can use the apoc.export.csv procedures to write the result to a CSV file. You may need to flatten your result to get the columns you want.

A simple way to do this is to drop your query into apoc.export.csv.query:

WITH "/PATH_TO_FILE/export.csv" AS csv
CALL apoc.export.csv.query('MATCH (m:Movie) WHERE rand() < 0.10 MATCH (m)<-[r:ACTED_IN]-(p) RETURN m,p,r',csv,{}) 
YIELD file, source, format, nodes, relationships, properties, time, rows
RETURN *
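If you would rather get plain columns than whole nodes and relationships, one way to flatten the result is to return individual properties. The sketch below assumes the property names title and name from the standard Movies sample dataset; substitute whatever properties your own nodes carry.

```cypher
// Same export as above, but returning scalar properties so each CSV row is one movie/person pair.
// m.title and p.name are assumed property names from the Movies sample data.
WITH "/PATH_TO_FILE/export.csv" AS csv
CALL apoc.export.csv.query('MATCH (m:Movie) WHERE rand() < 0.10 MATCH (m)<-[:ACTED_IN]-(p) RETURN m.title AS movie, p.name AS person', csv, {})
YIELD file, rows
RETURN file, rows
```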

Be sure to add this to your neo4j.conf file:

#Apoc Plugin Configurations
apoc.import.file.enabled=true
apoc.export.file.enabled=true
dbms.security.procedures.unrestricted=*

srikanthBezawada commented 6 years ago

@graphadvantage, thanks for your response. The example query works on the movie database. On my database, the same query runs for some time and then the browser hangs. I'm not sure whether I'm writing the query correctly.

MATCH (m:Page)
WHERE rand() < 0.10
MATCH  (m)<-[:Link]-(p)
RETURN *

I'm using a Wikipedia database (pages as nodes and links between them as edges). Got it from here. Thanks for the caution about neo4j.conf!

graphadvantage commented 6 years ago

Start by counting the nodes for (m:Page):

MATCH (m:Page)
RETURN COUNT(m)

Is a 10% sample returning more rows than your memory allocation can handle? Remember to always start small.

srikanthBezawada commented 6 years ago

Count of nodes for (m:Page): 13441542

I tried the query with 0.1, 0.05, and 0.01; the result was the same each time. It could be a memory issue.
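A quick back-of-the-envelope check, written as Cypher so it can be pasted into the browser, suggests why all three thresholds behave the same: with 13441542 pages, even 0.01 still selects on the order of a hundred thousand start nodes, each of which then pulls in its incoming links. The snippet is only arithmetic on the count reported above and does not touch the graph.

```cypher
// Expected number of start nodes for each threshold tried, given the page count reported above.
WITH 13441542 AS total
UNWIND [0.10, 0.05, 0.01] AS threshold
RETURN threshold,
       toInteger(total * threshold) AS expectedStartNodes,
       100.0 / total AS thresholdForAbout100  // fraction that would select ~100 nodes on average
```

To land near 100 start nodes by threshold alone, the fraction would have to be roughly 0.0000074, which is why capping the rows explicitly is the more practical option.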

jexp commented 6 years ago

MATCH (m:Page)
WHERE rand() < 0.10
WITH m LIMIT 100
MATCH  (m)<-[:Link]-(p)
RETURN *

This returns 100 nodes and their links.

If you run Neo4j Desktop you might want to increase your heap setting a bit.
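To tie this back to the original question, the limited query can be dropped into apoc.export.csv.query in the same way as the earlier movie example. This is a sketch under the same assumptions as before: the export settings in neo4j.conf are enabled, and /PATH_TO_FILE/ is a placeholder for a location of your choosing.

```cypher
// Export 100 sampled pages, the pages linking to them, and the Link relationships to CSV.
WITH "/PATH_TO_FILE/export.csv" AS csv
CALL apoc.export.csv.query('MATCH (m:Page) WHERE rand() < 0.10 WITH m LIMIT 100 MATCH (m)<-[r:Link]-(p) RETURN m, p, r', csv, {})
YIELD file, nodes, relationships, rows, time
RETURN file, nodes, relationships, rows, time
```

Note that LIMIT 100 caps the start pages only. The exported subgraph also contains every page linking to them, so it can be considerably larger than 100 nodes; a second LIMIT after the final MATCH would cap the number of rows as well.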