jexp / batch-import

generic csv file neo4j batch importer
https://neo4j.com/docs/operations-manual/current/tools/import/

Slow Import / 2G nodes file #113

Closed dav009 closed 9 years ago

dav009 commented 9 years ago

So I'm trying to import what doesn't seem to be a big file, but the tool gets stuck in the first step, printing "....." for hours.

nodes.csv is 2.8G and res.csv is 405M.

And batch.properties is as follows:

batch_array_separator =\\|
dump_configuration=false
cache_type=none
use_memory_mapped_buffers=true
neostore.propertystore.db.index.keys.mapped_memory=5M
neostore.propertystore.db.index.mapped_memory=5M
neostore.nodestore.db.mapped_memory=5G
neostore.relationshipstore.db.mapped_memory=5G
neostore.propertystore.db.mapped_memory=7G
neostore.propertystore.db.strings.mapped_memory=7G

I have tried both Neo4j 1.9 and 2.2. I am running this on a machine with 30G of RAM, and I left it running for 3 to 4 hours.

After stopping it, I realized that the Neo4j database was empty and only properties had been inserted. So I assume it is taking far too long inserting properties?

jexp commented 9 years ago

I don't know if this works: batch_array_separator =\\|

If you have 30G in total, it probably died doing GC.

Configure your batch-importer to use a 4-8G heap (in the shell script).

With 30G in total, that leaves 16G for the memory mapping:

2G for nodes, 8G for rels, 3G for props, and 3G for strings.
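
The budget above can be checked with a few lines of Python. This is only a sketch: it assumes that "nodes", "rels", "props" and "strings" correspond to the four large `mapped_memory` keys in the posted batch.properties, and it takes 8G (the upper end of the suggested range) as the heap. The helper names `budget` and `render_properties` are made up for illustration and are not part of batch-import.

```python
# Memory budget sketch for a 30G machine, using the figures from this comment.
TOTAL_GB = 30
HEAP_GB = 8  # assumption: upper end of the suggested 4-8G heap

# Assumed mapping of "nodes / rels / props / strings" onto the store keys
# that appear in the batch.properties posted earlier in this thread.
MAPPED_GB = {
    "neostore.nodestore.db.mapped_memory": 2,
    "neostore.relationshipstore.db.mapped_memory": 8,
    "neostore.propertystore.db.mapped_memory": 3,
    "neostore.propertystore.db.strings.mapped_memory": 3,
}


def budget(total_gb, heap_gb, mapped_gb):
    """Return (total mapped memory, memory left for the OS) in GB."""
    mapped_total = sum(mapped_gb.values())
    remaining = total_gb - heap_gb - mapped_total
    if remaining < 0:
        raise ValueError("heap plus mapped memory exceeds physical memory")
    return mapped_total, remaining


def render_properties(mapped_gb):
    """Format the mapped-memory settings as batch.properties lines."""
    return "\n".join(f"{key}={gb}G" for key, gb in mapped_gb.items())


if __name__ == "__main__":
    mapped_total, remaining = budget(TOTAL_GB, HEAP_GB, MAPPED_GB)
    print(f"mapped: {mapped_total}G, heap: {HEAP_GB}G, left for OS: {remaining}G")
    print(render_properties(MAPPED_GB))
```

For comparison, the four large settings in the original file add up to 24G of mapped memory (5G + 5G + 7G + 7G), which leaves at most 6G of the 30G for the Java heap and the operating system combined.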

Not sure if you really have arrays in your data.

I recommend that you look into the import tool that comes out of the box with Neo4j 2.2; see: http://neo4j.com/docs/stable/import-tool.html

dav009 commented 9 years ago

gonna try that, thanks for the quick response, I will close this issue :+1: