bgottfried91 closed 10 years ago
Can you post your equivalent of the sample/import.sh file?
Wasn't sure if you meant the code in the actual import.sh file used to run the batch import (I'm not using Maven to build it) or if you meant the args provided in the command.
java -classpath $CP -Xmx$HEAP -Xms$HEAP -Dfile.encoding=UTF-8 org.neo4j.batchimport.Importer batch.properties "$DB" "$NODES" "$RELS" "$@"
The command to run the import is: bash import.sh nonIndexed.db uids.csv,pmids.csv rels.csv,synonyms.csv
The csv files range from several MB to multiple GB in size, so I can find somewhere to put them up, but they're gigantic...
Could it be that you used a different field separator? The default is tab, though you can configure it to use a comma.
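One quick way to check the separator question is to count tabs and commas in a file's header line. The following is a minimal sketch, not part of the batch importer; `detect_delim` is a hypothetical helper name, and it only distinguishes tab from comma.

```shell
# detect_delim <file>: print "tab", "comma", or "unknown" depending on which
# candidate separator occurs more often in the first line of the file.
detect_delim() {
  header=$(head -n 1 "$1")
  # Count occurrences by deleting every other character and measuring what is left.
  tabs=$(printf '%s' "$header" | tr -cd '\t' | wc -c | tr -d ' ')
  commas=$(printf '%s' "$header" | tr -cd ',' | wc -c | tr -d ' ')
  if [ "$tabs" -gt "$commas" ]; then
    echo tab
  elif [ "$commas" -gt "$tabs" ]; then
    echo comma
  else
    echo unknown
  fi
}
```

For example, `detect_delim uids.csv` would be run against each input file in turn. A single-column file such as pmids.csv has no separator in its header, so it reports "unknown".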
Perhaps try it with a single smaller file first to find the issue? Perhaps you can share a small sample file that makes this issue reproducible?
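A small sample can be cut from a large CSV by keeping the header plus the first few data rows, then compressing it for sharing. This is a rough sketch under the assumption that each file has exactly one header line; `make_sample` and the `.sample.csv` suffix are hypothetical names chosen for illustration.

```shell
# make_sample <input.csv> <rows>: write the header plus the first <rows> data
# lines to <input>.sample.csv, gzip it, and print the name of the .gz file.
make_sample() {
  input=$1
  rows=$2
  sample="${input%.csv}.sample.csv"
  # rows + 1 so that the header line is kept along with the data rows.
  head -n "$((rows + 1))" "$input" > "$sample"
  # gzip replaces the sample with "$sample.gz"; -f overwrites an older copy.
  gzip -f "$sample"
  echo "$sample.gz"
}
```

For example, `make_sample rels.csv 1000` would produce rels.sample.csv.gz. Note that truncating a node file may leave relationship rows pointing at nodes that are no longer in the sample, so it may be safer to truncate only the relationship files.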
Btw, can you share your batch.properties? You don't need that much storage for strings and properties; 1-2GB each should be fine. Try to give the relationship-store the most memory.
I'll include the first 10 lines of each of the CSVs, as well as the batch.properties. I'll cut out a small portion of each of the files and put up a link to them:

    head uids.csv
    uid name
    D000001 Calcimycin
    D000001-1 A-23187
    D000001-2 A 23187
    D000001-3 Antibiotic A23187
    D000001-4 A23187, Antibiotic
    D000001-5 A23187
    D000002 Temefos
    D000002-1 Temephos
    D000002-2 Abate

    head pmids.csv
    pmid
    12255683
    12334433
    20255877
    12255369
    12255508
    12305503
    12233291
    12259097
    12334491

    head rels.csv
    pmid uid type
    218986 94827 Mentions
    218987 35807 Mentions
    218987 44082 Mentions
    218987 44093 Mentions
    218987 57667 Mentions
    218987 75228 Mentions
    218987 75242 Mentions
    218987 83565 Mentions
    218987 106937 Mentions

    head synonyms.csv
    uid synonym type
    0 0 MENTIONS
    0 1 MENTIONS
    0 1 MENTIONS
    0 1 MENTIONS
    0 1 MENTIONS
    0 1 MENTIONS
    6 6 MENTIONS
    6 7 MENTIONS
    6 7 MENTIONS

    cat batch.properties
    use_memory_mapped_buffers=true
    neostore.nodestore.db.mapped_memory=10G
    neostore.relationshipstore.db.mapped_memory=10G
    neostore.propertystore.db.mapped_memory=12G
    neostore.propertystore.db.strings.mapped_memory=12G
    neostore.propertystore.db.arrays.mapped_memory=0M
    neostore.propertystore.db.index.keys.mapped_memory=15M
    neostore.propertystore.db.index.mapped_memory=15M

Can you please share them as a zip file? Otherwise line endings and delimiters get mangled.
Gzipped each of the four input files and put them into this folder on drive. Let me know if they're broken in some way: https://drive.google.com/folderview?id=0Bx98DkxmHnEtWE5BRzlfM2lqYTQ&usp=sharing
A little context from my testing of this sample: when I constructed the database from this and started the server with it, the properties were there and I could query for specific nodes. The only thing that was changed in these files from the original files was that the two relation files were heavily truncated. Does this mean I need to be allocating more memory for the relations?
Final update: whatever the issue was, I can't seem to reproduce it now using the full files. As such, I'll be closing the issue, though if anyone has any idea why it might have occurred originally, I'd love to hear about it.
Using the 2.0 branch of the batch importer, I'm able to import ~11 million nodes and ~94 million relationships, but apparently only 1 property is being imported:
Currently the batch importer properties allocate 10GB to node storage and relationship storage each and 12GB each to property and long-string storage; I'm working on a system with 48GB of RAM, so those are pretty much the upper limits of the system.
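For a rough sense of how close those settings come to the 48GB of RAM, the mapped-memory values in batch.properties can be totalled. The sketch below assumes every `*mapped_memory` value ends in G or M, as the ones quoted in this thread do; `sum_mapped_memory_mb` is a hypothetical helper name, not something the batch importer provides.

```shell
# sum_mapped_memory_mb <properties-file>: print the total of all
# *mapped_memory settings in megabytes (1G is counted as 1024M).
sum_mapped_memory_mb() {
  awk -F '=' '
    # Skip comment lines; only keys ending in "mapped_memory" are counted.
    $0 !~ /^[ \t]*#/ && $1 ~ /mapped_memory$/ {
      value = $2
      gsub(/[ \t\r]/, "", value)            # drop stray whitespace and CRs
      unit = toupper(substr(value, length(value), 1))
      number = substr(value, 1, length(value) - 1) + 0
      if (unit == "G") total += number * 1024
      else if (unit == "M") total += number
    }
    END { printf "%d\n", total }
  ' "$1"
}
```

Run as `sum_mapped_memory_mb batch.properties` against the settings posted in this thread, it prints 45086, i.e. roughly 44GB of mapped memory before the JVM heap set via `$HEAP` is counted.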
Attempting to query for any node results in this error:
Any suggestions on what I need to change about my import process to fix the issue?