jexp / batch-import

generic csv file neo4j batch importer
https://neo4j.com/docs/operations-manual/current/tools/import/

allow use of separate files for TSV headers? #97

Open RichMorin opened 10 years ago

RichMorin commented 10 years ago

Let's say that I have created a pair of huge (e.g., multi-gigabyte) TSV files. After importing them, I find that I need to edit the header lines to add indexing, etc.

I can't edit the files directly; they're far too large for any conventional text editor. So, I need to use Unix tools such as head(1), tail(1), and cat(1) to manipulate the files in and around the editing process. This is both annoying and time-consuming.

So, I'd like to have a way to use separate files for the TSV headers. That would allow me to edit the (tiny) header files, leaving the (huge) data files alone. Please consider adding a feature such as this.
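For readers facing the same problem, the head(1)/tail(1)/cat(1) workaround described above can be sketched roughly as follows. This is only an illustration, not part of the batch importer: the function names `extract_header` and `replace_header` and all file names are hypothetical, and the sketch assumes the header occupies exactly the first line of the data file.

```shell
#!/bin/sh
# Hypothetical helpers for the workaround; not part of batch-import.

# Copy the first line (the header) of a large TSV file into a small,
# editable file.  $1 = data file, $2 = header file to create.
extract_header() {
    head -n 1 "$1" > "$2"
}

# Write a new data file made of the (edited) header followed by every
# line of the original except its old header.  The large file is
# streamed, never loaded into an editor.
# $1 = original data file, $2 = edited header file, $3 = output file.
# The header file must end with a newline, otherwise the header and
# the first data row end up on the same line.
replace_header() {
    { cat "$2"; tail -n +2 "$1"; } > "$3"
}
```

A typical run would be `extract_header nodes.tsv nodes-header.tsv`, then editing `nodes-header.tsv` by hand, then `replace_header nodes.tsv nodes-header.tsv nodes-new.tsv`. Every header change still rewrites the entire multi-gigabyte file, which is the cost that separate header files would avoid.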

rswarup82 commented 9 years ago

Hi Mike,

I spent some time with the neo4j import tool that comes with version 2.2.x, which allows me to provide the headers for node/relationship CSV files in separate files. I found this feature very useful: when importing billions of nodes and relationships into a graph database, the files are inevitably large, so opening them in a text editor is impossible. Moreover, node/relationship CSV files are sometimes split into multiple files, in which case copying the header into each file is impractical.

Is there any way we can have this feature available in the batch importer tool in a coming release? To use version 2.1.8 in production, we might have to use the batch importer tool for bulk data import into neo4j.

Looking forward to hearing back from you.

Thanks for your support.

Swarup Rakshit