jexp / batch-import

generic csv file neo4j batch importer
https://neo4j.com/docs/operations-manual/current/tools/import/

2.0 with quotes set to false: IllegalArgumentException when the last header field has a type #74

Closed craigbrett17 closed 10 years ago

craigbrett17 commented 10 years ago

Unfortunately, I'm not able to replicate this with the sample files, but I've got a small file (the head of the big nodes file I'm actually importing) on which it does reproduce.

EntityID:int:entities   type:label  Title:string:entities   Reference:string:entities   Year    Weight:int
1   SomeLabel   Title1  AReference  1979
2   SomeLabel   Title2  AReference  1978
3   SomeLabel   Title3  AReference  1978
4   SomeLabel   Title4  AReference  1980
5   SomeLabel   Title5  AReference  1979
6   SomeLabel   Title6  AReference  1979
7   SomeLabel   Title7  AReference  1979
8   SomeLabel   Title8  AReference  1979
9   SomeLabel   Title9  AReference  

When I try to import this (with a mocked-up rels file just to get things started), it fails with the following exception:

Total import time: 2 seconds
Exception in thread "main" java.lang.IllegalArgumentException: Unknown Type int
        at org.neo4j.batchimport.importer.Type.fromString(Type.java:172)
        at org.neo4j.batchimport.importer.AbstractLineData.createHeaders(AbstractLineData.java:46)
        at org.neo4j.batchimport.importer.ChunkerLineData.<init>(ChunkerLineData.java:19)
        at org.neo4j.batchimport.Importer.createLineData(Importer.java:174)
        at org.neo4j.batchimport.Importer.importNodes(Importer.java:93)
        at org.neo4j.batchimport.Importer.doImport(Importer.java:228)
        at org.neo4j.batchimport.Importer.main(Importer.java:83)

If I change int to string, it still fails, this time stating that string is an unknown type. If the last header field has no type at all, it works. It also works if I set batch_import.csv.quotes to true, but with that setting the import freezes on my big dataset, so I set it to false.

It could be that I'm doing something wrong, but if I am, it's eluding me.

jexp commented 10 years ago

What kind of line endings are you using in your files? Could they be \r\n?

craigbrett17 commented 10 years ago

Yes, they are. I tested your theory by converting my line endings to UNIX format, and the import now works fine.
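The symptom is consistent with the header line being split on '\n' only, so the '\r' of a Windows "\r\n" ending stays attached to the last field and the type becomes "int\r" (which prints as "int"). The Java sketch below illustrates that reading under stated assumptions: HeaderTypeDemo, typeOf, headerFields and the list of type names are hypothetical and do not reproduce the importer's actual Type.fromString code.

```java
import java.util.Arrays;
import java.util.List;

public class HeaderTypeDemo {
    // Hypothetical subset of accepted type names (not the real Type enum).
    private static final List<String> KNOWN_TYPES =
            Arrays.asList("int", "long", "string", "label", "boolean", "double", "float");

    // Parses a "name:type[:index]" header field and validates the type.
    // Untyped fields are assumed to default to "string".
    static String typeOf(String field) {
        String[] parts = field.split(":");
        if (parts.length < 2) {
            return "string";
        }
        String type = parts[1];
        if (!KNOWN_TYPES.contains(type.toLowerCase())) {
            throw new IllegalArgumentException("Unknown Type " + type);
        }
        return type;
    }

    // Splits a tab-separated header line, optionally dropping a trailing '\r'.
    static String[] headerFields(String line, boolean stripCr) {
        if (stripCr && line.endsWith("\r")) {
            line = line.substring(0, line.length() - 1);
        }
        return line.split("\t");
    }

    public static void main(String[] args) {
        // Header as it looks after splitting a "\r\n"-terminated file on '\n'.
        String header = "EntityID:int:entities\tYear\tWeight:int\r";

        try {
            for (String f : headerFields(header, false)) {
                typeOf(f);
            }
        } catch (IllegalArgumentException e) {
            // Escape the '\r' so it is visible in the printed message.
            System.out.println("without stripping: " + e.getMessage().replace("\r", "\\r"));
        }

        for (String f : headerFields(header, true)) {
            typeOf(f); // every field validates once the '\r' is gone
        }
        System.out.println("with stripping: ok");
    }
}
```

In this sketch an untyped last field such as "Year\r" passes validation, which matches the observation that the import works when the last header field has no type; the column name would still carry a stray '\r', though.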

My CSV files are coming from SQL Server's BCP program. In the meantime, I believe I can do some kind of hocus pocus with a format file to make it write UNIX line endings instead of the standard Windows ones. If the CSV reader could cope with Windows line endings, that would be great.
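As an alternative to changing the BCP export, the files could be converted before import. The following is a minimal Java sketch of such a preprocessing step; NormalizeLineEndings and toUnixLineEndings are hypothetical names, not part of batch-import. It relies on BufferedReader.readLine(), which treats "\r\n", "\r" and "\n" as line terminators, and streams line by line so a large file is not loaded into memory.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class NormalizeLineEndings {
    // Copies `in` to `out`, rewriting every line terminator as a single '\n'.
    // ISO-8859-1 maps each byte to one char, so the copy is byte-for-byte
    // for any ASCII-compatible encoding (UTF-8, Windows code pages).
    // Assumption: the input is not UTF-16.
    static void toUnixLineEndings(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.ISO_8859_1);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line); // readLine() has already dropped "\r\n"
                writer.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: NormalizeLineEndings <input.csv> <output.csv>");
            System.exit(1);
        }
        toUnixLineEndings(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Because a lone '\r' is also treated as a line break, this is only safe when field values never contain carriage returns, which should hold for unquoted exports like the one above.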

jexp commented 10 years ago

It should work, but I have to re-check.