jexp / batch-import

generic csv file neo4j batch importer
https://neo4j.com/docs/operations-manual/current/tools/import/

Skip relationships with missing nodes instead of failing #65

Open kylemarkwilliams opened 10 years ago

kylemarkwilliams commented 10 years ago

When either the "start" or "end" node of a relationship does not exist, the import fails with:

[WARNING]
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297)
        at java.lang.Thread.run(Thread.java:722)
Caused by: org.neo4j.kernel.impl.nioneo.store.InvalidRecordException: NodeRecord[12972393] not in use
        at org.neo4j.kernel.impl.nioneo.store.NodeStore.getRecord(NodeStore.java:252)
        at org.neo4j.kernel.impl.nioneo.store.NodeStore.getRecord(NodeStore.java:125)
        at org.neo4j.unsafe.batchinsert.BatchInserterImpl.getNodeRecord(BatchInserterImpl.java:1190)
        at org.neo4j.unsafe.batchinsert.BatchInserterImpl.createRelationship(BatchInserterImpl.java:750)
        at org.neo4j.batchimport.Importer.importRelationships(Importer.java:158)
        at org.neo4j.batchimport.Importer.doImport(Importer.java:236)
        at org.neo4j.batchimport.Importer.main(Importer.java:83)
        ... 6 more
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] An exception occured while executing the Java class. null

NodeRecord[12972393] not in use

Where 12972393 was the missing node ID.

Is it possible for a warning message to be printed and the relationship skipped instead of having the whole import fail? Even if this was not the default behavior, I think it would be a useful feature as a configuration option.

jexp commented 10 years ago

Actually it already does this for index lookups, I can add it for direct node id lookups too.
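As a rough illustration of the behavior requested above (warn and skip instead of aborting), here is a minimal sketch. It is not the batch-import code: a plain `Set` stands in for the node store, and the class `SkipMissingNodes`, its `skipMissing` option, and all method names are hypothetical, chosen only for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a plain Set stands in for the node store, and none of
// these names come from batch-import or the Neo4j API.
class SkipMissingNodes {
    private final boolean skipMissing;                       // hypothetical config option
    private final Set<Long> nodesInUse = new HashSet<>();    // IDs of imported nodes
    private final List<String> created = new ArrayList<>();  // relationships written
    private final List<String> warnings = new ArrayList<>(); // one entry per skipped row

    public SkipMissingNodes(boolean skipMissing) {
        this.skipMissing = skipMissing;
    }

    public void addNode(long id) {
        nodesInUse.add(id);
    }

    /** Returns true if the relationship was created, false if it was skipped. */
    public boolean tryCreate(long start, long end, String type) {
        String rel = start + "-[" + type + "]->" + end;
        for (long id : new long[] {start, end}) {
            if (!nodesInUse.contains(id)) {
                String message = "NodeRecord[" + id + "] not in use (relationship " + rel + ")";
                if (!skipMissing) {
                    // Strict mode: fail the import on the first missing node.
                    throw new IllegalStateException(message);
                }
                // Lenient mode: record a warning and move on to the next row.
                warnings.add(message);
                System.err.println("[WARNING] " + message);
                return false;
            }
        }
        created.add(rel);
        return true;
    }

    public int createdCount() {
        return created.size();
    }

    public List<String> warnings() {
        return warnings;
    }

    public static void main(String[] args) {
        SkipMissingNodes importer = new SkipMissingNodes(true);
        importer.addNode(1L);
        importer.addNode(2L);
        importer.tryCreate(1L, 2L, "KNOWS");        // both endpoints exist
        importer.tryCreate(1L, 12972393L, "KNOWS"); // end node missing: warned and skipped
        System.out.println(importer.createdCount() + " created, "
                + importer.warnings().size() + " skipped");
    }
}
```

Keeping the strict path behind a flag matches the suggestion that skipping need not be the default behavior.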

kylemarkwilliams commented 10 years ago

Yes, I think it would be a nice feature. Perhaps with a warning being printed stating that the node is missing.

Thanks!

On Thu, Jan 16, 2014 at 4:45 AM, Michael Hunger notifications@github.com wrote:

Actually it already does this for index lookups, I can add it for direct node id lookups too.


jexp commented 9 years ago

The new neo4j-import tool supports skipping and logging unmet relationships.

See http://neo4j.com/docs/stable/import-tool.html

On 17.06.2015 at 22:32, Raymond Plante notifications@github.com wrote:

Did this happen? Would be great feature when dealing with millions of nodes/relationships


raymondjplante commented 9 years ago

@jexp Thanks. If you set --skip-bad-relationships it says they're logged up to the max indicated by --bad-tolerance. Do you know if this means the import will still continue, just no longer logging the bad ones it comes across?

ehx-v1 commented 8 years ago

ohh, this doesn't help with batch inserting... perhaps you could queue rels with missing nodes as, let's say, RelationshipPrecalculations, and check whether the nodes are still missing at the end of the import? that would make more sense than just throwing an error immediately I think
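The queueing idea above could look roughly like the following sketch. It assumes nodes may still arrive after a relationship row has been read, and it uses a plain `Set` in place of the node store. `DeferredRelationships`, `Rel`, and `finish` are hypothetical names for this example, not existing batch-import or Neo4j classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a Set stands in for the node store, and every name here
// is hypothetical rather than part of batch-import or the Neo4j API.
class DeferredRelationships {
    /** One relationship row, possibly still waiting for its endpoints. */
    public static final class Rel {
        public final long start;
        public final long end;
        public final String type;

        public Rel(long start, long end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    private final Set<Long> nodesInUse = new HashSet<>();
    private final List<Rel> created = new ArrayList<>();
    private final Deque<Rel> pending = new ArrayDeque<>();

    public void addNode(long id) {
        nodesInUse.add(id);
    }

    private boolean endpointsExist(Rel rel) {
        return nodesInUse.contains(rel.start) && nodesInUse.contains(rel.end);
    }

    /** Creates the relationship now, or queues it if an endpoint is missing. */
    public void addRelationship(Rel rel) {
        if (endpointsExist(rel)) {
            created.add(rel);
        } else {
            pending.add(rel);
        }
    }

    /** Retries queued rows at the end of the import; returns those still unresolved. */
    public List<Rel> finish() {
        List<Rel> unresolved = new ArrayList<>();
        while (!pending.isEmpty()) {
            Rel rel = pending.poll();
            if (endpointsExist(rel)) {
                created.add(rel);
            } else {
                unresolved.add(rel);
            }
        }
        return unresolved;
    }

    public int createdCount() {
        return created.size();
    }
}
```

Whatever `finish()` returns is the set of rows that would be reported or rejected once the import is done, rather than failing on the first missing node.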