jexp / neo4j-shell-tools

A bunch of import/export tools for the neo4j-shell

Local import/export file when using a remote host #118

Open john-bodley opened 7 years ago

john-bodley commented 7 years ago

@jexp I love how performant the binary export/import is compared to using neo4j-shell -c dump or neo4j-shell -file for exporting and importing respectively.

The issue I'm running into is I would like to import a local file to a remote host in a similar vein to the mysqldump and mysql commands. The neo4j-shell supports remote hosts and bash supports reading from STDIN so I can definitely do,

> neo4j-shell -host <remote> -file < /tmp/graph.cql

where /tmp/graph.cql is local, however when using import-binary the input filename is embedded in the command, so in a local context the following works,

 > neo4j-shell -host localhost
 NOTE: Remote Neo4j graph database service 'shell' at port 1337
 neo4j-sh (?)$ import-binary -i /tmp/graph.bin

However, against a remote host this fails, because /tmp/graph.bin doesn't exist on the remote machine:

 > neo4j-shell -host <remote>
 NOTE: Remote Neo4j graph database service 'shell' at port 1337
 neo4j-sh (?)$ import-binary -i /tmp/graph.bin
 /tmp/graph.bin (No such file or directory)

Do you know if there's any workaround for this, i.e. is it possible to import data from a local machine to a remote host running Neo4j without explicitly copying the file?

jexp commented 7 years ago

@john-bodley good feedback. Currently it only supports files that are local to the server. But with the APOC procedures you can also provide remote URLs for imports.

There is currently no way to send a local file, but the Bolt protocol just got support for byte arrays, so we could look into letting cypher-shell accept files as Cypher parameters, which are then sent as binary content over the wire with a statement. Something worth considering.

john-bodley commented 7 years ago

Thanks @jexp for the response. I also came across these two posts written by you which have been quite helpful:

It seems like we can import the data to a remote host via cypher-shell -a bolt://<host>:<port> < /tmp/graph.db.cql, and cypher-shell can be installed locally without Neo4j (which is great). I was concerned about performance, because with neo4j-shell the dump originally took about 45x as long as your binary export,

> time neo4j-shell -c "export-binary -o /tmp/graph.db.bin"
real    0m32.256s
> time neo4j-shell -c dump > /tmp/graph.db.cql
real    23m47.289s

Thankfully the APOC approach seems to be much faster, taking only about 3x as long as the binary export (which is acceptable),

> time cypher-shell 'CALL apoc.export.cypher.all("/tmp/graph.db.cql", {format: "cypher-shell"})'
real    1m29.785s
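
As a quick sanity check on the "45x" and "3x" figures, the ratios can be recomputed from the wall-clock times quoted in this thread. This is only a sketch; the helper name `to_seconds` is made up for illustration.

```python
import re


def to_seconds(real: str) -> float:
    """Convert a `time`-style duration such as '23m47.289s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", real)
    if match is None:
        raise ValueError(f"unrecognized duration: {real!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Wall-clock ("real") times quoted in this thread.
binary_export = to_seconds("0m32.256s")   # neo4j-shell export-binary
shell_dump = to_seconds("23m47.289s")     # neo4j-shell -c dump
apoc_export = to_seconds("1m29.785s")     # apoc.export.cypher.all

dump_ratio = shell_dump / binary_export
apoc_ratio = apoc_export / binary_export

print(f"neo4j-shell dump vs binary export: {dump_ratio:.1f}x")
print(f"APOC export vs binary export:      {apoc_ratio:.1f}x")
```

Both ratios come out slightly below the rounded figures, which is consistent with "about 45x" and "about 3x".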

Sadly, importing the data into an empty database is significantly faster (> 10x) in binary mode,

> time neo4j-shell -c "import-binary -c -i /tmp/graph.db.bin"
real    3m47.778s

compared to using either the cypher-shell or a Neo4j driver (Python). I presume this is because the statements are not parameterized.
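
To illustrate what "parameterized" could look like with the Python driver, here is a minimal sketch that sends rows in batches through a single statement with a `$rows` parameter, rather than one literal statement per node. The `Item` label, the `rows` parameter name, the batch size, and the helper names are all assumptions made for illustration, and I have not measured whether this closes the gap to the binary import.

```python
from itertools import islice

# Hypothetical statement: the `Item` label and the `rows` parameter are
# illustrative only and are not taken from the exported file.
QUERY = "UNWIND $rows AS row CREATE (n:Item) SET n = row"


def batches(rows, size):
    """Yield successive lists of at most `size` items from `rows`."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def import_rows(uri, auth, rows, batch_size=1000):
    """Send `rows` (dicts of properties) to Neo4j, one statement per batch."""
    # Imported lazily so the batching helper can be used without the driver.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=auth)
    try:
        with driver.session() as session:
            for batch in batches(rows, batch_size):
                # The same query text is reused; only the parameter changes.
                session.run(QUERY, rows=batch).consume()
    finally:
        driver.close()
```

A call might look like `import_rows("bolt://<host>:<port>", ("neo4j", "<password>"), rows)`, where `rows` is any iterable of property dictionaries built from the local data.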

sharath1608 commented 6 years ago

Has there been any progress on this one? I'm currently mounting my remote filesystem to make this work. Would love an out-of-the-box solution.