mirkonasato / graphipedia

Creates a Neo4j graph of Wikipedia links.
253 stars 63 forks

Upgrading to latest Neo4j? #13

Open greenguy33 opened 3 years ago

greenguy33 commented 3 years ago

I'm interested in trying out this library, super cool that you put this together! However, Neo4j 3.2.9 is hard to come by at this point (it doesn't seem to be distributed on the official page anymore). It would be nice to be able to use it with the latest version (looks like 4.2.5).

This is something I might be able to work on and submit a pull request for, but first I wanted to get some sense of how much work it would be. Based on your last commit, it looks like you've done this in the past, from a previous version to 3.2.9. Do you have any sense of what changes would have to be made?

On a side note, this isn't the first time that Neo4j's lack of backwards compatibility has been a thorn in my side.

mirkonasato commented 3 years ago

Well, to tell you what's involved in upgrading to the latest version I would need to actually do the upgrade. 🙂

I suggest looking at the forks to see whether somebody has already done it. For example, this one by noppaz seems to be using Neo4j v4.

greenguy33 commented 3 years ago

Thanks for the suggestion to look at the forks. I was able to use andrew-yarmola's fork to generate what I needed!

BlueGreenMagick commented 3 years ago

As a warning to others, databases created with noppaz's fork seem to be slightly off. Everything else seems to work fine, but using a SET query to create a new property with a floating-point value has the serious side effect of changing another node's properties as well.

For example, if you run the query MATCH (n) WHERE id(n) = 1001 SET n.newProp=0.123 to create a property, the first node (with id 0) gets overwritten with the 1001st node. Algorithms that write to a new property, such as gds.pagerank.write, are affected as well, sometimes resulting in error messages such as IllegalStateException: 13 already exists.

As a workaround, I've had to export the database with apoc.export.cypher.all and import it back.
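A rough way to detect this side effect is to snapshot node properties before and after the SET and compare them. The sketch below assumes the snapshots have already been fetched into plain dicts keyed by node id (the fetching itself is not shown). `unexpected_changes` is a hypothetical helper, and the `title` values are made up for illustration.

```python
def unexpected_changes(before, after, expected_ids):
    """Return the ids of nodes whose properties changed although they
    were not the target of the write.

    before, after: dicts mapping node id -> dict of properties
    expected_ids:  set of node ids the write was supposed to touch
    """
    changed = []
    for node_id, props in before.items():
        if node_id in expected_ids:
            continue  # this node was meant to change
        if after.get(node_id) != props:
            changed.append(node_id)
    return sorted(changed)


# Made-up snapshots mimicking the behaviour described above:
# the write targeted node 1001, but node 0 was overwritten as well.
before = {0: {"title": "Page A"}, 1001: {"title": "Page B"}}
after = {
    0: {"title": "Page B", "newProp": 0.123},
    1001: {"title": "Page B", "newProp": 0.123},
}
result = unexpected_changes(before, after, expected_ids={1001})
print(result)
```

Any id returned by the helper means a node other than the one targeted by the SET was modified.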

greenguy33 commented 3 years ago

@BlueGreenMagick I experienced the same thing with andrew-yarmola's fork. I was also trying to run gds.pagerank and ran into this issue, which corrupted my database; I had to restore from the backup. Could you elaborate on your workaround? I tried several things but couldn't find anything that wouldn't eventually involve writing properties using SET.

BlueGreenMagick commented 3 years ago

Using the apoc plugin, I exported the entire database with apoc.export.cypher.all, then imported it into a new database by running the cypher file (following the steps in this blog). The downside is that it takes a huge amount of time.

I also tried exporting and importing as a dump file (which is much faster), but that doesn't seem to work.
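For anyone wanting to script those two steps, here is a minimal sketch that only builds the export statement and the import command. It does not talk to a database. The file name, user, and database name are placeholders, both helper functions are hypothetical, and it assumes file export has been enabled for APOC (`apoc.export.file.enabled=true`).

```python
def export_statement(filename):
    """Cypher statement that dumps the whole database as Cypher.

    The "cypher-shell" format makes the output replayable through
    cypher-shell. Assumes a plain file name without quote characters.
    """
    if '"' in filename:
        raise ValueError("file name must not contain double quotes")
    return (
        'CALL apoc.export.cypher.all("%s", {format: "cypher-shell"});'
        % filename
    )


def import_command(filename, user, database):
    """Argument list for replaying the exported file into another database.

    -u sets the user, -d the target database, -f the statement file;
    cypher-shell prompts for the password when none is supplied.
    """
    return ["cypher-shell", "-u", user, "-d", database, "-f", filename]


# Placeholder values, not taken from the thread.
statement = export_statement("all.cypher")
command = import_command("all.cypher", user="neo4j", database="wikipedia2")
print(statement)
print(" ".join(command))
```

The statement would be run against the original database, and the command against the server hosting the new, empty one.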

greenguy33 commented 3 years ago

Ah ok, gotcha. In that case I'm assuming that you ran gds.pagerank with mutate mode to modify the in-memory graph, then dumped the in-memory graph to a file and re-imported it to a new database?

I tried to do something like that as well but I ran into trouble with string properties. Neo4j won't allow string properties to be imported to an in-memory graph (as documented here), and I need string properties to match up nodes in the original graph to nodes in the new graph, since the nodeIds aren't guaranteed to stay the same between the two graphs.
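One untested idea for the matching step is to do it outside the in-memory graph: read (id, string key) pairs from both databases with ordinary queries, then join them client-side. The sketch below assumes the pairs are already available as Python lists and that the key is unique per node. `map_node_ids` is a hypothetical helper and the sample keys are made up.

```python
def map_node_ids(old_nodes, new_nodes):
    """Map node ids in the old graph to node ids in the new graph.

    old_nodes, new_nodes: iterables of (node_id, key) pairs, where key
    is a string property assumed to be unique within each graph.
    Nodes whose key is missing from the new graph are left out.
    """
    new_id_by_key = {key: node_id for node_id, key in new_nodes}
    return {
        old_id: new_id_by_key[key]
        for old_id, key in old_nodes
        if key in new_id_by_key
    }


# Made-up example: the same two keys, but with different ids per graph.
old_nodes = [(0, "Alpha"), (1, "Beta"), (2, "Gamma")]
new_nodes = [(5, "Beta"), (7, "Alpha")]
id_map = map_node_ids(old_nodes, new_nodes)
print(id_map)
```

Numeric results computed on one graph could then be carried over to the other by looking up each id in the mapping.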

Still haven't found a working solution yet.

BlueGreenMagick commented 3 years ago

I exported and imported the entire database as a cypher file. The newly imported database doesn't have the SET problem, so gds.pagerank.write worked fine on the new database.

greenguy33 commented 3 years ago

That makes sense, thanks for the tip!