Saving large graphs with Spring Neo4j

spring-projects / spring-data-neo4j

Provides support to increase developer productivity in Java when using Neo4j. Uses familiar Spring concepts such as template classes for core API usage and lightweight repository-style data access.

http://spring.io/projects/spring-data-neo4j

Apache License 2.0

Saving large graphs with Spring Neo4j #2587

Open mrksph opened 2 years ago

mrksph commented 2 years ago

Hi all,

I'm running into performance problems when saving a relatively big graph by passing the aggregate root to the Spring Data Neo4j .save() method. In the following image, you can see an example (the graph in the image is not complete; the real one is a little larger than that).

Is there any other way to speed up the save?

I tried saving the nodes at depth 1 or depth 2 first, concurrently, but I don't think that will work.

Thanks

meistermeier commented 2 years ago

What do you want to update? If it is something "near" the root entity, projections might help to make SDN just ignore the deeper related nodes. They are also a good fit if you want to go deep into a specific branch of the graph without looking left or right. Different projections can be used for different use cases. If you want to update a node somewhere in the middle or a leaf node, I would suggest just saving that node directly (e.g. with Neo4jTemplate if no repository is needed) instead of being 100% DDD-accurate and saving through the aggregate root.

mrksph commented 2 years ago

I want to save a graph which has many levels and many children at each level, just like the one in the image included in the OP. Like the following example but with many more children at each level, each can have many children also.

Client A - aggregated root

Client B
- Client BA
  - Client BAA
- Client BB
  - Client BBA
  - Client BBB
    - Client BBBB
Client C
- Client CA
Client DB
- Client DBA
- Client DBB
  - Client DBBA

It is very slow when I try to save the object like this: clientRepository.save(clientA)
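To get a rough feel for how much a single clientRepository.save(clientA) has to handle, the sketch below counts the nodes and relationships reachable from a root in a tree shaped like the one above. It is plain Java with no SDN dependency. The `Client` class, its `children` field, and the `buildTree` helper are hypothetical stand-ins for the real entities, and the sketch assumes that saving through the aggregate root visits every reachable node.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class GraphSize {

    /** Hypothetical stand-in for the real entity; not an SDN-mapped class. */
    static final class Client {
        final String name;
        final List<Client> children = new ArrayList<>();

        Client(String name) {
            this.name = name;
        }
    }

    /** Builds a tree with `depth` levels below the root and `fanOut` children per node. */
    static Client buildTree(String name, int depth, int fanOut) {
        Client node = new Client(name);
        if (depth > 0) {
            for (int i = 0; i < fanOut; i++) {
                node.children.add(buildTree(name + i, depth - 1, fanOut));
            }
        }
        return node;
    }

    /** Returns {nodeCount, relationshipCount} for everything reachable from root. */
    static long[] countReachable(Client root) {
        // Identity-based set so that shared nodes are counted once.
        Set<Client> seen = Collections.newSetFromMap(new IdentityHashMap<Client, Boolean>());
        Deque<Client> stack = new ArrayDeque<>();
        long relationships = 0;
        seen.add(root);
        stack.push(root);
        while (!stack.isEmpty()) {
            Client current = stack.pop();
            for (Client child : current.children) {
                relationships++;
                if (seen.add(child)) {
                    stack.push(child);
                }
            }
        }
        return new long[] {seen.size(), relationships};
    }

    public static void main(String[] args) {
        // 4 levels below the root, 10 children per node.
        long[] size = countReachable(buildTree("Client", 4, 10));
        System.out.println(size[0] + " nodes, " + size[1] + " relationships");
    }
}
```

With 4 levels and 10 children per node this already yields 11111 nodes and 11110 relationships, so "many children at each level" grows quickly into a graph where one save call has a lot to process.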

meistermeier commented 2 years ago

There is a 6.3.3-SNAPSHOT available that should improve relationship performance. Maybe you could give it a try. I am happy to hear your feedback. Related issue: #2593

mrksph commented 2 years ago

Hi Gerrit, thanks for your help! Any chance of getting this in 6.1.x? Because of compatibility issues, we can't upgrade to SDN 6.3.x yet.

meistermeier commented 2 years ago

Unfortunately, this version won't get any updates. The Pascal release train mentions "OSS Support until: May 2022". What are your problems regarding compatibility? SDN 6.3 should also work as a drop-in replacement.

meistermeier commented 2 years ago

There is now a 6.1.13-REL-PERFORMANCE-SNAPSHOT in the making. You could give it a try, when it is released (assuming ~1 hour from now).

mrksph commented 2 years ago

Hi Gerrit, awesome news, thank you!

Regarding the compatibility issues, I don't remember exactly why we are tied to 6.1.x. I guess it was related to the fact that our IT tests were failing due to https://github.com/spring-projects/spring-data-neo4j/issues/2488, maybe?

I just tried to run our tests with SDN 6.3.2 and I'm getting

throw new IllegalStateException("The provided database selection provider differs from the Neo4jClient's one.");

meistermeier commented 2 years ago

That's just the fact that you already defined a database selection "somewhere" in your config, and maybe in your tests you are using the Neo4jClient...in(database) syntax. Disclaimer: The SNAPSHOT release above is completely unsupported :D But it would be good to hear from you if this improves the experience.

mrksph commented 2 years ago

Well, in regard to this issue I will try to take a look tomorrow.

What you're saying about me having Neo4jClient...in(database) I think it's not that. We only have simple tests for our controllers and services and a few for our Integration Tests so...

I see that release 6.3.2 includes the fix for the issue I mentioned earlier but neo4jClient.getDatabaseSelectionProvider() is still returning null when I run my (integration) tests

EDIT:

Maybe I should add that our failing IT tests set up an embedded server, which may cause the selection provider to return null.

I've just tried specifying spring.data.neo4j.database = "neo4j", but when the Neo4jTemplate bean is instantiated, it checks neo4jClient.getDatabaseSelectionProvider(), which is null.

mrksph commented 2 years ago

Okay, so as we are tied to Spring Boot 2.4.x because of our dependency on Spring Cloud 2020, we can't upgrade to SDN 6.2.1+ (https://docs.spring.io/spring-data/neo4j/docs/6.2.0/reference/html/#dependencies.spring-framework) or 6.3.0+ (https://docs.spring.io/spring-data/neo4j/docs/6.3.0/reference/html/#dependencies.spring-framework), because both need a newer Spring Core version. Currently, Spring Boot 2.4.13 pulls in Spring Framework 5.3.13.

Also, the neo4j-java-driver dependency was being pulled by the APOC plugin dependency which was the reason I was getting the Exception I mentioned previously. I had to exclude it in my pom.xml to avoid getting that Exception while running our Integration Tests.

jrsperry commented 1 year ago

@mrksph I have been saving large graphs similar to your example with Neo4j OGM. Depending on the complexity of the graph, I’ve seen it perform anywhere from 4-10 times faster than SDN.

Here’s a link to an issue I have open, with linked projects showing the performance difference. It may be worth giving OGM a shot if SDN is still too slow.

https://github.com/spring-projects/spring-data-neo4j/issues/2636
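For anyone who wants to check such a speed-up on their own data, a minimal timing sketch could look like the following. It is plain Java. `SaveTimer` and `timeMillis` are hypothetical names, and the `Runnable` in `main` is only a placeholder for the real save call (for example `clientRepository.save(clientA)` with SDN, or the corresponding OGM call). A single wall-clock measurement is only a rough indicator, so repeated runs against the same database state give a fairer comparison.

```java
import java.util.concurrent.TimeUnit;

public class SaveTimer {

    /** Runs the action once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for the real save call.
        Runnable placeholderSave = () -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println("save took " + timeMillis(placeholderSave) + " ms");
    }
}
```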