opencypher / morpheus

Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Apache License 2.0

Avoid deadlock on write transaction #942

Open goshaQ opened 4 years ago

goshaQ commented 4 years ago

I've noticed that writing a relationship table (here) containing tens of millions of rows in large batches (the default batch size in the configuration, 100000) generates a lot of warnings like this:

00:26:41 WARN RetryLogic: Transaction failed and will be retried in 1166ms
org.neo4j.driver.exceptions.TransientException: LockClient[x] can't wait on
resource RWLock[NODE(x), hash=x] since => LockClient[x] <-[:HELD_BY]- 
RWLock[NODE(x), hash=x] <-[:WAITING_FOR]- LockClient[x] <-[:HELD_BY]-
RWLock[NODE(x), hash=x]

I wonder whether this can be reduced by re-partitioning the Spark DF. Or maybe writing relationships in parallel is not worthwhile at all, given how many transactions end up being retried?

s1ck commented 4 years ago

What you see there is basically back-pressure from Neo4j, which cannot handle that many concurrent relationship writes. One option would be to reduce concurrency; another would be to re-partition the relationship DF on source and target. However, depending on the degree distribution, this could still lead to multiple partitions / threads writing relationships between the same nodes, though probably less often than with random partitioning.
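
To make the trade-off concrete, here is a minimal pure-Python sketch (not Morpheus code) that counts how many nodes end up being written from more than one partition under different partitioning keys. The data, the function name `contended_nodes`, and the key choices are all hypothetical and only illustrate the idea. In Spark itself, the analogous step would be hash-partitioning with `DataFrame.repartition(numPartitions, cols...)`, or lowering the number of concurrent writers with `coalesce`; the actual column names depend on the relationship table.

```python
from collections import defaultdict


def contended_nodes(rels, num_partitions, key=None):
    """Count nodes that are touched by more than one partition.

    rels: list of (source_id, target_id) tuples.
    key:  function (source, target) -> hashable value; rows with equal keys
          land in the same partition. None means round-robin by row position.
    """
    partitions_by_node = defaultdict(set)
    for position, (source, target) in enumerate(rels):
        bucket = position if key is None else hash(key(source, target))
        partition = bucket % num_partitions
        # A relationship write locks both of its end nodes.
        partitions_by_node[source].add(partition)
        partitions_by_node[target].add(partition)
    return sum(1 for parts in partitions_by_node.values() if len(parts) > 1)


# Hypothetical data: four hub nodes, each with four private neighbours.
rels = [(hub, 1000 + 10 * hub + leaf) for hub in range(4) for leaf in range(4)]

round_robin = contended_nodes(rels, 4)                          # no key at all
by_source = contended_nodes(rels, 4, key=lambda s, t: s)        # key: source only
by_pair = contended_nodes(rels, 4, key=lambda s, t: (s, t))     # key: (source, target)
print(round_robin, by_source, by_pair)
```

On this toy graph, keying on the source alone keeps every hub inside a single partition, while round-robin spreads each hub over all four. Keying on the (source, target) pair only guarantees that parallel relationships between the same two nodes stay together, so a high-degree node can still be locked from several partitions, which matches the caveat about the degree distribution.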