neo4j-contrib / neo4j-graph-algorithms

Efficient Graph Algorithms for Neo4j
https://github.com/neo4j/graph-data-science/
GNU General Public License v3.0

Huge Louvain - Locks #558

Open mneedham opened 6 years ago

mneedham commented 6 years ago
For the last 6 hours on the 18bn-relationship graph, we have still been at 0%:

2018-02-03 04:06:50.380+0000 INFO [o.n.k.i.p.Procedures] [algo-82] Louvain 0%
2018-02-03 04:07:00.382+0000 INFO [o.n.k.i.p.Procedures] [algo-80] Louvain 0%

There seem to be some locked threads:

        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
        at org.neo4j.graphalgo.impl.louvain.HugeParallelLouvain$Task.run(HugeParallelLouvain.java:247)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
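
The frames above show worker threads parked in ReentrantReadWriteLock$ReadLock.lock(). As a minimal sketch of that symptom only (the class and method names below are hypothetical, and this is not the HugeParallelLouvain code), a reader cannot get the read lock while another thread holds the write lock. The sketch uses tryLock with a timeout so the stall is observable instead of blocking forever:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadLockStallDemo {

    /**
     * Returns true if the calling thread could take the read lock within
     * timeoutMillis while a second thread is holding the write lock.
     */
    static boolean readerAcquiresWhileWriterHolds(long timeoutMillis) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch writerHasLock = new CountDownLatch(1);
        CountDownLatch releaseWriter = new CountDownLatch(1);

        // Hypothetical writer that takes the write lock and does not let go
        // until it is told to; it stands in for whatever holds the lock.
        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            try {
                writerHasLock.countDown();
                releaseWriter.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.writeLock().unlock();
            }
        }, "writer");
        writer.start();

        try {
            writerHasLock.await();
            // ReadLock.lock() would park here indefinitely, as in the stack trace.
            boolean acquired = lock.readLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.readLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the lock", e);
        } finally {
            releaseWriter.countDown();
            try {
                writer.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        // prints "reader acquired lock: false"
        System.out.println("reader acquired lock: " + readerAcquiresWhileWriterHolds(200));
    }
}
```

Which thread holds the write lock in the real run is not visible in the excerpt above; a full thread dump would be needed to tell.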

bsheldon commented 6 years ago

To chime in here with a similar experience: I am currently working on a Google Cloud compute instance, a fairly hefty machine, running our graph (v3.3.4) with the latest graph algorithms release (3.3.2.0.jar).

After running Louvain over 400k nodes and the 169M edges between them, it's still stuck at 0% and appears stagnant after 24 hours of processing. Here's my Cypher call for reference:

CALL algo.louvain('Topic', 'WITH',
  {write:true, writeProperty:'community'})
YIELD nodes, communityCount, iterations, loadMillis, computeMillis, writeMillis;

mneedham commented 6 years ago

@bsheldon this should be fixed in the current development version. If you compile it locally, you can just replace the JAR with the one that gets generated in algo/target/.

If that doesn't work, I can try to build it for you later and host it somewhere.

bsheldon commented 6 years ago

Thanks for the heads up @mneedham - I will give that a try and report back.

bsheldon commented 6 years ago

@mneedham just to clarify, which branch is the current dev version?

mneedham commented 6 years ago

All of them are up to date so you can build it for whichever Neo4j version you're using.

bsheldon commented 6 years ago

Good deal. It looks like the build currently fails on the 3.3 branch (which is what I'm using):

[INFO] JVM J0:     0.93 ..   115.42 =   114.49s
[INFO] Execution time total: 1 minute 55 seconds
[INFO] Tests summary: 94 suites (3 ignored), 576 tests, 3 errors, 14 ignored (3 assumptions)
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Neo4j Graph Algorithms :: Core 3.3.2.0 ............. SUCCESS [ 27.395 s]
[INFO] Neo4j Graph Algorithms :: Algo 3.3.2.0 ............. SUCCESS [ 17.463 s]
[INFO] Neo4j Graph Algorithms :: Tests 3.3.2.0 ............ FAILURE [02:01 min]
[INFO] Neo4j Graph Algorithms :: Docs 3.3.2.0 ............. SKIPPED
[INFO] Neo4j Graph Algorithms 3.3.4.0 ..................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:46 min
[INFO] Finished at: 2018-04-25T10:10:39-07:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal com.carrotsearch.randomizedtesting:junit4-maven-plugin:2.5.0:junit4 (unit-tests) on project graph-algorithms-tests: There were test failures: 94 suites (3 ignored), 576 tests, 3 errors, 14 ignored (3 assumptions) [seed: 3F0961202E99F9E0] -> [Help 1]

I was able to build the 3.2 branch, though.

bsheldon commented 6 years ago

It looks like the build did produce the requisite graph-algorithms-algo-3.3.2.0.jar, however, so I am assuming it will run even though the tests are not passing.

bsheldon commented 6 years ago

Sorry to pound this thread, but I wanted to confirm that although the tests fail on the 3.3 build for me, I was able to use the latest jar. I am getting a different result now, where it actually fails pretty reliably:

Failed to invoke procedure `algo.louvain`: Caused by: java.lang.ArrayIndexOutOfBoundsException: 2856490

log output:

Apr 25 17:39:17 topic-graph pre-neo4j.sh[31997]: 2018-04-25 17:39:17.371+0000 INFO  [neo4j.Pooled-2] LOADING 0%
Apr 25 17:39:28 topic-graph pre-neo4j.sh[31997]: 2018-04-25 17:39:28.339+0000 INFO  [HugeRelationshipImport-5] LOADING 75%
Apr 25 17:39:38 topic-graph pre-neo4j.sh[31997]: 2018-04-25 17:39:38.511+0000 INFO  [HugeRelationshipImport-4] LOADING 95%
Apr 25 17:39:40 topic-graph pre-neo4j.sh[31997]: 2018-04-25 17:39:40.097+0000 INFO  [neo4j.Pooled-2] LOADING 100%
Apr 25 17:39:40 topic-graph pre-neo4j.sh[31997]: 2018-04-25 17:39:40.125+0000 INFO  [algo-4] Louvain 0%

mneedham commented 6 years ago

Is there a full stack trace for that in one of the log files?

mknblch commented 6 years ago

Hi, thanks for your feedback. We might have an error in the algo impl., but we have 3 of them ;) So you could try to load a different one by explicitly stating {graph:"huge"} OR {graph:"huge", weightProperty:"notARealPropertyName", defaultWeight:1.0}. Both statements cause the loader to load a different implementation, and both behave like unweighted Louvain. I'm curious whether the problem persists.
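
To illustrate why a weight property that does not exist makes the run behave like the unweighted case, here is a minimal sketch. The helper name and the map-based property lookup are assumptions for illustration only and are not the loader's actual code: when the named property is missing, the fallback weight is used, so every relationship ends up with the same constant weight.

```java
import java.util.Collections;
import java.util.Map;

public class DefaultWeightSketch {

    /**
     * Hypothetical lookup: returns the numeric value stored under
     * weightProperty, or fallbackWeight when the property is absent
     * or is not a number.
     */
    static double weightOf(Map<String, Object> relationshipProperties,
                           String weightProperty,
                           double fallbackWeight) {
        Object value = relationshipProperties.get(weightProperty);
        return value instanceof Number ? ((Number) value).doubleValue() : fallbackWeight;
    }

    public static void main(String[] args) {
        Map<String, Object> withWeight = Collections.singletonMap("weight", 2.5);
        Map<String, Object> withoutWeight = Collections.emptyMap();

        // Property present: the stored value is used.
        System.out.println(weightOf(withWeight, "weight", 1.0));
        // Property name that does not exist: every lookup yields the fallback.
        System.out.println(weightOf(withWeight, "notARealPropertyName", 1.0));
        System.out.println(weightOf(withoutWeight, "notARealPropertyName", 1.0));
    }
}
```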

bsheldon commented 6 years ago

Thanks for the pointers @mknblch. After some trials, I was able to get a full run by setting the config parameter as follows:

CALL algo.louvain('Topic', 'WITH',
  {write:true, writeProperty:'community', weightProperty:'weight', defaultValue:1.0 })
YIELD nodes, communityCount, iterations, loadMillis, computeMillis, writeMillis;

Setting the graph:"huge" param on the first attempt still led to the error output shown above.

Final results:

+---------------------------------------------------------------------------------+
| nodes  | communityCount | iterations | loadMillis | computeMillis | writeMillis |
+---------------------------------------------------------------------------------+
| 378897 | 189134         | 5          | 22574      | 1227500       | 7009        |
+---------------------------------------------------------------------------------+

1 row available after 1257101 ms, consumed after another 1 ms

voronaam commented 6 years ago

I am also seeing this problem and would be willing to help pinpoint it. I am running Neo4j embedded. The DB has 120 nodes (all of type User), some of them connected by a "TALKS" relationship (1037 edges in total). Not a big graph.

The query is

CALL algo.louvain.stream('User','TALKS', {concurrency:4}) YIELD nodeId, community RETURN nodeId, community order by community

It stalls. I left it running for 66h+ and it did not complete. The versions:

    compile 'org.neo4j:neo4j:3.3.5'
    compile 'org.neo4j:graph-algorithms-algo:3.2.9.0'

I will try to debug it once I get a chance and will update this ticket.

mknblch commented 6 years ago

Hi. We have several different implementations of this algorithm, and it seems the parallel impl. is still buggy. The Louvain procedure is a bit messy: it hides the actual implementation name in the log, and it is also unclear how to trigger the use of a particular impl. I'm going to address this in the next update I'm working on. For now you could try to trigger weightedLouvain by using {graph:"huge", weightProperty:"foo", defaultValue:1.0} (this tells the algo to use a property named foo, which is not supposed to exist, so every relationship gets a constant weight). Or just deactivate parallelism by using {concurrency:1}. Please note that only phase one of Louvain is implemented. Phase 2 (rebuild the graph based on the community structure and trigger Louvain again) is left to be done by the user using Cypher. But this will also be available as a configuration option with the next update.
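
For readers unfamiliar with phase 2, the graph rebuild can be sketched as follows. This is an illustrative, in-memory version under stated assumptions (nodes are integer ids, the graph is treated as undirected, and the class and method names are hypothetical); it is not the library's code and not the Cypher-based approach mentioned above. Each community becomes one node, and the weights of all relationships between two communities are summed; relationships inside a community become a self-loop on that community.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CommunityGraphSketch {

    /**
     * Collapses a weighted edge list into a community-level graph.
     *
     * @param edges     edges[i] = {source node id, target node id}
     * @param weights   weights[i] is the weight of edges[i]
     * @param community community[n] is the community id assigned to node n
     * @return summed weight per community pair, keyed as "low-high"
     */
    static Map<String, Double> collapse(int[][] edges, double[] weights, int[] community) {
        Map<String, Double> collapsed = new LinkedHashMap<>();
        for (int i = 0; i < edges.length; i++) {
            int a = community[edges[i][0]];
            int b = community[edges[i][1]];
            // Undirected: order the pair so (a, b) and (b, a) share one key.
            String key = Math.min(a, b) + "-" + Math.max(a, b);
            collapsed.merge(key, weights[i], Double::sum);
        }
        return collapsed;
    }

    public static void main(String[] args) {
        // Two triangles-like groups {0,1,2} and {3,4,5} joined by two edges.
        int[][] edges = {{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {2, 4}};
        double[] weights = {1, 1, 1, 1, 1, 1};
        int[] community = {0, 0, 0, 1, 1, 1};

        // prints {0-0=2.0, 0-1=2.0, 1-1=2.0}
        System.out.println(collapse(edges, weights, community));
    }
}
```

The collapsed graph would then be the input to another Louvain run, which is the "trigger Louvain again" step.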

voronaam commented 6 years ago

Thank you, this one completes in almost no time.

loooo139 commented 6 years ago

@mknblch Hi, I am also seeing this problem. My DB has 4 million nodes and 68 million relationships. The nodes are labeled id and the relationships are named Links. I ran this query:

CALL algo.louvain('id', 'Links', {graph:"huge", weightProperty:"foo", defaultValue:1.0}) YIELD nodes, communityCount, iterations, loadMillis, computeMillis, writeMillis

After 10 seconds this error appears:

Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure algo.louvain: Caused by: java.lang.ArrayIndexOutOfBoundsException: -1

My Neo4j version is 3.4.4 and the algorithm library is graph-algorithms-algo-3.4.4.0.jar.

PatrickSaad commented 5 years ago

Using graph-algorithms-algo-3.4.0.0.jar, for ~30k nodes, the following Cypher worked:

CALL algo.louvain("Topic", "WITH",
    {write:true, writeProperty: 'community', concurrency: 1})
YIELD nodes, communityCount, iterations, loadMillis, computeMillis, writeMillis

Adding concurrency: 1 fixed it; otherwise it was just stuck at 0%.