neo4j-graph-analytics / ml-models

Machine Learning Procedures and Functions for Neo4j
https://github.com/neo4j-graph-analytics/ml-models/releases/tag/1.0.0
Apache License 2.0

Could not initialize class org.nd4j.linalg.factory.Nd4j #10

Open paltusplintus opened 5 years ago

paltusplintus commented 5 years ago

I copied the 3rd release .jar file to the plugins folder and set the following config (running on Neo4j 3.5):

dbms.security.procedures.whitelist=regression.*,embedding.*
dbms.security.procedures.unrestricted=regression.*,embedding.*

The procedures embedding.deepWalk and embedding.deepgl cannot be run in the Neo4j Browser; they fail with the error: Could not initialize class org.nd4j.linalg.factory.Nd4j

It seems like the compiled file lacks some dependencies.

Sorry, I cannot build it myself as I am not a Java programmer.

Does anybody have the same issue? Thanks.

jameswweis commented 5 years ago

@paltusplintus I am having the same issue for both deepgl and deepWalk on Neo4j 3.5.6 with version 1.0.3 of ml-models via the precompiled JAR. The exact error message is:

Failed to invoke procedure `embedding.deepgl`: 
Caused by: java.lang.NoClassDefFoundError: 
Could not initialize class org.nd4j.linalg.factory.Nd4j

apoc.*, algo.*, embedding.* are all properly whitelisted, installed, and accessible.

@mneedham @jexp @meltzerpete Do we need to recompile from source? Maybe whitelisting regression.* is required, even though those are visible via dbms.procedures()?

meltzerpete commented 5 years ago

@paltusplintus @jameswweis could you give a bit more information about your setup, i.e. operating system and CPU architecture? Is there any additional information in the Neo4j log? Sometimes I find that I need to watch the Neo4j output while the error occurs to see the error messages: they aren't saved in the log file, just printed in red to stderr in the console.
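Some background on why the earlier stderr output matters: in Java, "Could not initialize class X" is the NoClassDefFoundError the JVM throws on every use of X after X's static initializer has already failed once. The original exception is only reported with that first failure, wrapped in an ExceptionInInitializerError. What made Nd4j fail here is not known from the thread; the sketch below simulates a failing initializer with a hypothetical Backend class standing in for Nd4j.

```java
// Shows why "Could not initialize class X" hides the real error: the JVM
// reports the root cause only on the first failed initialization.
public class InitFailureDemo {

    // Hypothetical stand-in for a class like Nd4j whose static initializer
    // fails, e.g. because a native backend cannot be loaded (simulated here).
    static class Backend {
        static final String NAME = load();

        static String load() {
            throw new IllegalStateException("native backend not found (simulated)");
        }
    }

    // Touches Backend and describes the error the JVM throws.
    public static String touchBackend() {
        try {
            return Backend.NAME;
        } catch (ExceptionInInitializerError e) {
            // First attempt: the original exception is attached as the cause.
            return "ExceptionInInitializerError caused by " + e.getCause();
        } catch (NoClassDefFoundError e) {
            // Every later attempt: only the generic message is reported.
            return "NoClassDefFoundError: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(touchBackend()); // shows the simulated root cause
        System.out.println(touchBackend()); // shows "Could not initialize class ..."
    }
}
```

Only the first line names the underlying problem, so if that first failure scrolled past on stderr, every later call in the Browser shows just the generic message.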

jameswweis commented 5 years ago

@meltzerpete Definitely, thanks for the help. Kindly see below:

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.5 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.5 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
Stepping:              4
CPU MHz:               800.000
CPU max MHz:           2201.0000
CPU min MHz:           800.0000
BogoMIPS:              4401.47
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              14080K
NUMA node0 CPU(s):     0-9,20-29
NUMA node1 CPU(s):     10-19,30-39
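os-release and lscpu describe whichever machine they run on; when Neo4j runs in a container, it is the JVM inside that container that has to load nd4j's native code. A small sketch that prints what that JVM reports, meant to be compiled and run with the same Java runtime as Neo4j. The platform() helper and its "<os>-<arch>" naming (e.g. linux-x86_64) are assumptions modelled on common native-library classifier names, not a call into ND4J or JavaCPP.

```java
import java.util.Locale;

// Prints the OS and CPU architecture as seen by the running JVM.
public class PlatformInfo {

    // Approximates a "<os>-<arch>" platform string such as "linux-x86_64".
    // This naming is an assumption, not ND4J's own API.
    public static String platform(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("linux")) {
            os = "linux";
        } else if (os.startsWith("mac")) {
            os = "macosx";
        } else if (os.startsWith("windows")) {
            os = "windows";
        }

        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("amd64") || arch.equals("x86_64")) {
            arch = "x86_64";
        } else if (arch.equals("aarch64")) {
            arch = "arm64";
        }
        return os + "-" + arch;
    }

    public static void main(String[] args) {
        String osName = System.getProperty("os.name");
        String osArch = System.getProperty("os.arch");
        System.out.println("os.name      = " + osName);
        System.out.println("os.arch      = " + osArch);
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("platform     = " + platform(osName, osArch));
    }
}
```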

In debug.log, the only relevant line is:

2019-07-08 15:26:05.040+0000 INFO [o.n.k.i.p.Procedures] Executing DeepWalk with params: {walkLength=10, windowSize=2, numberOfWalks=10, vectorSize=10, learningRate=0.01}

After that, Neo4j fails with: Failed to invoke procedure `embedding.dl4j.deepWalk`: Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.nd4j.linalg.factory.Nd4j.

The query I am running (although it happens for all queries that I've tried, and for both deepWalk and deepgl) is effectively:

CALL embedding.dl4j.deepWalk('
MATCH (q:Node)-[:FROM]->(z:Year)
WHERE z.value <= 1900
RETURN id(q) as id
','
MATCH (q1:Node)-[:RELATED]->(q2: Node)
RETURN id(q1) AS source, id(q2) AS target
',{graph:'cypher', write:true, writeProperty:"temporary"});
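For readers less familiar with this call: with graph:'cypher', the first query supplies node ids and the second supplies (source, target) pairs, and DeepWalk trains embeddings on random walks over that graph (walkLength, numberOfWalks and windowSize appear in the debug.log line above). The sketch below illustrates generic DeepWalk-style walk generation over such pairs; it is not the ml-models implementation, and treating relationships as undirected and walkLength as a node count are assumptions.

```java
import java.util.*;

public class RandomWalkSketch {

    // Builds an undirected adjacency list from (source, target) pairs.
    public static Map<Long, List<Long>> adjacency(long[][] edges) {
        Map<Long, List<Long>> adj = new HashMap<>();
        for (long[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        return adj;
    }

    // numberOfWalks uniform random walks of at most walkLength nodes from every
    // start node; a walk stops early at a node without neighbours.
    public static List<List<Long>> walks(long[] nodeIds, Map<Long, List<Long>> adj,
                                         int walkLength, int numberOfWalks, Random rnd) {
        List<List<Long>> result = new ArrayList<>();
        for (long start : nodeIds) {
            for (int w = 0; w < numberOfWalks; w++) {
                List<Long> walk = new ArrayList<>();
                walk.add(start);
                long current = start;
                while (walk.size() < walkLength) {
                    List<Long> next = adj.getOrDefault(current, Collections.emptyList());
                    if (next.isEmpty()) {
                        break;
                    }
                    current = next.get(rnd.nextInt(next.size()));
                    walk.add(current);
                }
                result.add(walk);
            }
        }
        return result;
    }

    // Counts (centre, context) pairs within windowSize positions, i.e. the
    // training pairs a skip-gram model would see for one walk.
    public static int contextPairs(List<Long> walk, int windowSize) {
        int pairs = 0;
        for (int i = 0; i < walk.size(); i++) {
            int lo = Math.max(0, i - windowSize);
            int hi = Math.min(walk.size() - 1, i + windowSize);
            pairs += hi - lo; // every position in [lo, hi] except i itself
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Tiny hypothetical graph: 1-2, 2-3, 3-1, 3-4.
        long[][] edges = {{1, 2}, {2, 3}, {3, 1}, {3, 4}};
        long[] nodes = {1, 2, 3, 4};
        List<List<Long>> ws = walks(nodes, adjacency(edges), 10, 10, new Random(7));
        System.out.println(ws.size() + " walks, first: " + ws.get(0));
        System.out.println("context pairs in first walk: " + contextPairs(ws.get(0), 2));
    }
}
```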

I'm running Neo4j from the latest 3.5 Docker image. The relevant section of my Docker configuration (checkpoints are heavily delayed because of read/write issues on this node) is as follows:

        environment:
            - NEO4J_dbms_memory_heap_initial__size=31g
            - NEO4J_dbms_memory_heap_max__size=31g
            - NEO4J_dbms_memory_pagecache_size=600g
            - NEO4J_dbms_tx__log_rotation_retention__policy=false
            - NEO4J_dbms_tx__log_rotation_size=1M
            - NEO4J_unsupported_dbms_tx__log_fail__on__corrupted__log__files=false
            - NEO4J_dbms_checkpoint_iops_limit=-1
            - NEO4J_dbms_checkpoint_interval_time=42h
            - NEO4J_dbms_checkpoint_interval_tx=1000000000
            - NEO4J_dbms_config_strict__validation=false
            - NEO4J_dbms_security_procedures_unrestricted=apoc.*,embedding.*,algo.*
            - NEO4J_dbms_security_procedures_whitelist=apoc.*,embedding.*,algo.*
            - NEO4J_apoc_export_file_enabled=true
            - NEO4J_apoc_import_file_enabled=true
            - NEO4J_dbms_shell_enabled=true
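The Neo4j Docker image maps environment variables to neo4j.conf settings by a naming convention: prefix NEO4J_, write each underscore in the setting name as a double underscore, and each dot as a single underscore. A variable with a misspelled prefix (for example NEO5J_ instead of NEO4J_) is therefore presumably not picked up as a setting at all. A small sketch of the mapping in both directions; the class and method names are hypothetical.

```java
// Converts between neo4j.conf setting names and Neo4j Docker env var names.
public class DockerEnvName {

    private static final String PREFIX = "NEO4J_";

    // "_" -> "__" first, then "." -> "_", then add the prefix.
    public static String toEnvVar(String setting) {
        return PREFIX + setting.replace("_", "__").replace(".", "_");
    }

    // Inverse mapping; returns null when the variable lacks the NEO4J_ prefix.
    public static String toSetting(String envVar) {
        if (!envVar.startsWith(PREFIX)) {
            return null;
        }
        String body = envVar.substring(PREFIX.length());
        // Park escaped underscores ("__") on a placeholder before turning "_" into ".".
        return body.replace("__", "\0").replace("_", ".").replace("\0", "_");
    }

    public static void main(String[] args) {
        String[] vars = {
            "NEO4J_dbms_security_procedures_unrestricted",
            "NEO5J_dbms_security_procedures_whitelist",
            "NEO4J_dbms_memory_heap_max__size"
        };
        for (String v : vars) {
            String setting = toSetting(v);
            System.out.println(v + " -> " + (setting == null ? "(not a NEO4J_ variable)" : setting));
        }
    }
}
```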

Let me know if you need anything else.

meltzerpete commented 5 years ago

@jameswweis no problem. It definitely looks like a dependency issue. I can't reproduce this error on my machine, but here are some things you could try:

log

jameswweis commented 5 years ago

Thanks, @meltzerpete. Regarding your questions:

(1) I didn't see anything in the neo4j.log file within the Docker container. I will check again after restarting our database and update if that changes.

(2) Yes, I have APOC, graph-algorithms, and ml-models JARs:

$ ls plugins
apoc-3.5.0.3-all.jar  graph-algorithms-algo-3.5.4.0.jar  neo4j-ml-models-1.0.3.jar

Would any of these cause conflicts, do you know?
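One rough way to look for conflicts between plugin JARs is to find class files that ship in more than one of them, since the JARs in the plugins folder are typically loaded together. The sketch below uses only java.util.jar; it flags duplicate class entries but cannot tell whether two copies are actually incompatible. The class name and the default "plugins" path are assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class PluginConflicts {

    // Maps each .class entry to the JARs containing it, keeping only
    // entries found in more than one JAR.
    public static Map<String, List<String>> duplicateClasses(File pluginDir) throws IOException {
        Map<String, List<String>> owners = new TreeMap<>();
        File[] jars = pluginDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars == null) {
            return owners; // not a directory
        }
        Arrays.sort(jars);
        for (File jar : jars) {
            try (JarFile jf = new JarFile(jar)) {
                Enumeration<JarEntry> entries = jf.entries();
                while (entries.hasMoreElements()) {
                    String name = entries.nextElement().getName();
                    if (name.endsWith(".class") && !name.startsWith("META-INF/")) {
                        owners.computeIfAbsent(name, k -> new ArrayList<>()).add(jar.getName());
                    }
                }
            }
        }
        owners.values().removeIf(list -> list.size() < 2);
        return owners;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(args.length > 0 ? args[0] : "plugins");
        duplicateClasses(dir).forEach((cls, jars) -> System.out.println(cls + " -> " + jars));
    }
}
```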

(3) Thanks for the details. If none of the above helps, I'll try rebuilding from source as you recommend.

meltzerpete commented 5 years ago

@jameswweis ah right, sorry, I missed the part above where you said you were running in Docker; I didn't read it properly. I think there is an issue with nd4j and Docker. There may be a solution here: https://gitter.im/deeplearning4j/deeplearning4j/archives/2018/05/07. Otherwise, could you try running without Docker?

timholds commented 4 years ago

@paltusplintus @meltzerpete I'm having this issue as well.

I'm using ml-models-1.0.3, which I compiled myself without Docker. I followed your instructions above to change the pom file, removing the exclusions and classifier tags for nd4j-native-platform. However, when I run CALL dbms.procedures(), none of the embedding procedures show up; only the regression procedures are listed.

The plugins I have installed are apoc-3.3.0.1.jar, graphQL-3.3.0.0.jar, graphAlgorithms-3.3.0.0.jar, neo4j-ml-models-1.0.2.jar

I have also set:

dbms.security.procedures.unrestricted=algo.*,apoc.*,regression.*, embedding.*
dbms.security.procedures.whitelist=algo.*,apoc.*,regression.*, embedding.*

Any other ideas on how we might fix this?
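After rebuilding with nd4j-native-platform's exclusions and classifiers removed, one way to confirm what actually ended up in a JAR is to count its nd4j classes and list any bundled shared libraries. A hedged sketch: the exact locations of native libraries inside the shaded JAR are not known here, so it matches by file extension (.so, including versioned .so.N, .dylib, .dll) only.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarNativeCheck {

    // Summary of one JAR: classes under a package prefix and native libraries.
    public static final class Report {
        public int classesUnderPrefix;
        public final List<String> nativeLibraries = new ArrayList<>();
    }

    public static Report inspect(File jar, String packagePrefix) throws IOException {
        Report report = new Report();
        try (JarFile jf = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jf.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.startsWith(packagePrefix) && name.endsWith(".class")) {
                    report.classesUnderPrefix++;
                }
                // Shared-library extensions on Linux (optionally versioned), macOS and Windows.
                if (name.matches(".*\\.(so(\\.[0-9]+)*|dylib|dll)$")) {
                    report.nativeLibraries.add(name);
                }
            }
        }
        return report;
    }

    public static void main(String[] args) throws IOException {
        File jar = new File(args.length > 0 ? args[0] : "plugins/neo4j-ml-models-1.0.3.jar");
        Report r = inspect(jar, "org/nd4j/");
        System.out.println("classes under org/nd4j/: " + r.classesUnderPrefix);
        System.out.println("native libraries: " + r.nativeLibraries.size());
        r.nativeLibraries.forEach(n -> System.out.println("  " + n));
    }
}
```

Running it against each ml-models JAR in the plugins folder also shows which build is actually installed.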

meltzerpete commented 4 years ago

@timholds I'm not sure what the problem is, but if the procedures are not listed, it is likely a problem that occurs during database startup, when Neo4j scans for them. I'm a little unclear on the problem you are facing. Could you confirm that you also get the error Could not initialize class org.nd4j.linalg.factory.Nd4j? Did you get this error, then make the suggested change, and now the procedures are not listed? Or am I misunderstanding?

Also, could you post the logs/debug.log entries from database startup? They might show why the procedures aren't being registered.