graphaware / neo4j-nlp

NLP Capabilities in Neo4j
https://hume.graphaware.com/

ga.nlp.ml.word2vec.attach gives java.lang.RuntimeException: Error (#52)

Closed · AndreasHenningsson closed this issue 6 years ago

AndreasHenningsson commented 6 years ago

When using the following command:

match (n:Tag) call ga.nlp.ml.word2vec.attach(n) YIELD result return result

It gives Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure ga.nlp.ml.word2vec.attach: Caused by: java.lang.RuntimeException: Error

Installed base

Neo4j 3.3.1
graphaware-nlp-3.3.1.51.2
nlp-opennlp-3.3.1.51.2
nlp-stanfordnlp-3.3.1.51.2
graphaware-server-community-all-3.3.1.51

ikwattro commented 6 years ago

Hi @AndreasHenningsson

I think this process should be better documented. The attach method is meant to add the vector of a word to the corresponding node; for that, an existing word vectors model must be available.

We recommend (and have tested) ConceptNet Numberbatch (Word2Vec): https://github.com/commonsense/conceptnet-numberbatch

I extracted the Swedish vectors; you can download them here:

https://www.dropbox.com/s/co43akx8loidrww/swedishSource.zip?dl=0

How does it work?

Extract the zip into the import directory of your Neo4j installation. This will serve as the source; next, the vectors should be added to a Lucene index outside of Neo4j.

The following procedure does exactly that; you provide the source and destination locations:

CALL ga.nlp.ml.word2vec.addModel(
   "/Users/ikwattro/dev/_graphs/nlp/import/swedishSource",
   "/Users/ikwattro/dev/_graphs/nlp/import/swedishIndex",
   "swedish-numberbatch"
)

(note that you need to provide the full paths)

This should run relatively fast.

You can then return the vectors or, for example, compute the similarity between two words based on this model:

WITH 
ga.nlp.ml.word2vec.wordVector('äpple', 'swedish-numberbatch') AS appleVector,
ga.nlp.ml.word2vec.wordVector('frukt', 'swedish-numberbatch') AS fruitVector
RETURN ga.nlp.ml.similarity.cosine(appleVector, fruitVector) AS simil

╒══════════════════╕
│"simil"           │
╞══════════════════╡
│0.5729867131088588│
└──────────────────┘
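
If you want to sanity-check what the cosine function returns, the same value should be recoverable in plain Cypher from the two vectors. This is a minimal sketch, assuming wordVector returns two float lists of equal length:

WITH
ga.nlp.ml.word2vec.wordVector('äpple', 'swedish-numberbatch') AS a,
ga.nlp.ml.word2vec.wordVector('frukt', 'swedish-numberbatch') AS b
// cosine similarity: dot product divided by the product of the two vector norms
WITH a, b, range(0, size(a) - 1) AS idx
RETURN reduce(dot = 0.0, i IN idx | dot + a[i] * b[i]) /
       (sqrt(reduce(s = 0.0, x IN a | s + x * x)) *
        sqrt(reduce(s = 0.0, x IN b | s + x * x))) AS simil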

As for the signature of the attach procedure:

CALL ga.nlp.ml.word2vec.attach({query:'MATCH (t:Tag:SwedishTag) RETURN t', modelName:'swedish-numberbatch'})

The vector is now attached to the node:

╒══════════════════════════════════════════════════════════════════════╕
│"n"                                                                   │
╞══════════════════════════════════════════════════════════════════════╡
│{"pos":[],"lastTxId":"1517168041047","ne":[],"language":"sv","id":"äpp│
│le_sv","value":"äpple","word2vec":[-0.0084,-0.0327,0.116,0.1212,-0.000│
│4,0.011,0.1264,-0.0874,0.0984,-0.1155,0.0916,0.0191,0.1011,-0.0908,-0.│
│1154,-0.0146,0.0911,-0.0586,-0.0481,-0.0614,0.0055,-0.0192,0.1477,-0.0│
│856,0.1098,-0.1223,-0.0227,-0.0317,0.0499,-0.0607,0.014,0.013,0.0059,0│
│.0191,-0.0335,0.052,0.0074,0.0886,0.0005,0.0686,0.0192,-0.0488,0.064,-│
│0.0907,-0.0506,0.1281,0.0834,0.0485,-0.007,0.0635,-0.0095,-0.0155,-0.0│
│416,0.0163,0.0247,0.0839,0.0404,-0.0053,0.039,-0.1269,-0.0831,-0.0714,│
│-0.0331,0.0031,-0.0251,0.0106,-0.0609,-0.0109,0.1094,-0.0479,0.0121,-0│
│.0347,-0.0118,-0.0506,0.0829,-0.0366,-0.0386,0.0022,-0.0391,0.0375,0.0│
│803,0.0187,-0.0213,0.0465,0.0559,0.0723,0.1033,-0.0281,-0.0072,-0.0595│
│,0.0381,0.017,-0.0316,0.0349,-0.0778,-0.0204,-0.0609,-0.018,-0.0325,-0│
│.0023,-0.0093,0.0079,-0.0162,0.0903,0.029,-0.1059,-0.0235,-0.0035,0.11│
│68,0.0223,-0.0463,0.023,-0.0767,0.0584,0.0094,0.0253,0.0593,0.0354,0.0│
│208,0.0037,0.0455,0.0383,0.0224,0.0353,0.0697,-0.0294,-0.0744,0.1362,-│
│0.0313,-0.0589,-0.067,0.0278,0.0175,-0.0007,-0.0195,0.0083,0.0137,0.00│
│17,0.0372,-0.0856,0.0037,-0.1365,-0.0781,-0.027,-0.0119,-0.0325,0.0959│
│,0.0158,-0.0144,0.1404,-0.0115,0.0113,-0.0495,-0.0376,-0.0271,0.0552,0│
│.0267,0.1205,-0.0014,-0.0524,-0.0917,-0.0102,0.0561,0.0116,-0.1097,-0.│
│0378,0.0279,-0.0565,-0.0632,-0.0507,0.1278,0.0771,0,0.0526,0.0125,-0.1│
│067,0.0012,-0.1449,0.0437,0.0508,-0.102,-0.0542,-0.0343,0.0731,0.0526,│
│-0.0543,0.0854,-0.0484,-0.0676,-0.0526,-0.0393,0.0274,-0.0478,0.0397,-│
│0.0154,0.0165,-0.1145,-0.0477,0.1238,-0.0214,0.0363,-0.0563,0.0003,0.0│
│511,-0.0379,-0.0127,0.0624,0.0386,0.0183,0.0887,-0.1172,-0.0501,0.0448│
│,-0.0416,-0.0703,0.0157,-0.0163,-0.0741,0.0296,0.0168,0.074,-0.0537,-0│
│.001,-0.0631,-0.0446,-0.0372,0.048,-0.0385,0.0432,-0.0765,-0.1226,0.03│
│39,0.0263,-0.0372,-0.0206,0.0849,-0.0093,0.0671,-0.0597,0.0409,0.0649,│
│0.0371,0.0323,-0.0201,0.0074,-0.0003,-0.0153,-0.0329,0.0367,0.0357,-0.│
│0242,0.0312,-0.0117,0.0963,-0.0233,-0.0174,-0.0091,0.0261,0.018,-0.079│
│7,0.0176,0.0148,0.0178,0.0199,0.0247,0.0431,-0.076,0.0118,-0.0357,-0.0│
│034,0.0174,-0.0241,0.0534,0.0584,-0.0002,-0.0499,-0.0122,-0.0363,0.032│
│6,-0.062,0.0269,-0.0731,0.0217,0.0163,-0.0107,-0.0508,0.1129,0.0584,0.│
│0671,0.0794,-0.0513,0.0282,0.0276,0.0044,0.105,0.0893,0.0356,-0.0099,0│
│.0035,-0.0829]}                                                       │
└──────────────────────────────────────────────────────────────────────┘
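
Once the word2vec property is stored on the Tag nodes, you can also compare two tags directly from their attached vectors. A quick sketch (the tag values below are just an illustration, use whatever tags exist in your graph):

MATCH (a:Tag {value: 'äpple'}), (b:Tag {value: 'frukt'})
WHERE exists(a.word2vec) AND exists(b.word2vec)
RETURN ga.nlp.ml.similarity.cosine(a.word2vec, b.word2vec) AS simil
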
AndreasHenningsson commented 6 years ago

Many many thanks! Works like a charm 😊

ikwattro commented 6 years ago

Closing this issue as the documentation has been updated.