Closed AndreasHenningsson closed 6 years ago
Hi @AndreasHenningsson
I think this process should be better documented. The attach method is meant to add the vector of a word to the corresponding node. For that, a vector model must already exist.
We recommend (and have tested) ConceptNet Numberbatch (Word2Vec): https://github.com/commonsense/conceptnet-numberbatch
I extracted the Swedish vectors; you can download them here:
https://www.dropbox.com/s/co43akx8loidrww/swedishSource.zip?dl=0
How does it work?
Extract the zip into the import directory of Neo4j. This will serve as the source. Next, the vectors must be added to a Lucene index outside of Neo4j.
The following procedure does exactly that; you provide the source and destination locations:
CALL ga.nlp.ml.word2vec.addModel(
"/Users/ikwattro/dev/_graphs/nlp/import/swedishSource",
"/Users/ikwattro/dev/_graphs/nlp/import/swedishIndex",
"swedish-numberbatch"
)
(Note that you need to provide the full path.)
This should run relatively fast.
You can then return the vectors or, for example, compute the similarity between two words based on this model:
WITH
ga.nlp.ml.word2vec.wordVector('äpple', 'swedish-numberbatch') AS appleVector,
ga.nlp.ml.word2vec.wordVector('frukt', 'swedish-numberbatch') AS fruitVector
RETURN ga.nlp.ml.similarity.cosine(appleVector, fruitVector) AS simil
╒══════════════════╕
│"simil" │
╞══════════════════╡
│0.5729867131088588│
└──────────────────┘
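For reference, the cosine similarity returned above can be sketched in plain Python. This only illustrates the standard formula, not the plugin's actual implementation; the function name `cosine_similarity` and the three-dimensional toy vectors are assumptions made up for the example (the real Numberbatch vectors have many more dimensions).

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration only (not real word vectors).
apple_vector = [1.0, 2.0, 2.0]
fruit_vector = [2.0, 1.0, 2.0]
print(cosine_similarity(apple_vector, fruit_vector))
```

A value close to 1 means the two vectors point in nearly the same direction, and a value close to 0 means they are nearly orthogonal.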
Then, for the signature of the attach procedure:
CALL ga.nlp.ml.word2vec.attach({query:'MATCH (t:Tag:SwedishTag) RETURN t', modelName:'swedish-numberbatch'})
The vector is now attached to the node:
╒══════════════════════════════════════════════════════════════════════╕
│"n" │
╞══════════════════════════════════════════════════════════════════════╡
│{"pos":[],"lastTxId":"1517168041047","ne":[],"language":"sv","id":"äpp│
│le_sv","value":"äpple","word2vec":[-0.0084,-0.0327,0.116,0.1212,-0.000│
│4,0.011,0.1264,-0.0874,0.0984,-0.1155,0.0916,0.0191,0.1011,-0.0908,-0.│
│1154,-0.0146,0.0911,-0.0586,-0.0481,-0.0614,0.0055,-0.0192,0.1477,-0.0│
│856,0.1098,-0.1223,-0.0227,-0.0317,0.0499,-0.0607,0.014,0.013,0.0059,0│
│.0191,-0.0335,0.052,0.0074,0.0886,0.0005,0.0686,0.0192,-0.0488,0.064,-│
│0.0907,-0.0506,0.1281,0.0834,0.0485,-0.007,0.0635,-0.0095,-0.0155,-0.0│
│416,0.0163,0.0247,0.0839,0.0404,-0.0053,0.039,-0.1269,-0.0831,-0.0714,│
│-0.0331,0.0031,-0.0251,0.0106,-0.0609,-0.0109,0.1094,-0.0479,0.0121,-0│
│.0347,-0.0118,-0.0506,0.0829,-0.0366,-0.0386,0.0022,-0.0391,0.0375,0.0│
│803,0.0187,-0.0213,0.0465,0.0559,0.0723,0.1033,-0.0281,-0.0072,-0.0595│
│,0.0381,0.017,-0.0316,0.0349,-0.0778,-0.0204,-0.0609,-0.018,-0.0325,-0│
│.0023,-0.0093,0.0079,-0.0162,0.0903,0.029,-0.1059,-0.0235,-0.0035,0.11│
│68,0.0223,-0.0463,0.023,-0.0767,0.0584,0.0094,0.0253,0.0593,0.0354,0.0│
│208,0.0037,0.0455,0.0383,0.0224,0.0353,0.0697,-0.0294,-0.0744,0.1362,-│
│0.0313,-0.0589,-0.067,0.0278,0.0175,-0.0007,-0.0195,0.0083,0.0137,0.00│
│17,0.0372,-0.0856,0.0037,-0.1365,-0.0781,-0.027,-0.0119,-0.0325,0.0959│
│,0.0158,-0.0144,0.1404,-0.0115,0.0113,-0.0495,-0.0376,-0.0271,0.0552,0│
│.0267,0.1205,-0.0014,-0.0524,-0.0917,-0.0102,0.0561,0.0116,-0.1097,-0.│
│0378,0.0279,-0.0565,-0.0632,-0.0507,0.1278,0.0771,0,0.0526,0.0125,-0.1│
│067,0.0012,-0.1449,0.0437,0.0508,-0.102,-0.0542,-0.0343,0.0731,0.0526,│
│-0.0543,0.0854,-0.0484,-0.0676,-0.0526,-0.0393,0.0274,-0.0478,0.0397,-│
│0.0154,0.0165,-0.1145,-0.0477,0.1238,-0.0214,0.0363,-0.0563,0.0003,0.0│
│511,-0.0379,-0.0127,0.0624,0.0386,0.0183,0.0887,-0.1172,-0.0501,0.0448│
│,-0.0416,-0.0703,0.0157,-0.0163,-0.0741,0.0296,0.0168,0.074,-0.0537,-0│
│.001,-0.0631,-0.0446,-0.0372,0.048,-0.0385,0.0432,-0.0765,-0.1226,0.03│
│39,0.0263,-0.0372,-0.0206,0.0849,-0.0093,0.0671,-0.0597,0.0409,0.0649,│
│0.0371,0.0323,-0.0201,0.0074,-0.0003,-0.0153,-0.0329,0.0367,0.0357,-0.│
│0242,0.0312,-0.0117,0.0963,-0.0233,-0.0174,-0.0091,0.0261,0.018,-0.079│
│7,0.0176,0.0148,0.0178,0.0199,0.0247,0.0431,-0.076,0.0118,-0.0357,-0.0│
│034,0.0174,-0.0241,0.0534,0.0584,-0.0002,-0.0499,-0.0122,-0.0363,0.032│
│6,-0.062,0.0269,-0.0731,0.0217,0.0163,-0.0107,-0.0508,0.1129,0.0584,0.│
│0671,0.0794,-0.0513,0.0282,0.0276,0.0044,0.105,0.0893,0.0356,-0.0099,0│
│.0035,-0.0829]} │
└──────────────────────────────────────────────────────────────────────┘
Many, many thanks! Works like a charm 😊
From: Christophe Willemsen Sent: 28 January 2018 20:27 To: graphaware/neo4j-nlp Cc: AndreasHenningsson; Mention Subject: Re: [graphaware/neo4j-nlp] ga.nlp.ml.word2vec.attach gives java.lang.RuntimeException: Error (#52)
Closing this issue as the documentation has been updated.
When using the following command:
MATCH (n:Tag) CALL ga.nlp.ml.word2vec.attach(n) YIELD result RETURN result
it gives:
Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure ga.nlp.ml.word2vec.attach: Caused by: java.lang.RuntimeException: Error
Installed base:
Neo 3.3.1
graphaware-nlp-3.3.1.51.2
nlp-opennlp-3.3.1.51.2
nlp-stanfordnlp-3.3.1.51.2
graphaware-server-community-all-3.3.1.51