neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/

One hot encoding for comma separated values #219

Closed meg261995 closed 1 year ago

meg261995 commented 1 year ago

Suppose I have values something like this:

1) eu 2) en 3) en,eu 4) bg 5) ca 6) bg,eu

I want the one hot encoding to look like this:

1) 100000 2) 010000 3) 110000 4) 000100 5) 000010 6) 100100

How do I achieve this in Cypher? I do not want 3 and 6 to be treated as separate classes. It's possible in Python, but I'm a beginner in Cypher.

Mats-SX commented 1 year ago

Hello @meg261995 and thanks for reaching out to us!

Are you attempting to use the gds.alpha.ml.oneHotEncoding() function? If so, you must provide two inputs: the dictionary of tokens/words, and the list of items from that dictionary that you want encoded.

For example:

WITH ['eu', 'en', 'bg', 'ca'] AS dictionary
WITH 
  gds.alpha.ml.oneHotEncoding(dictionary, ['en', 'eu']) AS eneu,
  gds.alpha.ml.oneHotEncoding(dictionary, ['bg', 'eu']) AS bgeu
RETURN eneu, bgeu

which returns

╒═════════╤═════════╕
│"eneu"   │"bgeu"   │
╞═════════╪═════════╡
│[1,1,0,0]│[1,0,1,0]│
└─────────┴─────────┘
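
For the comma-separated values in the question, one possible approach is to split each string into a list before encoding it. The following is only a sketch: it assumes the values contain no spaces around the commas, and the inline list of raw values stands in for wherever the data actually lives (for example a node property, which is an assumption and not something stated in the question).

```cypher
// Dictionary of all known tokens; its order fixes the position of each bit.
WITH ['eu', 'en', 'bg', 'ca'] AS dictionary
// Hypothetical inline sample of the raw comma-separated values from the question.
UNWIND ['eu', 'en', 'en,eu', 'bg', 'ca', 'bg,eu'] AS raw
// split() turns 'en,eu' into ['en', 'eu'], which is then encoded against the dictionary.
RETURN raw, gds.alpha.ml.oneHotEncoding(dictionary, split(raw, ',')) AS encoding
```

With this dictionary, 'en,eu' and 'bg,eu' should give the same vectors as in the result above, so combined values are not treated as separate classes. Note that the vector has one position per dictionary entry, so a six-digit encoding like the one in the question would need a dictionary with six tokens.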

See also https://neo4j.com/docs/graph-data-science/current/alpha-algorithms/one-hot-encoding/

All the best,
Mats

Mats-SX commented 1 year ago

If you are in need of more general Cypher help, I recommend making use of

FlorentinD commented 1 year ago

@meg261995 Did Mats's response solve your issue?

I am closing the issue due to inactivity. Feel free to reopen it :)