touretzkyds / oldWordEmbeddingDemo

Word embeddings online demo, rewrite
https://jxu.github.io/WordEmbeddingDemo/

Vector displays when doing vector arithmetic #14

Closed touretzkyds closed 3 years ago

touretzkyds commented 3 years ago

When we've computed a result in vector arithmetic mode, we should commandeer the Vector Display and automatically populate its slots with the following six vectors:

  1. king
  2. man
  3. king - man
  4. woman
  5. king - man + woman
  6. queen (assuming this is the closest matching word to the result)

The user should still be free to replace any of these slots by clicking on a word in the 3D plot and then clicking on the slot.
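As an illustration of the six slots listed above, here is a minimal Python sketch. It is not the demo's actual code: the toy 3-dimensional vectors, the helper names (`arithmetic_slots`, `nearest_word`), the use of cosine similarity, and the choice to exclude the three input words from the nearest-word search are all assumptions made for the example.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.1],
    "queen": [0.9, 0.1, 0.1],
    "apple": [0.0, 0.0, 1.0],
}

def sub(a, b):
    """Element-wise a - b."""
    return [x - y for x, y in zip(a, b)]

def add(a, b):
    """Element-wise a + b."""
    return [x + y for x, y in zip(a, b)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_word(target, exclude=()):
    """Vocabulary word most similar to target, skipping excluded words."""
    candidates = [w for w in vectors if w not in exclude]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

def arithmetic_slots(a, b, c):
    """Return the six (label, vector) pairs for the expression a - b + c."""
    diff = sub(vectors[a], vectors[b])
    result = add(diff, vectors[c])
    # Assumption: the input words are not allowed to be the match.
    match = nearest_word(result, exclude={a, b, c})
    return [
        (a, vectors[a]),
        (b, vectors[b]),
        (f"{a} - {b}", diff),
        (c, vectors[c]),
        (f"{a} - {b} + {c}", result),
        (match, vectors[match]),
    ]

slots = arithmetic_slots("king", "man", "woman")
for label, vec in slots:
    print(label, vec)
```

Returning plain (label, vector) pairs keeps the slots independent of the vocabulary, so replacing one slot later does not require touching the other five.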

jxu commented 3 years ago

Do we want to add these results as "pseudo-words" to our vocab list, or just calculate them within the magnified display? Adding words to the vocab list will probably mess up future calculations.

touretzkyds commented 3 years ago

I don't think we want to make these into permanent vocabulary additions. Just show them in the vector display as described above, and show the result vector (king-man+woman) as a point in the 3D display.

jxu commented 3 years ago

Since my vector plotting implementation uses words as keys into the word vector map, the cleanest solution I could come up with is to add the new "arithmetic" words and their vectors to the word vector map and separately maintain a list of actual words, so that nearest-word searches in vector arithmetic do not return arithmetic words. The alternative is to track the words and vectors to be plotted separately, instead of tracking just the words, and have the vectors set by the vector arithmetic function, because there is no arithmetic word in the word map to look up.

touretzkyds commented 3 years ago

You could easily flag an arithmetic word by adding a special prefix character like "$", so you could tell right away that "$king-man" was an arithmetic word. Then all you have to do is ignore any "$"-prefixed words when computing nearest neighbors, and delete the "$" when displaying the word in the vector display. This would save you the trouble of maintaining and accessing two word lists.
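The "$" prefix idea could look roughly like the Python sketch below. The map contents, the toy vectors, and the names `is_arithmetic`, `display_label`, and `nearest_neighbors` are hypothetical, and cosine similarity is assumed as the nearness measure; the demo's own code may differ.

```python
import math

ARITHMETIC_PREFIX = "$"

# Single word-vector map holding both real words and "$"-flagged
# arithmetic results. All vectors are toy values made up for the example.
word_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.1],
    "apple": [0.0, 0.0, 1.0],
    "$king-man": [0.8, -0.1, 0.0],
    "$king-man+woman": [0.9, 0.0, 0.1],
}

def is_arithmetic(word):
    """True if the key is a flagged arithmetic result, not a real word."""
    return word.startswith(ARITHMETIC_PREFIX)

def display_label(word):
    """Drop the flag character before showing the key in the vector display."""
    return word[len(ARITHMETIC_PREFIX):] if is_arithmetic(word) else word

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(target, k=1):
    """The k real words closest to target; flagged keys are ignored."""
    candidates = [w for w in word_vectors if not is_arithmetic(w)]
    candidates.sort(key=lambda w: cosine(word_vectors[w], target), reverse=True)
    return candidates[:k]

target = word_vectors["$king-man+woman"]
print(nearest_neighbors(target))          # real words only
print(display_label("$king-man+woman"))   # label without the flag
```

Without the filter in `nearest_neighbors`, the flagged entry for the result would match itself exactly, which is the situation the prefix check is meant to avoid.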

jxu commented 3 years ago

That is true. Actually, since all my words use only alphabetical characters, any word with - or + in it is not a real word.
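A short Python sketch of that check, assuming the vocabulary really is restricted to ASCII letters; the pattern and the name `is_real_word` are illustrative only:

```python
import re

# Assumption: real vocabulary words contain ASCII letters only.
REAL_WORD = re.compile(r"[A-Za-z]+")

def is_real_word(key):
    """True when the whole key is alphabetic, so it has no '-' or '+'."""
    return REAL_WORD.fullmatch(key) is not None

keys = ["king", "man", "king-man", "king-man+woman", "queen"]
real_words = [k for k in keys if is_real_word(k)]
print(real_words)
```

This avoids a dedicated flag character, at the cost of breaking if hyphenated words are ever added to the vocabulary.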

jxu commented 3 years ago

I tried putting the arithmetic words at a slant because they don't fit when put horizontally. I don't think it looks good but otherwise the horizontal words take up too much space.

Screenshot from 2021-07-26 16-36-51