touretzkyds / oldWordEmbeddingDemo

Word embeddings online demo, rewrite
https://jxu.github.io/WordEmbeddingDemo/

Residual dimension #17

Closed · touretzkyds closed this 3 years ago

touretzkyds commented 3 years ago

In the original demo the residual dimension was larger in the display (it spanned values 0 to 0.9) and "refrigerator" was well separated from all the human concepts; see image below.

In the new demo the residual dimension has a smaller range of values (0.1 to 0.5) and "refrigerator" is not as well separated from the human concepts. What is it about the 300-dimensional vectors that leads to this difference in the residual? Is it possible we're computing the residual incorrectly? Is there a better way to do it?

One possible hack is to multiply the residual by 2 in order to scale the display the way we want.

Old demo: [screenshot: oldword2vec]

New demo: [screenshot: newword2vec]

touretzkyds commented 3 years ago

I tried changing the gender dimension to just grandfather-grandmother and the age dimension to just grandfather-grandson; the king-man+woman=queen equation then put the result much closer to woman. So I think I was right that the age and gender basis vectors are being polluted by the multiple usages of words like "man" and "queen".
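(For concreteness, here is a minimal sketch of that check in Python/numpy. The names `vecs`, `unit`, and `coords` are hypothetical, not the demo's actual code, and `vecs` is assumed to already hold the pretrained vectors as a dict from word to 300-d array.)

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

# Single-pair basis directions, as described above
# (vecs: assumed dict mapping word -> 300-d numpy array)
gender = unit(vecs["grandfather"] - vecs["grandmother"])
age    = unit(vecs["grandfather"] - vecs["grandson"])

def coords(v):
    # (age, gender) coordinates as plotted on the demo's floor plane
    return np.array([v @ age, v @ gender])

target = vecs["king"] - vecs["man"] + vecs["woman"]
for w in ["queen", "woman", "man"]:
    print(w, np.linalg.norm(coords(target) - coords(vecs[w])))
```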

The bad news is that royal words like king/queen/prince/princess now have high residuals so they end up close to chair and computer. So I'd like to find a way to fix that without polluting the basis vectors too much.

jxu commented 3 years ago

We expect royal words to have high residuals, right? "king" carries a lot of connotation (in terms of surrounding words) beyond gender.

jxu commented 3 years ago

The residual calculation follows the formula described in #3 and the original demo: start with the word vector and subtract out its projections onto the age and gender features. The original demo used only the words that define the age and gender features when computing the residual feature. It is possible, though slower, to use the entire vocab to define the residual feature.
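For reference, a minimal sketch of that formula (illustrative names only, not the demo's actual code; `vecs` is assumed to hold the pretrained vectors):

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

# Hypothetical feature definitions; vecs maps word -> 300-d numpy array
age    = unit(vecs["grandfather"] - vecs["grandson"])
gender = unit(vecs["grandfather"] - vecs["grandmother"])

def residual(v):
    # Subtract v's projections onto the unit age and gender features.
    # If the features are not exactly orthogonal this removes slightly
    # too much or too little; Gram-Schmidt on the basis would avoid that.
    return v - (v @ age) * age - (v @ gender) * gender

# The plotted residual coordinate is the length of what remains:
print(np.linalg.norm(residual(vecs["refrigerator"])))
```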

touretzkyds commented 3 years ago

> It is possible, though slower, to use the entire vocab to define the residual feature.

I don't know what that means. How would you use the entire vocabulary?

The idea of the "residual" is that it is the component of the vector that does not lie in the direction of either the "age" or the "gender" basis vector. So I believe we're computing the residual correctly. Question: what is the angle in degrees between the age and gender basis vectors? Ideally they would be orthogonal, but I doubt this is the case.

And yes, royal words should have some residual value that sets them apart from non-royal words used as the basis vectors for age and gender. But I was thinking that this residual would be way lower than the residual for words that don't even refer to humans (like "refrigerator"), or aren't even concrete objects (e.g., "happiness"), or aren't even nouns (e.g., "argue"). This assumption could be incorrect, though. Since the embedding vectors encode usage statistics, not semantic features, and since words like "king" and "queen" are used in multiple ways and have idiomatic uses, things are messier than one would hope.
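One way to check this assumption directly would be to print residual norms for a word from each category. A small sketch, under the same assumptions as the snippet above:

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

# vecs: assumed dict mapping word -> 300-d numpy array
age    = unit(vecs["grandfather"] - vecs["grandson"])
gender = unit(vecs["grandfather"] - vecs["grandmother"])

for w in ["uncle", "king", "queen", "refrigerator", "happiness", "argue"]:
    v = vecs[w]
    r = v - (v @ age) * age - (v @ gender) * gender
    print(f"{w:14s} residual norm = {np.linalg.norm(r):.3f}")
```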

jxu commented 3 years ago

The idea behind CBoW is that semantic relations are hopefully also captured in the model, which, as you note, really only captures word contexts. I think it makes sense that royal words appear between familial words and inanimate-object words: I would guess words like "uncle" carry most of their meaning in age, gender, family, and personhood, while "king" keeps some of the family-relationship aspect (father of "prince") and introduces a royal aspect that goes into the residual.

It's difficult to read too much into these calculations, though, because fastText (and the original word2vec, which was based on neural network output, so technically this demo is misnamed) and other word models are trained on immense amounts of data and aren't really designed to be interpretable, especially since the vectors were originally 300- (or 100-) dimensional. I don't know if it's even possible to make the models more interpretable beyond the basics.

touretzkyds commented 3 years ago

I think what we have now is pretty good. But can we try scaling the residual value by a factor of 2, so that the residual axis is wider than the gender axis? Right now the gender axis is wider, i.e., the floor of the 3D plot is rectangular, not square, but the long axis is the wrong one.

jxu commented 3 years ago

Here's the residual dimension scaled by two. The residual dimension is now longer than the other axes.

[Screenshot from 2021-08-12 20-47-33: residual dimension scaled by two]

jxu commented 3 years ago

The dot product of the (unit) age and gender features is about 0.06, i.e., an angle of about 86.5 degrees, so the features are nearly orthogonal. It may be worth noting that in high dimensions, random vectors (which these aren't) will be very nearly perpendicular just due to the large number of dimensions.
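A sketch of both calculations, under the same assumptions as the earlier snippets:

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

# vecs: assumed dict mapping word -> 300-d numpy array
age    = unit(vecs["grandfather"] - vecs["grandson"])
gender = unit(vecs["grandfather"] - vecs["grandmother"])
print(np.degrees(np.arccos(age @ gender)))  # ~86.5 deg when the dot is ~0.06

# Random 300-d vectors are close to perpendicular just by dimensionality:
rng = np.random.default_rng(0)
a, b = rng.standard_normal(300), rng.standard_normal(300)
print(np.degrees(np.arccos((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))))
```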

touretzkyds commented 3 years ago

This is great. The plot looks good. And it's reassuring to hear that age and gender are almost perfectly orthogonal even though they are both derived from human kinship terms.

I was also pleasantly surprised to discover that grape-wine+pickle = cucumber. Although all these terms are pretty neutral in the gender dimension, they are mildly separated in age and nicely separated along the residual dimension.