word embedding association test

See Caliskan, Bryson, and Narayanan (2017), "Semantics derived automatically from language corpora contain human-like biases," Science, 356(6334):183-186.

Given two sets of target words, such as musical instruments and weapons, and an attribute axis such as pleasant vs. unpleasant, the authors measured how the words in each target set are distributed along that axis.
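For reference, here is the paper's effect size as we understand it. X and Y are the two target sets (e.g. instruments, weapons), A and B are the two attribute sets (e.g. pleasant, unpleasant words), and s(w, A, B) measures how much closer word w is to A than to B:

```math
s(w, A, B) = \operatorname{mean}_{a \in A} \cos(\vec{w}, \vec{a}) - \operatorname{mean}_{b \in B} \cos(\vec{w}, \vec{b})
```

```math
d = \frac{\operatorname{mean}_{x \in X} s(x, A, B) - \operatorname{mean}_{y \in Y} s(y, A, B)}{\operatorname{std\text{-}dev}_{w \in X \cup Y} \, s(w, A, B)}
```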

We could implement this as part of the "word play" panel. Users would enter two lists of words and select a semantic dimension (as defined in our Dimensions panel). We could then (1) calculate the effect size d using the formula in the paper, and (2) plot each word on a line according to its projection onto the semantic dimension.
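A minimal sketch of both steps, assuming each word maps to a plain list of floats and that the selected semantic dimension can be represented as a single axis vector. The names `embeddings`, `axis`, `axis_score`, and `weat_d_on_axis` are hypothetical, not part of the demo. Note the adaptation: the paper's s(w, A, B) is replaced by the cosine between w and the axis (the projection of the unit-normalized word vector onto the unit axis). This is our substitution, not the paper's statistic, and using the sample standard deviation for "std-dev" is also an assumption.

```python
import math
from statistics import mean, stdev


def axis_score(vec, axis):
    """Cosine between a word vector and the axis, i.e. the scalar
    projection of the unit-length word vector onto the unit-length axis."""
    dot = sum(v * a for v, a in zip(vec, axis))
    norm_v = math.sqrt(sum(v * v for v in vec))
    norm_a = math.sqrt(sum(a * a for a in axis))
    return dot / (norm_v * norm_a)


def weat_d_on_axis(X, Y, axis, embeddings):
    """Hypothetical WEAT-style effect size using axis scores in place of
    s(w, A, B). Returns (d, scores), where scores maps every word to its
    position on the axis, which is what the plot in step (2) would use.
    Raises KeyError for words missing from `embeddings`."""
    px = [axis_score(embeddings[w], axis) for w in X]
    py = [axis_score(embeddings[w], axis) for w in Y]
    # Difference of target-set means, divided by the sample standard
    # deviation over all words in X and Y (assumed reading of "std-dev").
    d = (mean(px) - mean(py)) / stdev(px + py)
    scores = dict(zip(X, px))
    scores.update(zip(Y, py))
    return d, scores


# Toy 2-D vectors, made up for illustration only.
embeddings = {
    "violin": [1.0, 0.2],
    "flute": [0.9, 0.1],
    "gun": [-0.8, 0.3],
    "sword": [-1.0, 0.1],
}
d, scores = weat_d_on_axis(["violin", "flute"], ["gun", "sword"], [1.0, 0.0], embeddings)
print(round(d, 3))
for word, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{word:>8} {score:+.3f}")
```

A positive d means the first list sits further toward the positive end of the chosen dimension than the second. Sorting the per-word scores gives the left-to-right order for the line plot.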

touretzkyds / oldWordEmbeddingDemo

word embedding association test #44