I've extracted ELMo embeddings for personality traits, computed pairwise cosine similarity, performed multidimensional scaling, and then visualized the result:
As you can see, the results don't make much sense. For example, with other embeddings (e.g., word2vec, paragram-sl999), you'll at least get positive traits on one side and negative traits on the other. I don't see much rhyme or reason in the above plot.
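For reference, the pipeline above can be sketched roughly as follows. The `get_vector()` lookup is a stand-in for the actual Magnitude query (e.g. `vectors.query(word)` with pymagnitude); here it returns fixed random vectors and a hypothetical six-trait list so the sketch runs standalone:

```python
import numpy as np
from sklearn.manifold import MDS

# hypothetical trait list for illustration
traits = ["kind", "cruel", "honest", "deceitful", "brave", "timid"]

rng = np.random.default_rng(0)
_fake = {t: rng.standard_normal(1024) for t in traits}

def get_vector(word):
    # placeholder for: vectors.query(word) on the Magnitude ELMo model
    return _fake[word]

# unit-normalize so the dot product is cosine similarity
X = np.stack([get_vector(t) for t in traits])
X /= np.linalg.norm(X, axis=1, keepdims=True)
sim = X @ X.T                       # pairwise cosine similarity
dissim = np.clip(1.0 - sim, 0.0, None)  # MDS wants non-negative dissimilarities

coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dissim)
print(coords.shape)                 # one 2-D point per trait, ready to plot
```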
I get better results if I get vectors for the above traits by putting each of them in a 'sentence' with the word 'trait'. And I also get decent results if I use AllenNLP's ELMo implementation, even when not contextualizing the trait words.
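The 'trait'-context workaround looks roughly like this, assuming the pymagnitude behavior where `query()` on a list of tokens returns one vector per token. The Magnitude object is mocked here so the sketch runs standalone:

```python
import numpy as np

class FakeMagnitude:
    # stand-in for pymagnitude.Magnitude(<elmo .magnitude file>)
    def __init__(self, dim=1024, seed=0):
        self.rng = np.random.default_rng(seed)
        self.dim = dim

    def query(self, tokens):
        # real ELMo output depends on the whole token sequence;
        # here we just return one random vector per token
        return self.rng.standard_normal((len(tokens), self.dim))

vectors = FakeMagnitude()

def contextualized_trait_vector(word):
    # embed the word inside the two-token 'sentence' "<word> trait"
    # and keep only the vector for the trait word itself
    return vectors.query([word, "trait"])[0]

v = contextualized_trait_vector("kind")
print(v.shape)
```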
I've also tried regressing human judgments about masculinity and femininity directly on the embeddings, and I get pretty much random predictions, whereas using other vectors (again, word2vec, paragram) or getting ELMo vectors contextualized by the word 'trait' predicts the human judgments pretty well.
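The regression setup is roughly the following, sketched with ridge regression and cross-validated predictions. The embeddings and ratings are random placeholders here; in the real setup, X would come from the embedding lookups above and y from the human judgment data:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_traits, dim = 60, 300
X = rng.standard_normal((n_traits, dim))   # placeholder trait embeddings
y = rng.standard_normal(n_traits)          # placeholder human ratings

# out-of-fold predictions, so the fit is never evaluated on training data
preds = cross_val_predict(Ridge(alpha=1.0), X, y, cv=5)
r = np.corrcoef(preds, y)[0, 1]            # near 0 on this random data
print(preds.shape)
```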
I'm using this model:
http://magnitude.plasticity.ai/elmo/medium/elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights.magnitude