rstudio-conf-2020 / dl-keras-tf

rstudio::conf(2020) deep learning workshop
Creative Commons Attribution Share Alike 4.0 International

What is the interpretation of similar words based on embeddings? #11

Open rohit-das opened 4 years ago

rohit-das commented 4 years ago

For embeddings learned directly with a language-modeling objective, the output makes sense:

# natural language modeling embeddings
get_similar_words("horrible", word_embeddings)
# horrible  terrible     awful       bad    acting 
# 1.0000000 0.9248301 0.8892507 0.8432761 0.8015473 
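Helpers like `get_similar_words()` typically rank the vocabulary by cosine similarity against the query word's vector. A minimal sketch of that idea (a hypothetical implementation, not the workshop's actual helper; it assumes the embeddings are a numeric matrix with one row per word and row names giving the vocabulary):

```r
# Sketch of a get_similar_words()-style helper (assumed implementation).
# `embeddings`: numeric matrix, one row per word, rownames = vocabulary.
get_similar_words_sketch <- function(word, embeddings, n = 5) {
  target <- embeddings[word, ]
  # cosine similarity between the query word and every row of the matrix
  sims <- apply(embeddings, 1, function(v)
    sum(target * v) / (sqrt(sum(target^2)) * sqrt(sum(v^2))))
  head(sort(sims, decreasing = TRUE), n)
}
```

The query word always scores 1 against itself, which matches the `horrible 1.0000000` entry leading both outputs above.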

But how do we interpret the relationships between words when the embeddings are learned as part of a classification model?

similar_classification_words("horrible", embedding_wts)
# horrible     keith    brooks     blond      york  sporting 
# 1.0000000 0.7858497 0.7819669 0.7724826 0.7616312 0.7583101 

Is there a way to put these results in better context?

OmaymaS commented 4 years ago

When embeddings are learned as part of a classification model, the results will depend on the underlying data, the labels, the embedding size, and how good the model is. Maybe those words happen to co-occur in your dataset. You could also experiment with the embedding layer size (think of it as the number of features representing each word) and retrain the model.