materialsintelligence / mat2vec

Supplementary Materials for Tshitoyan et al. "Unsupervised word embeddings capture latent knowledge from materials science literature", Nature (2019).
MIT License

Question about target and context words #13

Closed: dkajtoch closed this issue 4 years ago

dkajtoch commented 4 years ago

I have a question about the research approach communicated in your Nature paper. You use the phrases "target word" and "context word" there. Normally, in the skip-gram model the embedding for the "target word" (input layer) is different from the embedding for the "context word" (output layer). In gensim, if you use model.wv.most_similar you are effectively searching for similar words using embeddings from the input layer. You can also access the "context word" embeddings via model.syn1neg. Were you using both embeddings when analyzing e.g. the relation between a chemical compound and "thermoelectric"?
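For readers less familiar with gensim, here is a minimal sketch of the distinction the question draws, assuming a gensim 4.x Word2Vec model trained with negative sampling (the file name `model.bin` is hypothetical, not from this repo):

```python
from gensim.models import Word2Vec

model = Word2Vec.load("model.bin")  # hypothetical pretrained skip-gram model

word = "thermoelectric"
idx = model.wv.key_to_index[word]

in_vec = model.wv.vectors[idx]  # input ("target word") embedding
out_vec = model.syn1neg[idx]    # output ("context word") embedding (negative sampling)

# most_similar searches purely in input-embedding space (cosine similarity):
print(model.wv.most_similar(word, topn=5))
```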

vtshitoyan commented 4 years ago

Hi @dkajtoch, thanks for the great question. The information is in the caption of Figure 2b: we use input embeddings for the similarity between the application word and its context words, and a combination of input and output embeddings for the similarity between those context words and the materials. This pretty much translates to "which words similar to the application word is this material likely to be mentioned with?". I hope this helps.
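As a rough illustration of that answer (a hedged sketch, not the actual mat2vec code; see the paper's Fig. 2b caption for the exact definition), one plausible reading in gensim 4.x terms is: find context words of the application word in input-embedding space, then score each candidate material by dot products between those context words' input embeddings and the material's output embedding. All names below are illustrative:

```python
import numpy as np

def rank_materials(model, application_word, materials, topn_context=20):
    """Rank materials by how likely they are to appear in contexts
    similar to the application word (illustrative, not the paper's exact code)."""
    # Step 1: context words similar to the application word, input embeddings only.
    context_words = [w for w, _ in model.wv.most_similar(application_word, topn=topn_context)]

    scores = {}
    for mat in materials:
        # Step 2: the material's OUTPUT embedding (negative-sampling weights).
        out_vec = model.syn1neg[model.wv.key_to_index[mat]]
        # Skip-gram-style score: dot product of each context word's INPUT
        # embedding with the material's output embedding, averaged.
        sims = [np.dot(model.wv[w], out_vec) for w in context_words]
        scores[mat] = float(np.mean(sims))
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Hypothetical usage: rank_materials(model, "thermoelectric", ["Bi2Te3", "PbTe"])
```

The design point is the one made in the answer above: plain `most_similar` stays entirely in input-embedding space, whereas mixing input and output embeddings approximates the skip-gram co-occurrence likelihood rather than pure semantic similarity.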

jdagdelen commented 4 years ago

Closing this issue since the discussion seems to have been resolved, but please feel free to reopen if you want to continue.