materialsintelligence / mat2vec

Supplementary Materials for Tshitoyan et al. "Unsupervised word embeddings capture latent knowledge from materials science literature", Nature (2019).
MIT License

Training my own word embeddings #18

Closed TasnimGh closed 3 years ago

TasnimGh commented 4 years ago

Hi, I am trying to train my own word embeddings on my own corpus using your model. Could you please walk me through how to do that, step by step? I ran the model according to the instructions in the README file and it works.

vtshitoyan commented 4 years ago

Hi, does the Training section of the README file answer your question? You have to change the path in the --corpus flag to your own pre-processed corpus, and that should be enough. You can also run python phrase2vec.py --help for more info.
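
For anyone following along, the reply above boils down to one changed argument. Below is a minimal sketch of how that command could be assembled. The corpus path data/my_corpus and the helper build_training_command are hypothetical, for illustration only. Only the --corpus flag and the phrase2vec.py script name come from this thread; run python phrase2vec.py --help to see what else the script accepts.

```python
import shlex


def build_training_command(corpus_path, script="phrase2vec.py"):
    """Return the argv list that points the training script at a custom corpus.

    Hypothetical helper: only the --corpus flag is taken from the thread.
    """
    return ["python", script, f"--corpus={corpus_path}"]


# "data/my_corpus" is a placeholder for your own pre-processed corpus file.
command = build_training_command("data/my_corpus")

# Print the command as it would be typed in a shell.
print(shlex.join(command))
```

The list form can be passed to subprocess.run from the directory that contains phrase2vec.py, or the printed line can be copied into a terminal.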

TasnimGh commented 4 years ago

Thank you, I did that and it helped. As for the pre-processed corpus: can I use process.py? How?

vtshitoyan commented 4 years ago

https://github.com/materialsintelligence/mat2vec#processing
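
As a rough sketch of what the linked Processing section leads to, raw text can be turned into a corpus file by tokenizing each document and writing one document per line. This rests on several assumptions not stated in this thread: that the training script reads a plain-text file with one space-separated, tokenized document per line, and that mat2vec.processing.MaterialsTextProcessor.process returns a (tokens, materials) pair as the README describes. The helper names to_corpus_lines, write_corpus and make_mat2vec_process are hypothetical.

```python
def to_corpus_lines(texts, process):
    """Tokenize each text and join its tokens with spaces, one line per text.

    `process` is any callable returning (tokens, extra). The corpus layout
    (one space-separated document per line) is an assumption.
    """
    lines = []
    for text in texts:
        tokens, _ = process(text)
        if tokens:  # skip documents that produce no tokens
            lines.append(" ".join(tokens))
    return lines


def write_corpus(lines, out_path):
    """Write the prepared lines to a UTF-8 text file, one per line."""
    with open(out_path, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")


def make_mat2vec_process():
    """Return mat2vec's processor method (requires mat2vec to be installed).

    Assumes the README's MaterialsTextProcessor().process interface.
    """
    from mat2vec.processing import MaterialsTextProcessor
    return MaterialsTextProcessor().process
```

With mat2vec installed, something like write_corpus(to_corpus_lines(my_texts, make_mat2vec_process()), "data/my_corpus") would produce a file that can then be passed to the --corpus flag, where my_texts is your own list of raw strings.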