sjebbara / clwe-ote

Improving Opinion-Target Extraction with Character-Level Word Embeddings

word2vec logic & lexicon files methods #1

Open LakshmiPrasath opened 6 years ago

LakshmiPrasath commented 6 years ago

Hi,

This work is really great!

I have some questions.

  1. How was the file word_embeds_restaurants_ote.txt created? Which word2vec algorithm was used? Could you please share it?
  2. How were the prefix and suffix files created? Could you please share the algorithm for those as well?

I would appreciate your replies; this will really help my NLP project, and I will cite your work.

Thanks.

sjebbara commented 6 years ago

Hi, thank you for your interest in my work! I appreciate it.

  1. I can't remember or find any reference to "word_embeds_restaurants_ote.txt". Where did you see this? The word embeddings are trained using the Skip-Gram implementation from the Gensim library with negative sampling. I used the same embeddings as in this earlier work of mine.
  2. I extracted the vector representations for the word-level and character-level embeddings in "analyze_trained_model.py". After that, I applied T-SNE from scikit-learn to the vectors to obtain two-dimensional vectors. The results are exported to a bunch of files. The actual visualization happens in "plot_suffixes.py", which I have now included in the repository.

I hope this helps a bit. Tell me if there is anything else you need to know or that is unclear.

Edit: You may want to check out the newest version