queirozfcom opened this issue 7 years ago

Hi. I was looking at create_embeddings.py to see how you derived char embeddings directly from word embeddings. It looks like you equate a char embedding with the average of all the word vectors that contain that char, counting a char multiple times if it occurs more than once in a word. Is that correct?

Did you decide to do this because you got good results, or was there some other reason?
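For concreteness, here is my reading of that logic as a minimal sketch (plain numpy, my own illustrative names, not the actual code from create_embeddings.py):

```python
import numpy as np
from collections import defaultdict

def char_vectors_from_word_vectors(word_vectors):
    """word_vectors: dict mapping word -> 1-D numpy array (all the same length)."""
    dim = len(next(iter(word_vectors.values())))
    sums = defaultdict(lambda: np.zeros(dim))
    counts = defaultdict(int)
    for word, vec in word_vectors.items():
        # a char occurring twice in a word contributes that word's vector twice
        for ch in word:
            sums[ch] += vec
            counts[ch] += 1
    # each char vector is the mean of the vectors of all words containing it
    return {ch: sums[ch] / counts[ch] for ch in sums}
```

Thanks!
FA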
Yes, that implementation description is correct.
The "why" is detailed in the blog post.
The blog post only very briefly goes into why you're averaging the word vectors to get character vectors. Are you aware of any rigorous comparison between these average-derived char vectors and true learned char vectors?
My intuition is that these char vectors will be poor approximations: the averaging treats every character in a word equally, so it can't distinguish characters that appear next to a given char from ones that appear far away from it.
I have the same question. @queirozfcom @brandonrobertz, did you also try this method and get good results?
@fermat97 My opinion now, after trying this method and also building my own character vectors, is that this method is a very poor approximation. You're throwing away a lot of distance-related character information, which is important for character embeddings. It's quite easy to train character embeddings directly, even on giant datasets, so I suggest just doing that; see the sketch below.
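For example, something like this, a rough sketch of what I mean, assuming you use gensim's word2vec (gensim >= 4 API) and treat each character as a token; the hyperparameters are illustrative, not tuned:

```python
# Train real character embeddings by feeding a skip-gram word2vec model
# sequences of characters instead of sequences of words. Unlike the
# averaging trick, the context window captures which chars co-occur
# near each other.
from gensim.models import Word2Vec

def train_char_embeddings(lines, dim=50):
    # "hello" -> ['h', 'e', 'l', 'l', 'o']; each line is one "sentence" of chars
    corpus = [list(line.strip()) for line in lines]
    model = Word2Vec(sentences=corpus, vector_size=dim, window=5,
                     min_count=1, sg=1, epochs=10)
    return model.wv  # e.g. model.wv['a'] is the learned vector for 'a'

# usage (hypothetical file):
# char_vecs = train_char_embeddings(open("corpus.txt", encoding="utf-8"))
```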
@brandonrobertz Thanks a lot. I tried it as well, and the results were poor.