guillaumegenthial / tf_ner

Simple and Efficient Tensorflow implementations of NER models with tf.estimator and tf.data
Apache License 2.0

Conceptual issue in character embeddings #40

Open mraduldubey opened 5 years ago

mraduldubey commented 5 years ago

I have a conceptual doubt about the part where we obtain word-level representations from characters using the final outputs of a BiLSTM network. We initialize the character embeddings with Xavier initialization, which only ensures that the cells do not saturate. So how do these random embeddings come to capture any meaningful information? And how is this network trained, or is it unsupervised?

guillaumegenthial commented 5 years ago

Hi @mraduldubey, you are right, the character embeddings are indeed initialized randomly. However, at training time the loss is backpropagated all the way through the network, so the character embeddings are updated as well (i.e. they are learned with supervised training).
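
To make this concrete, here is a minimal sketch (not the repo's exact code; the variable names, shapes, and hyperparameters below are illustrative) of how a character embedding matrix can be created with the Xavier initializer and fed through a character BiLSTM. The key point is that the embedding matrix is an ordinary trainable variable, so only its initial values are random:

```python
import tensorflow as tf

# Hypothetical sizes for illustration only.
num_chars, char_dim, lstm_size = 100, 50, 25

# Character embedding matrix: a regular trainable variable.
# The Xavier initializer only sets its *initial* values.
char_embeddings = tf.get_variable(
    'char_embeddings', shape=[num_chars, char_dim],
    initializer=tf.contrib.layers.xavier_initializer())

# char_ids: [batch, max_word_len] character ids for each word.
char_ids = tf.placeholder(tf.int32, shape=[None, None])
embedded = tf.nn.embedding_lookup(char_embeddings, char_ids)

# Character BiLSTM; the concatenated final hidden states form
# the character-based word representation.
cell_fw = tf.contrib.rnn.LSTMCell(lstm_size)
cell_bw = tf.contrib.rnn.LSTMCell(lstm_size)
_, ((_, fw_state), (_, bw_state)) = tf.nn.bidirectional_dynamic_rnn(
    cell_fw, cell_bw, embedded, dtype=tf.float32)
word_repr = tf.concat([fw_state, bw_state], axis=-1)
```

Because `char_embeddings` is a trainable variable in the graph, any gradient that reaches it during backpropagation will move it away from its random initialization.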

mraduldubey commented 5 years ago

Thanks @guillaumegenthial for the reply. But then the ground truth would have to be a vector representing the whole word, wouldn't it? So what exactly is the ground truth here?

guillaumegenthial commented 5 years ago

You train the network to predict the tags. It just happens that some parameters of the network are the character embeddings, so they are trained to help the network predict the tags. In other words, the ground truth is the tag, and the learned embeddings help predict that tag.
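
Continuing the toy sketch above (and skipping the contextual word-level BiLSTM for brevity): the repo itself trains against a CRF loss, but a plain softmax cross-entropy over the gold tags shows the same mechanism. There is no separate target for the character network; one loss on the tags drives every trainable variable:

```python
# Hypothetical number of IOB tags; gold_tags holds the annotated tag ids.
num_tags = 9
gold_tags = tf.placeholder(tf.int32, shape=[None])

# Project the word representation to per-tag scores.
logits = tf.layers.dense(word_repr, num_tags)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=gold_tags, logits=logits))

# One optimizer, one loss: the same gradient step that tunes the LSTM
# weights also updates 'char_embeddings' defined earlier.
train_op = tf.train.AdamOptimizer().minimize(loss)
```

So the character embeddings are never supervised directly; they are simply parameters that the tag loss happens to depend on.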

mraduldubey commented 5 years ago

So you mean that the word representation network, the contextual word representation network and the decoder, although described separately in the blog, are trained simultaneously, with the tags as the ground truth, and backpropagation flows from the final layer all the way back to the word representation network?
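
That is the picture the thread converges on: all three components live in one graph and are updated by the same train op. As a sanity check on the toy sketch above (reusing `loss` and `char_embeddings` from it), one can ask TensorFlow directly whether a backpropagation path exists from the tag loss to the character embeddings:

```python
# If there were no path from the tag loss back to the character
# embeddings, tf.gradients would return None for that variable.
grads = tf.gradients(loss, [char_embeddings])
assert grads[0] is not None  # the tag loss does reach the char embeddings
```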