guillaumegenthial / sequence_tagging

Named Entity Recognition (LSTM + CRF) - Tensorflow
https://guillaumegenthial.github.io/sequence-tagging-with-tensorflow.html
Apache License 2.0

question about two words phrase #65

Closed mrgonext closed 5 years ago

mrgonext commented 6 years ago

Thank you for sharing the code and tutorial. I have a question about the training format for two-word phrases, e.g. New York. I think the training file can be created in several ways:

way 1: New B-LOC York I-LOC
way 2: New_York LOC
way 3: New York LOC

The question is: which way is best for training? If way 2 or way 3 is best, how should we create the word vectors in GloVe? Do we need to create a word vector for New_York?

Thank you for your help!

guillaumegenthial commented 5 years ago

You're talking about tokenization! Way 1 is the recommended way, because your model should learn that an entity can span multiple tokens.
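To make way 1 concrete, here is a minimal sketch of turning entity spans into per-token B-/I- tags. It assumes a training file with one token and its tag per line, separated by a space; the helper name `to_bio` and the span format `(start, end, label)` are made up for illustration and are not part of this repository.

```python
def to_bio(tokens, spans):
    """Assign one tag per token.

    tokens: list of str
    spans:  list of (start, end_exclusive, label) token index ranges
            (hypothetical format, chosen for this example)
    """
    tags = ["O"] * len(tokens)  # tokens outside any entity get "O"
    for start, end, label in spans:
        tags[start] = "B-" + label          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label          # remaining tokens of the entity
    return tags


tokens = ["I", "live", "in", "New", "York"]
spans = [(3, 5, "LOC")]  # "New York" covers tokens 3 and 4

tags = to_bio(tokens, spans)

# One "token tag" pair per line, as in way 1
lines = [f"{tok} {tag}" for tok, tag in zip(tokens, tags)]
print("\n".join(lines))
```

With this layout "New" and "York" stay separate tokens, so each one is looked up in the embeddings on its own and no combined New_York vector is needed.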