tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0

[Question]: NER with the Transformer #1103

Open mabergerx opened 5 years ago

mabergerx commented 5 years ago

Description

After successfully using the Transformer for my own translation task, I was wondering whether this powerful model would also perform well on a NER (Named Entity Recognition) task. I was thinking of modelling it as a seq2seq problem, with sequence pairs like:

input:  This car is a Volvo
output: O    O   O  O ORGANISATION
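For concreteness, here is a minimal sketch of how that framing could be expressed as a custom tensor2tensor text2text problem (the class name and the inline example pair are hypothetical, untested code):

```python
# Sketch of NER framed as a text2text problem: the source is the sentence,
# the target is the aligned tag sequence. Illustrative only.
from tensor2tensor.data_generators import text_problems
from tensor2tensor.utils import registry


@registry.register_problem
class NerTagging(text_problems.Text2TextProblem):
  """One output tag per input token."""

  @property
  def is_generate_per_split(self):
    return False

  def generate_samples(self, data_dir, tmp_dir, dataset_split):
    # Replace with a real corpus reader; this pair is illustrative only.
    examples = [("This car is a Volvo", "O O O O ORGANISATION")]
    for sentence, tags in examples:
      yield {"inputs": sentence, "targets": tags}
```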

Ideally, I also want to be able to recognise entities regardless of case. So, two questions: 1) Would this kind of approach intuitively work with the Transformer architecture? Is my thinking correct here? 2) Would it make sense to additionally feed a lowercased copy of my dataset into the model to account for lowercase input, or would this data duplication be harmful/useless?

Just wanting to hear some opinions!

lkluo commented 5 years ago

The Transformer can be used for most seq2seq problems, including named entity recognition. I think it would be better not to lowercase the sentences: capitalised words have a high chance of being named entities.

martinpopel commented 5 years ago

In NER, you need one output tag (e.g. in BIO encoding) for each input word. This usually means that your input is already tokenized. If this is the case, you have three options:

You can also try character-based models (it makes sense for NER).
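As a side note, the BIO encoding mentioned above marks entity boundaries with B-/I- prefixes; a small illustrative sketch (not from the thread):

```python
# Illustration of BIO encoding: one tag per pre-tokenized word, with B-
# marking the beginning of an entity span and I- its continuation.
def bio_tags(tokens, entities):
  """entities: list of (start, end_exclusive, label) spans over tokens."""
  tags = ["O"] * len(tokens)
  for start, end, label in entities:
    tags[start] = "B-" + label
    for i in range(start + 1, end):
      tags[i] = "I-" + label
  return tags


print(bio_tags(["Jim", "works", "at", "Volvo", "Cars"],
               [(0, 1, "PER"), (3, 5, "ORG")]))
# -> ['B-PER', 'O', 'O', 'B-ORG', 'I-ORG']
```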

Capitalization is one of the most important features for NER, so lowercasing everything is definitely a bad idea. Character-based models will surely learn the difference between lowercase and uppercase automatically (and, I think, so will subword-based models).
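To try a character-based model in tensor2tensor, one route (a sketch building on the hypothetical NerTagging problem above) is overriding the problem's vocabulary type:

```python
# Sketch: switch the hypothetical NerTagging problem to a character-level
# vocabulary, so the model sees raw casing directly.
from tensor2tensor.data_generators import text_problems
from tensor2tensor.utils import registry


@registry.register_problem
class NerTaggingCharacters(NerTagging):  # NerTagging from the sketch above

  @property
  def vocab_type(self):
    return text_problems.VocabType.CHARACTER
```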

yumath commented 5 years ago

However, when I use only the Transformer encoder with a softmax layer on top, the model tends to predict O for all labels.
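For reference, a setup like the one described might look as follows (a minimal Keras sketch under assumed hyperparameters, not the commenter's actual code):

```python
# Minimal sketch: a single Transformer encoder block with a per-token
# softmax for sequence labelling. Vocabulary size, tag count, and model
# dimensions are hypothetical; positional encodings omitted for brevity.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 10000  # hypothetical
NUM_TAGS = 9        # hypothetical: 4 entity types in BIO plus O
D_MODEL = 128
SEQ_LEN = 64

inputs = layers.Input(shape=(SEQ_LEN,), dtype="int32")
x = layers.Embedding(VOCAB_SIZE, D_MODEL)(inputs)
# Encoder block: self-attention + feed-forward, each with a residual.
attn = layers.MultiHeadAttention(num_heads=4, key_dim=D_MODEL // 4)(x, x)
x = layers.LayerNormalization()(x + attn)
ff = layers.Dense(4 * D_MODEL, activation="relu")(x)
ff = layers.Dense(D_MODEL)(ff)
x = layers.LayerNormalization()(x + ff)
# Per-token softmax over the tag set.
outputs = layers.Dense(NUM_TAGS, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

One common explanation for the all-O behaviour is class imbalance: O dominates the tag distribution, so an unweighted cross-entropy loss can be minimised reasonably well by predicting O everywhere. Class weights or a CRF output layer are common remedies.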

wenhrui commented 5 years ago

> However, when I use only the Transformer encoder with a softmax layer on top, the model tends to predict O for all labels.

Have you solved your problem? I have the same problem.

yumath commented 5 years ago

@wenhrui no, I haven't.

Saichethan commented 5 years ago

How and where can I use the Transformer for the NER task? (I have implemented it using CNN+Bi-LSTM+CRF.)

niranjan8129 commented 5 years ago

I am having the same issue. @Saichethan, can you share the GitHub repo if you solved it?

qq547276542 commented 4 years ago

TENER: Adapting Transformer Encoder for Named Entity Recognition (https://arxiv.org/pdf/1911.04474.pdf)