stanfordnlp / stanza

Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
https://stanfordnlp.github.io/stanza/

bert embeddings for NER training #307

Closed aishwarya-agrawal closed 4 years ago

aishwarya-agrawal commented 4 years ago

Is it possible to use BERT pretrained embeddings to train Stanza NER?

yuhui-zh15 commented 4 years ago

What do you mean by BERT pretrained embeddings? If you mean that you would like to extract only the embeddings from BERT, it is possible to dump the embeddings to our word vector format and train the model on them. If you mean that you would like to use the whole BERT architecture, sorry to say we don't support this.

Besides, we have carefully considered the model architecture (contextualized word embeddings produced by a character-level RNN), which leads to SOTA performance (http://nlpprogress.com/english/named_entity_recognition.html; our model is equivalent to a smaller version of Flair). I'm afraid introducing BERT would not improve performance but would run much slower.
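
For reference, a minimal sketch of the first option, extracting only static embeddings from BERT and dumping them to word2vec-style text that can then be used like any other word vector file. This is not the maintainers' code; it assumes the Hugging Face transformers package, the model name and output file name are placeholders, and the vectors it produces are wordpiece-level and non-contextual:

```python
# Sketch: export BERT's static input-embedding table to word2vec-style text.
# Note these are wordpiece-level, non-contextual vectors.
from transformers import AutoTokenizer, AutoModel

name = "bert-base-cased"                 # placeholder model name
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

emb = model.get_input_embeddings().weight.detach()   # (vocab_size, hidden_size)
vocab = sorted(tokenizer.get_vocab().items(), key=lambda kv: kv[1])  # token -> id

with open("bert_static.vec", "w", encoding="utf-8") as f:
    f.write(f"{emb.shape[0]} {emb.shape[1]}\n")       # word2vec text header
    for token, idx in vocab:
        f.write(token + " " + " ".join(f"{x:.6f}" for x in emb[idx].tolist()) + "\n")
```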

aishwarya-agrawal commented 4 years ago

Okay, thanks for answering; that clears up my issue. What we need is the full LM, not just the embeddings. I have one more question: I was trying to train an NER model on a custom dataset with fastText embeddings, and training seems very slow, around 14k steps in 12 hours on a GPU. Does Stanza support distributed training?

yuhui-zh15 commented 4 years ago

For most NER tasks, given their data size, training should finish within 1-2 days (depending on your GPU), so, sorry to say, we don't support distributed training, and it is not on our roadmap.

I'll close the issue now; feel free to reopen if you have other questions!

aishwarya-agrawal commented 4 years ago

Thanks for all the help!

aishwarya-agrawal commented 4 years ago

Is it possible to train NER without pretrained embeddings? I looked through the code base, but it seems a pretrained embedding file must always be passed.

yuhui-zh15 commented 4 years ago

I guess you mean pretrained word embeddings here. It is super easy to download pretrained word embeddings (we also provide a script!), and they improve model performance. Why not use them?
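
As a rough illustration of that route (the exact helper script varies by version, so this only shows the generic steps): download publicly available fastText vectors and convert them to the .pt pretrain file the taggers load. The URL points at fastText's public crawl vectors; the output file names and the vocabulary cap are placeholders, and the conversion assumes the Pretrain class in stanza.models.common.pretrain:

```python
# Sketch: fetch fastText English vectors and cache them in Stanza's pretrain format.
import gzip, shutil, urllib.request
from stanza.models.common.pretrain import Pretrain

url = "https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.vec.gz"
urllib.request.urlretrieve(url, "cc.en.300.vec.gz")   # large download
with gzip.open("cc.en.300.vec.gz", "rb") as fin, open("cc.en.300.vec", "wb") as fout:
    shutil.copyfileobj(fin, fout)

# Cap the vocabulary so the resulting .pt file stays a manageable size.
pt = Pretrain("en_fasttext.pretrain.pt", "cc.en.300.vec", max_vocab=250000)
_ = pt.emb    # reads the .vec file and caches en_fasttext.pretrain.pt
```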

aishwarya-agrawal commented 4 years ago

The thing is, we have tried multiple different word embeddings and checked model performance; it seems similar across all of them. Now I need to check whether the model performs the same without any pretrained word embeddings.

yuhui-zh15 commented 4 years ago

> The thing is, we have tried multiple different word embeddings and checked model performance; it seems similar across all of them.

Yes, this is normal if all the embeddings you tried were trained on general NLP corpora.

> Now I need to check whether the model performs the same without any pretrained word embeddings.

The easiest way is to save randomly initialized embeddings in the same format. Otherwise, I'm afraid you will need to modify the code :)
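
A small sketch of that workaround, assuming the same word2vec-style text format as above; the vocabulary, dimensionality, and file name are placeholders, and in practice the vocabulary would be collected from the NER training data:

```python
# Sketch: write random vectors in word2vec text format so the training script
# can consume them like any other pretrained embedding file.
import numpy as np

vocab = ["the", "of", "Obama", "Stanford"]   # placeholder; collect from your NER data
dim = 100
rng = np.random.default_rng(0)

with open("random_init.vec", "w", encoding="utf-8") as f:
    f.write(f"{len(vocab)} {dim}\n")          # word2vec text header
    for word in vocab:
        vec = rng.normal(scale=0.1, size=dim)
        f.write(word + " " + " ".join(f"{x:.6f}" for x in vec) + "\n")
```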

aishwarya-agrawal commented 4 years ago

Oh okay, that wouldn't do much good, I suppose. I'll have to rethink the experiment. Anyway, thanks for helping out and resolving the queries!

aishwarya-agrawal commented 4 years ago

Hi @yuhui-zh15, to train with the pretrained contextualized character embeddings, which parameters need to be passed to ner_tagger.py? I am passing charlm_shorthand, charlm_save_dir, and charlm, but it keeps giving an error at input_tranform. [screenshot of the error attached]
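
For reference, a sketch of the kind of invocation being attempted here. The charlm flags are the ones named in the comment above plus the usual training arguments; they should be double-checked against `python -m stanza.models.ner_tagger --help` for the installed version, and the paths and shorthand values are placeholders:

```python
# Sketch: launch NER training with the pretrained character LM enabled.
# Flag names are assumptions based on this thread; verify them with --help.
import subprocess

subprocess.run([
    "python", "-m", "stanza.models.ner_tagger",
    "--mode", "train",
    "--lang", "en",
    "--shorthand", "en_sample",                       # placeholder dataset shorthand
    "--train_file", "data/ner/en_sample.train.json",  # placeholder paths
    "--eval_file", "data/ner/en_sample.dev.json",
    "--charlm",                                       # enable the contextualized char LM
    "--charlm_shorthand", "en_1billion",
    "--charlm_save_dir", "saved_models/charlm",
    # plus whichever word-vector argument your version expects (see --help)
], check=True)
```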