Closed — NielsRogge closed this issue 4 years ago
I am also looking forward to your NER fine-tuned model. Do you have an ETA when something like this will be made available?
Also interested :)
(Sorry for the late response, I did not receive/notice Github notifications.)
I will try to test and release fine-tuned models before the weekend, but I cannot make promises. I will only release models that I consider to be useful/trustworthy in practice, but this is not a problem for NER. For instance, for SRL I will have to verify tagsets and annotations first (the source SoNaR annotations can sometimes be a bit dubious).
Took a bit longer than I intended, but I have uploaded the fine-tuned NER models based on BERTje and mBERT and linked to them in the readme.
I may add more details and usage instructions later when I get to it. But usage should be straightforward if you are familiar with Huggingface Transformers. The three source datasets and tagsets are quite different from each other, so I cannot give a single recommendation.
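Usage with Huggingface Transformers can be sketched roughly as follows. The model id in the example is an illustrative assumption (the actual names are listed in the readme), and the span-merging helper is a plain-Python illustration of how per-token B-/I- predictions can be grouped, not part of the released models:

```python
# Sketch: running a fine-tuned NER model with Huggingface Transformers.
# The model id below is an assumption for illustration; see the readme
# for the actual model names.

def merge_bio_spans(tokens):
    """Group per-token BIO predictions (dicts with 'word' and 'entity')
    into (entity_type, text) spans."""
    spans = []
    for tok in tokens:
        tag = tok["entity"]
        if tag == "O":
            continue  # not part of any entity
        prefix, _, etype = tag.partition("-")
        if prefix == "B" or not spans or spans[-1][0] != etype:
            spans.append([etype, tok["word"]])        # start a new span
        else:
            spans[-1][1] += " " + tok["word"]          # I- continuation
    return [tuple(s) for s in spans]

if __name__ == "__main__":
    from transformers import pipeline  # requires the transformers package

    # Hypothetical model id; substitute one of the ids from the readme.
    ner = pipeline("ner", model="wietsedv/bert-base-dutch-cased-finetuned-conll2002-ner")
    print(merge_bio_spans(ner("Niels woont in Amsterdam.")))
```

Note that recent versions of the pipeline can also group entities for you (via an aggregation option), in which case the helper above is unnecessary.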
To give a quick overview, these are the data sizes and tagset sizes of the training data:
Thanks a lot!
Hello, I'd like to fine-tune BERTje for custom named-entity recognition in Dutch (for example, to recognize street names). Is this possible by initializing BertForTokenClassification with 'bert-base-dutch-cased'? Do you think this approach is viable, and roughly how many annotated training examples would be needed for reasonable performance? Would 200 annotated sentences per entity type be enough?
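For what it's worth, initializing BertForTokenClassification from the BERTje checkpoint is indeed the standard recipe. A minimal sketch follows; the street-name label set and the label-alignment helper are illustrative assumptions (the helper maps word-level BIO labels onto subword tokens, masking continuation pieces with -100 so the loss ignores them):

```python
# Sketch: preparing data for fine-tuning BERTje on a custom NER tagset.
# LABELS and align_labels are illustrative assumptions, not part of BERTje.

LABELS = ["O", "B-STREET", "I-STREET"]  # example tagset for street names
LABEL2ID = {label: i for i, label in enumerate(LABELS)}

def align_labels(word_labels, word_ids):
    """Map word-level BIO labels onto subword tokens.

    word_ids is the fast tokenizer's word_ids() output: one entry per
    subword, giving the index of the source word (None for special tokens).
    Special tokens and non-initial subwords get -100 (ignored by the loss).
    """
    aligned, prev = [], None
    for wid in word_ids:
        if wid is None:
            aligned.append(-100)                         # [CLS], [SEP], padding
        elif wid != prev:
            aligned.append(LABEL2ID[word_labels[wid]])   # first subword of a word
        else:
            aligned.append(-100)                         # later subwords
        prev = wid
    return aligned

if __name__ == "__main__":
    from transformers import BertForTokenClassification, BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-dutch-cased")
    model = BertForTokenClassification.from_pretrained(
        "bert-base-dutch-cased", num_labels=len(LABELS)
    )
    words = ["Kalverstraat", "12", ",", "Amsterdam"]
    enc = tokenizer(words, is_split_into_words=True)
    labels = align_labels(["B-STREET", "O", "O", "O"], enc.word_ids())
```

From there the usual Trainer loop applies. Whether 200 sentences per entity type suffices depends on how regular the entities are; that is an empirical question, not something the sketch above answers.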
Ideally, the BERTje model fine-tuned on CoNLL-2002/SoNaR-1 would be even better in terms of transfer learning. But I see you're planning to release these fine-tuned models in the future, so looking forward to that.