Closed kissge closed 4 years ago
@kissge Sorry for the late reply. I've been changing the NE training dataset from KWDLC to GSK2014-A with BCCWJ.
Until GiNZA v2.2.1, we used Language.update() from the spaCy API on the NE spans of KWDLC to train the NER model independently. (The parser model is trained before the NER model.)
https://spacy.io/api/language#update
https://github.com/megagonlabs/ginza/blob/6a667efca0edc7c628402c53f5c61742c0739ed0/shell/train_ner.sh
https://github.com/megagonlabs/ginza/blob/6a667efca0edc7c628402c53f5c61742c0739ed0/ginza_util/train_ner.py
From the next release of GiNZA, we will use the spacy train command with JSON-formatted gold data for NER training. We're using UD_Japanese-BCCWJ aligned with GSK2014-A. Please see this branch if you want to understand the process.
https://github.com/megagonlabs/ginza/blob/change_ner_corpus_to_gsk2014a/ginza_util/gsk2014a.py
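To give a rough idea of what "JSON-formatted gold data" can look like, here is a minimal sketch. It is not the code in gsk2014a.py. It only assumes the spaCy v2 JSON training layout (documents containing paragraphs, sentences and tokens, with a BILUO tag in each token's "ner" field) and leaves out the parser fields (tag, head, dep). The function names, the example sentence, and the PERSON/ORG labels are hypothetical and were chosen just for illustration.

```python
import json


def biluo_tags(token_spans, entity_spans):
    """Map character-offset entity spans onto tokens as BILUO tags.

    token_spans:  list of (start, end) character offsets, one per token
    entity_spans: list of (start, end, label) character offsets
    """
    tags = ["O"] * len(token_spans)
    for ent_start, ent_end, label in entity_spans:
        # Tokens that lie completely inside the entity span
        covered = [
            i for i, (start, end) in enumerate(token_spans)
            if start >= ent_start and end <= ent_end
        ]
        if not covered:
            continue
        if len(covered) == 1:
            tags[covered[0]] = "U-" + label
        else:
            tags[covered[0]] = "B-" + label
            for i in covered[1:-1]:
                tags[i] = "I-" + label
            tags[covered[-1]] = "L-" + label
    return tags


def to_spacy_json(doc_id, text, tokens, entity_spans):
    """Build one document entry in the (assumed) spaCy v2 JSON layout."""
    # Recover character offsets by locating each token in the raw text
    token_spans = []
    pos = 0
    for tok in tokens:
        start = text.index(tok, pos)
        end = start + len(tok)
        token_spans.append((start, end))
        pos = end
    tags = biluo_tags(token_spans, entity_spans)
    sent_tokens = [
        {"id": i, "orth": tok, "ner": tag}
        for i, (tok, tag) in enumerate(zip(tokens, tags))
    ]
    return {
        "id": doc_id,
        "paragraphs": [
            {"raw": text, "sentences": [{"tokens": sent_tokens, "brackets": []}]}
        ],
    }


# Hypothetical example sentence and labels
text = "太郎は東京大学に行った"
tokens = ["太郎", "は", "東京", "大学", "に", "行っ", "た"]
entities = [(0, 2, "PERSON"), (3, 7, "ORG")]

doc = to_spacy_json(0, text, tokens, entities)
print(json.dumps([doc], ensure_ascii=False, indent=2))
```

A file in this shape is the kind of input the spacy train command in spaCy v2 reads as training and dev data. The real conversion also has to align the GSK2014-A annotations with the UD_Japanese-BCCWJ tokenization, which this sketch does not attempt.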
I'd like to ask about the 'ja_ginza' model provided by this repo. It currently contains a pretrained NER model, but I couldn't find any documentation describing how it was trained or on which data. Where can I find that?
Thanks.