Closed by Isa-rentacs 4 years ago
Hi @Isa-rentacs, thank you for using nagisa.
- The prefix of the files (nagisa_v002) is different from the actual files (nagisa_v001). Is this just a matter of the filename?
Yes! This is just a matter of the filename. The files are the same.
- It says it used BCCWJ as the source data. I believe it's this BCCWJ (https://pj.ninjal.ac.jp/corpus_center/bccwj/), but would like to confirm this is the case.
- If the answer for the previous question is yes, could you share more about the training data such as the # of lines, word unit (short/long)?
That's right. I used the core data of BCCWJ as the source data for training nagisa's model. I used ClassA-1.list to extract the evaluation data; the rest of the core data was used as training and development data. The word unit of these datasets is the short unit.
Please refer to the following link. http://www.ar.media.kyoto-u.ac.jp/mori/research/topics/PST/NextNLP.html
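For readers who want to reproduce a split like the one described above, here is a minimal sketch. It is not the script used to train nagisa. It assumes the ID list contains one sample ID per line, and the function name `split_by_id_list` and the sample IDs are hypothetical placeholders.

```python
from typing import Iterable, List, Tuple


def split_by_id_list(
    sample_ids: Iterable[str], eval_ids: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Partition sample IDs into (evaluation, remainder).

    An ID goes to the evaluation set if it appears in eval_ids;
    everything else is left for training and development.
    """
    # Ignore blank lines and surrounding whitespace in the ID list.
    eval_set = {line.strip() for line in eval_ids if line.strip()}
    evaluation: List[str] = []
    remainder: List[str] = []
    for sid in sample_ids:
        if sid in eval_set:
            evaluation.append(sid)
        else:
            remainder.append(sid)
    return evaluation, remainder


# Hypothetical IDs for illustration only; real IDs come from the corpus.
all_ids = ["doc_a", "doc_b", "doc_c", "doc_d"]
class_a1 = ["doc_b", "doc_d"]  # stands in for the lines of ClassA-1.list
evaluation, remainder = split_by_id_list(all_ids, class_a1)
```

How the remainder is divided further into training and development data is not stated in this thread, so the sketch leaves it as a single list.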
Thank you!
Thank you for your quick reply, appreciated. Closing this issue as I don't have more questions, thanks!
I have questions about the hyperparameters and corpus used to train the built-in model.
When I execute the code below:
I get
Here I have 3 questions:
- The prefix of the files (nagisa_v002) is different from the actual files (nagisa_v001). Is this just a matter of the filename?