Open Lyuji282 opened 5 years ago
Thank you for your PR.
This code is optimized for the "BERT日本語Pretrainedモデル". That model is trained with Juman++ and is not supposed to be used with other tokenizers.
I also don't think the is_tokenized param is a good idea, because the text argument is originally string-typed but would need to be a list when is_tokenized is True.
Thank you for the reply. I know that the tokenizer is fixed for a given BERT pretrained model. I just want to separate the tokenization step from the model-serving step. Certainly, accepting two types for one argument is not a good idea; however, a list-typed argument is better, as in bert-as-service.
A developer may want to separate tokenization from embedding extraction, which is why I implemented the is_tokenized flag.
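A minimal sketch of what such a flag could look like, under stated assumptions: the `encode` and `tokenize` names below are hypothetical, not the PR's actual code, and the whitespace split stands in for the real Juman++ tokenizer purely for illustration. It shows the typing concern raised above: `text` is a list when is_tokenized is True and a string otherwise.

```python
from typing import List, Union


def tokenize(text: str) -> List[str]:
    # Placeholder for a Juman++-based tokenizer; a plain
    # whitespace split is used here only for illustration.
    return text.split()


def encode(text: Union[str, List[str]], is_tokenized: bool = False) -> List[str]:
    """Return the token sequence to feed to the model.

    When is_tokenized is True, the caller passes a pre-tokenized
    list of tokens (as in bert-as-service) and server-side
    tokenization is skipped; otherwise `text` is a raw string.
    """
    if is_tokenized:
        if not isinstance(text, list):
            raise TypeError("text must be a list of tokens when is_tokenized=True")
        return text
    if not isinstance(text, str):
        raise TypeError("text must be a string when is_tokenized=False")
    return tokenize(text)
```

The Union type makes the ambiguity explicit in one place, at the cost of the two-typed argument the maintainer objects to; bert-as-service accepts the same trade-off in its client API.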