lonePatient / BERT-NER-Pytorch

Chinese NER(Named Entity Recognition) using BERT(Softmax, CRF, Span)
MIT License

TypeError: __init__() got an unexpected keyword argument 'max_len' #12

Closed: possible1402 closed this issue 4 years ago

possible1402 commented 4 years ago

Using the author's custom CNerTokenizer raises `__init__() got an unexpected keyword argument 'max_len'`. The full traceback:

```
File "BERT-NER-Pytorch-master/run_ner_softmax.py", line 549, in <module>
    main()
File "BERT-NER-Pytorch-master/run_ner_softmax.py", line 480, in main
    cache_dir=args.cache_dir if args.cache_dir else None,)
File "BERT-NER-Pytorch-master\models\transformers\tokenization_utils.py", line 282, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
File "BERT-NER-Pytorch-master\models\transformers\tokenization_utils.py", line 411, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
TypeError: __init__() got an unexpected keyword argument 'max_len'
```

P.S. Using BertTokenizer does not raise this error. I'd also like to ask: why did the author define a custom tokenizer? Doesn't BertTokenizer already convert out-of-vocabulary words to `<UNK>`?

lonePatient commented 4 years ago

@possible1402 It's just for convenience. For Chinese, subwords (the `##` prefix) basically never occur, so I wrote a purely character-based tokenizer. Most NER datasets are labeled per character anyway, so either tokenizer works for them. But with raw text input, BertTokenizer may produce tokens that can't be aligned back to the character-level labels. You can add your own token-to-index mapping to handle that.
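To illustrate the alignment point above, here is a minimal sketch (not the repo's actual code; `char_tokenize` and `vocab` are hypothetical stand-ins) of why a character-level tokenizer keeps per-character NER labels aligned, while a WordPiece tokenizer may not:

```python
# Hypothetical sketch: character-level tokenization preserves one-token-per-
# character alignment with NER labels. OOV characters map to [UNK].

def char_tokenize(text, vocab):
    """Tokenize character by character; guarantees len(tokens) == len(text)."""
    return [ch if ch in vocab else "[UNK]" for ch in text]

vocab = {"中", "国", "2", "0"}          # stand-in for the BERT vocabulary
text = "中国2021"
labels = ["B-LOC", "I-LOC", "O", "O", "O", "O"]  # one label per character

tokens = char_tokenize(text, vocab)
assert len(tokens) == len(labels)       # alignment preserved

# A WordPiece tokenizer could instead merge "2021" into pieces like
# "20" + "##21", yielding fewer tokens than labels and breaking the
# one-label-per-character correspondence unless you track a token-to-
# character-index mapping.
```

This is why the repo's tokenizer choice is harmless for character-labeled NER corpora but matters once raw, unsegmented text is fed in.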

wc393439231 commented 3 years ago

Sorry to bother you, but if I want to use the author's Tokenizer, how can I fix this bug?
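The traceback shows the error comes from `tokenizer = cls(*init_inputs, **init_kwargs)` in `_from_pretrained`: the pretrained config injects extra keyword arguments such as `max_len`, but the custom tokenizer's `__init__` has a closed signature that rejects them. One possible fix (a sketch of the pattern, not the repo's official patch; `FakeBertTokenizer` below is a self-contained stand-in for the real `BertTokenizer` base class) is to accept `**kwargs` and forward them to the parent:

```python
# Fix pattern: accept **kwargs in __init__ and forward extras (e.g. max_len)
# to the parent class instead of rejecting them with a fixed signature.

class FakeBertTokenizer:
    """Stand-in for transformers' BertTokenizer, which accepts **kwargs."""
    def __init__(self, vocab_file=None, do_lower_case=True, **kwargs):
        self.vocab_file = vocab_file
        self.do_lower_case = do_lower_case
        self.init_kwargs = kwargs  # parent stores or ignores the extras

class CNerTokenizer(FakeBertTokenizer):
    def __init__(self, vocab_file=None, do_lower_case=True, **kwargs):
        # Forwarding **kwargs means from_pretrained's
        # cls(*init_inputs, **init_kwargs) call no longer raises TypeError
        # when init_kwargs contains max_len.
        super().__init__(vocab_file=vocab_file,
                         do_lower_case=do_lower_case, **kwargs)

    def tokenize(self, text):
        # Character-level tokenization, as in the repo's custom tokenizer.
        return [c.lower() if self.do_lower_case else c for c in text]

# The call that previously failed now works:
tok = CNerTokenizer(vocab_file="vocab.txt", max_len=512)
assert tok.tokenize("ABC") == ["a", "b", "c"]
```

In the actual repo you would apply the same change to `CNerTokenizer` in `models/transformers/tokenization_bert.py` (or wherever it is defined in your copy), keeping the real `BertTokenizer` as the base class.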