lonePatient / BERT-NER-Pytorch

Chinese NER(Named Entity Recognition) using BERT(Softmax, CRF, Span)
MIT License

Fix do_lower_case #87

Closed entropy2333 closed 1 year ago

entropy2333 commented 2 years ago

The `do_lower_case` argument controls whether the input text is lowercased, and it is passed to the tokenizer.

tokenizer = tokenizer_class.from_pretrained(args.model_name_or_path, do_lower_case=args.do_lower_case)

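The argument only takes effect inside `tokenizer.tokenize`. This project calls `tokenizer.convert_tokens_to_ids` directly, so the setting never actually applies and the tokens have to be lowercased manually.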
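To illustrate why the setting is silently ignored, here is a minimal sketch with a toy vocabulary. `TOY_VOCAB` and `toy_convert_tokens_to_ids` are hypothetical stand-ins, not the library's code; they assume that converting tokens to ids is essentially a vocabulary lookup with an unknown-token fallback and no text normalization.

```python
# Hypothetical toy vocabulary standing in for an uncased vocab.txt.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "a": 4, "b": 5, "中": 6}


def toy_convert_tokens_to_ids(tokens, vocab=TOY_VOCAB, unk_token="[UNK]"):
    """Look each token up as-is; anything missing falls back to the unknown id."""
    unk_id = vocab[unk_token]
    return [vocab.get(tok, unk_id) for tok in tokens]


# Character-level tokens, as used for Chinese NER input.
chars = list("Ab中")

# Without lowercasing, the uppercase "A" is not in the uncased vocabulary.
ids_raw = toy_convert_tokens_to_ids(chars)

# Lowercasing by hand before the lookup recovers the intended id.
ids_lowered = toy_convert_tokens_to_ids([c.lower() for c in chars])

print(ids_raw)      # [1, 5, 6]
print(ids_lowered)  # [4, 5, 6]
```

In this toy setup the uppercase character maps to the unknown id unless it is lowercased first, which is the behavior the fix below addresses.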

def convert_examples_to_features(...):
    ...
    if tokenizer.do_lower_case:
        tokens = [x.lower() for x in tokens]
    ...
    input_ids = tokenizer.convert_tokens_to_ids(tokens)
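One thing to watch with the manual approach: the elided code does not show whether special tokens such as `[CLS]` and `[SEP]` are already in `tokens` when the lowercasing runs. If they are, a plain `x.lower()` turns them into `[cls]` and `[sep]`, which a standard BERT vocabulary does not contain. A minimal sketch of a guard, assuming a hypothetical helper named `lowercase_tokens` and the usual BERT special-token strings:

```python
# Assumed default special tokens; adjust to match the tokenizer in use.
DEFAULT_SPECIAL_TOKENS = ("[CLS]", "[SEP]", "[PAD]", "[UNK]", "[MASK]")


def lowercase_tokens(tokens, do_lower_case, special_tokens=DEFAULT_SPECIAL_TOKENS):
    """Lowercase tokens when requested, leaving special tokens untouched."""
    if not do_lower_case:
        return list(tokens)
    keep = set(special_tokens)
    return [tok if tok in keep else tok.lower() for tok in tokens]


example = ["[CLS]", "A", "b", "中", "[SEP]"]
print(lowercase_tokens(example, do_lower_case=True))
print(lowercase_tokens(example, do_lower_case=False))
```

If the lowercasing happens before the special tokens are added, the guard is unnecessary and the list comprehension from the PR is sufficient.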