yongzhuo / Pytorch-NLU

Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, and sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, word segmentation, and extractive text summarization.
https://blog.csdn.net/rensihui
Apache License 2.0
328 stars 52 forks

Is English supported? #6

Closed huajinping closed 1 year ago

yongzhuo commented 1 year ago

Yes. In the data preprocessing module tcData.py, edit the function def load_tokenizer(self, config). Change


        tokenizer = PretrainedTokenizer.from_pretrained(config.pretrained_model_name_or_path)

to

        tokenizer = PRETRAINED_MODEL_CLASSES[config.model_type][1].from_pretrained(config.pretrained_model_name_or_path)

and that's it.
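
A minimal sketch of the lookup pattern behind this change, for illustration only. It assumes PRETRAINED_MODEL_CLASSES maps a model type to a tuple whose element at index 1 is the tokenizer class; the thread confirms only that index. The keys "BERT" and "ROBERTA", the Dummy* stand-in classes, and the standalone load_tokenizer function (a method in tcData.py) are hypothetical and are not taken from the repository.

```python
from types import SimpleNamespace


class DummyConfig:
    """Stand-in for a model config class (hypothetical)."""


class DummyModel:
    """Stand-in for a model class (hypothetical)."""


class DummyBertTokenizer:
    """Stand-in for a BERT-style tokenizer class (hypothetical)."""

    def __init__(self, path):
        self.path = path

    @classmethod
    def from_pretrained(cls, path):
        # A real tokenizer would load its vocabulary from `path` here.
        return cls(path)


class DummyRobertaTokenizer(DummyBertTokenizer):
    """Stand-in for a RoBERTa-style tokenizer class (hypothetical)."""


# Assumed layout: model_type -> (config class, tokenizer class, model class).
# Only "index 1 is the tokenizer class" comes from the thread.
PRETRAINED_MODEL_CLASSES = {
    "BERT": (DummyConfig, DummyBertTokenizer, DummyModel),
    "ROBERTA": (DummyConfig, DummyRobertaTokenizer, DummyModel),
}


def load_tokenizer(config):
    """Pick the tokenizer class that matches config.model_type."""
    tokenizer_cls = PRETRAINED_MODEL_CLASSES[config.model_type][1]
    return tokenizer_cls.from_pretrained(config.pretrained_model_name_or_path)


config = SimpleNamespace(
    model_type="ROBERTA",
    pretrained_model_name_or_path="roberta-base",  # example English checkpoint name
)
tokenizer = load_tokenizer(config)
print(type(tokenizer).__name__, tokenizer.path)
```

With a lookup like this, the tokenizer class follows config.model_type instead of being fixed to one class, which is presumably why the change lets an English pretrained model load with its matching tokenizer.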

huajinping commented 1 year ago

Thanks a lot!

huajinping commented 1 year ago

Does the same change apply to sequence labeling for English?