thunlp / OpenMatch

An Open-Source Package for Information Retrieval.
MIT License

How do I process Chinese data? #48

Closed: Berlin-98 closed this issue 2 years ago

Berlin-98 commented 2 years ago

How can I process a Chinese dataset? After pointing om.data.tokenizers.WordTokenizer(pretrained="xxx") at Chinese word vectors instead, I get this error: ValueError: could not convert string to float: '义'

Yu-Shi commented 2 years ago

Chinese data is not supported yet. Please wait for a future release.
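
For readers who hit the same message: Python's float() raises this ValueError when it is handed a non-numeric string such as '义'. One possible cause, which is an assumption and not confirmed anywhere in this thread, is that the embedding file is parsed by splitting each line on spaces and treating everything after the first field as numbers. A token that itself contains a space, or a word2vec-style header line, would then break the parse. The sketch below is a standalone illustration of a more tolerant parser. load_text_embeddings is a hypothetical helper, not part of OpenMatch, and the sample data is made up.

```python
import io


def load_text_embeddings(handle, dim):
    """Parse a GloVe/word2vec-style text embedding file into {token: [floats]}.

    Hypothetical helper for illustration only; not an OpenMatch API.
    """
    vectors = {}
    for line_no, line in enumerate(handle):
        fields = line.rstrip().split(" ")
        # word2vec text files begin with a "<vocab_size> <dim>" header; skip it.
        if line_no == 0 and len(fields) == 2 and all(f.isdigit() for f in fields):
            continue
        # Skip empty or truncated lines that cannot hold a token plus a vector.
        if len(fields) < dim + 1:
            continue
        # Use the LAST `dim` fields as the vector, so a token that contains
        # spaces is kept whole instead of leaking into the numeric part.
        token = " ".join(fields[:-dim])
        vectors[token] = [float(x) for x in fields[-dim:]]
    return vectors


# Made-up sample: a header line, a normal token, and a token containing a space.
sample = io.StringIO("2 3\n中 0.1 0.2 0.3\n定 义 0.4 0.5 0.6\n")
embeddings = load_text_embeddings(sample, dim=3)
```

A parser that instead calls float() on every field after the first would reach '义' on the third line and fail with the message quoted in the question. This only addresses loading the vectors; it does not add Chinese word segmentation, which the tokenizer would also need.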