xiangking/ark-nlp
A private nlp coding package, which quickly implements the SOTA solutions.
Apache License 2.0
Change how TransfomerTokenizer handles out-of-vocabulary words
#58
Closed
xiangking
closed
2 years ago
xiangking
commented
2 years ago
PR types
Fix
PR changes
Fix Tokenizer
Description
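Add a WordpieceTokenizer class.
The WordpieceTokenizer in the transformers library maps a word that is not in the vocabulary to a single unk_token for the whole word. This PR changes that behavior so that each unmatched character or letter is mapped to its own unk_token instead. Closes #57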
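To illustrate the difference described above, here is a minimal sketch of greedy longest-match WordPiece with both fallback strategies. It is not the code from this PR: the function names `wordpiece_whole_word_unk` and `wordpiece_per_char_unk`, the toy vocabulary, and the `##` continuation prefix are assumptions chosen for the example.

```python
def _longest_match(word, start, vocab):
    """Return (piece, end) for the longest vocab entry starting at `start`, or (None, start)."""
    end = len(word)
    while start < end:
        candidate = word[start:end]
        if start > 0:
            # Non-initial pieces carry the continuation prefix (assumed "##").
            candidate = "##" + candidate
        if candidate in vocab:
            return candidate, end
        end -= 1
    return None, start


def wordpiece_whole_word_unk(word, vocab, unk_token="[UNK]"):
    """Whole-word fallback: any unmatched position turns the entire word into one unk_token."""
    tokens, start = [], 0
    while start < len(word):
        piece, end = _longest_match(word, start, vocab)
        if piece is None:
            return [unk_token]
        tokens.append(piece)
        start = end
    return tokens


def wordpiece_per_char_unk(word, vocab, unk_token="[UNK]"):
    """Per-character fallback: only the unmatched character becomes unk_token."""
    tokens, start = [], 0
    while start < len(word):
        piece, end = _longest_match(word, start, vocab)
        if piece is None:
            # Emit one unk_token for this character and keep matching the rest.
            tokens.append(unk_token)
            start += 1
        else:
            tokens.append(piece)
            start = end
    return tokens


# Hypothetical toy vocabulary for demonstration only.
vocab = {"un", "##aff", "##able"}
print(wordpiece_whole_word_unk("unxaffable", vocab))  # ['[UNK]']
print(wordpiece_per_char_unk("unxaffable", vocab))    # ['un', '[UNK]', '##aff', '##able']
```

With the per-character fallback, the pieces around an unknown character are preserved, which may help tasks such as sequence labeling that need tokens to stay aligned with the original characters.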