```python
from sacremoses import MosesTokenizer
print(MosesTokenizer(lang='en').penn_tokenize("-LRB- This is very nice -RRB-"))
```
Running this raises the error below. Changing `lang='en'` to `lang='zh'` does not fix it.
```
Traceback (most recent call last):
  File ".../scratches/test.py", line 3, in <module>
    print(MosesTokenizer(lang='en').penn_tokenize("-LRB- This is very nice -RRB-"))
  File ".../python3.9/site-packages/sacremoses/tokenize.py", line 423, in penn_tokenize
    text = regexp.sub(substitution, text)
AttributeError: 'str' object has no attribute 'sub'
```
I think the problem is here: the pattern is a `str`, not a compiled pattern, so it has no `.sub()` method.
https://github.com/hplt-project/sacremoses/blob/65543c34baf589f30260488d882d0060abaa4087/sacremoses/tokenize.py#L93-L96