yuyang-huang closed this 3 years ago.
There's a small mismatch between MosesDetruecaser and the original Perl script:
```
$ echo 'COVID @-@ 19' | perl detruecase.perl
COVID @-@ 19
$ echo 'COVID @-@ 19' | sacremoses detruecase
Covid @-@ 19
```
This happens because str.capitalize() uppercases the first character and lowercases the rest, so an all-caps token such as COVID becomes Covid.
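Here is a quick plain-Python illustration of the difference. The helper name `upper_first` is hypothetical and is not taken from sacremoses. It only shows the slicing idea from the fix.

```python
def upper_first(token: str) -> str:
    """Uppercase only the first character and leave the rest untouched."""
    # token[:1] (rather than token[0]) returns "" for an empty token
    # instead of raising IndexError.
    return token[:1].upper() + token[1:]


print("COVID".capitalize())   # Covid  (the rest is lowercased)
print(upper_first("COVID"))   # COVID  (the rest is preserved)
print(upper_first("covid"))   # Covid
```

Slicing with `[:1]` is what makes the one-liner safe on empty strings, which is why it is preferable to indexing with `[0]`.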
This PR changes token.capitalize() to token[:1].upper() + token[1:] and adds a unit test for it.
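The PR's actual unit test is not reproduced here. The following is a rough sketch of what such a regression test could check. It uses a hypothetical, heavily simplified stand-in, `detruecase_first_token`, that only re-capitalizes the first token of a space-tokenized sentence. It is not the sacremoses implementation, and the real `MosesDetruecaser` does more than this.

```python
import unittest


def upper_first(token: str) -> str:
    # Hypothetical helper: uppercase the first character only.
    return token[:1].upper() + token[1:]


def detruecase_first_token(text: str) -> str:
    """Hypothetical stand-in for a detruecaser: it only re-capitalizes
    the first token of a space-tokenized sentence."""
    tokens = text.split()
    if tokens:
        tokens[0] = upper_first(tokens[0])
    return " ".join(tokens)


class TestDetruecaseKeepsCase(unittest.TestCase):
    def test_all_caps_first_token_is_preserved(self):
        # Mirrors the reported case: the Perl script leaves this unchanged.
        self.assertEqual(detruecase_first_token("COVID @-@ 19"), "COVID @-@ 19")

    def test_lowercase_first_token_is_capitalized(self):
        self.assertEqual(detruecase_first_token("covid @-@ 19"), "Covid @-@ 19")


if __name__ == "__main__":
    # exit=False so the test run does not call sys.exit().
    unittest.main(argv=["detruecase-test"], exit=False)
```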
@yuyang-huang Indeed a bug. @alvations, can you please merge it? It hurts...
Thank you @yuyang-huang !