Closed shuminghu closed 1 year ago
This pull request was exported from Phabricator. Differential Revision: D43970697
This pull request was exported from Phabricator. Differential Revision: D43970697
This pull request was exported from Phabricator. Differential Revision: D43970697
This pull request was exported from Phabricator. Differential Revision: D43970697
This pull request was exported from Phabricator. Differential Revision: D43970697
Summary: When language models use c++ tokenizer, outputs are a c++ strings that are not necessarily valid utf-8 encodings. Default pybind11 casting uses strict utf-8 decoding. We relax the decoding using 'ignore' argument.
Reviewed By: Nayef211
Differential Revision: D43970697