pytorch / text

Models, data loaders and abstractions for language processing, powered by PyTorch
https://pytorch.org/text
BSD 3-Clause "New" or "Revised" License

Optionally ignore utf-8 decoding error for scripted C++ tokenizers. #2128

Closed by shuminghu 1 year ago

shuminghu commented 1 year ago

Summary: Adds a binding and a test to make sure we can use the 'ignore' option for UTF-8 decoding, which was added to PyTorch in D43970697 (https://github.com/pytorch/pytorch/pull/97282).

Reviewed By: Nayef211

Differential Revision: D44315169
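
For readers unfamiliar with the 'ignore' option named in the summary, here is a minimal sketch of what ignoring UTF-8 decoding errors means, using only Python's built-in `bytes.decode`. It is an illustration of the decoding behavior, not the binding added in this PR; the helper name `decode_token` and its `ignore_errors` parameter are hypothetical.

```python
# Illustration only: shows strict vs. 'ignore' UTF-8 decoding with the
# Python standard library. This is NOT the torchtext binding from this PR.

# 0xff can never appear in valid UTF-8, so this input is malformed.
raw = b"hello \xff world"


def decode_token(data: bytes, ignore_errors: bool = False) -> str:
    """Hypothetical helper: decode bytes as UTF-8.

    With ignore_errors=True, invalid bytes are silently dropped;
    otherwise a UnicodeDecodeError is raised.
    """
    return data.decode("utf-8", errors="ignore" if ignore_errors else "strict")


# Strict decoding rejects the malformed input.
try:
    decode_token(raw)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Lenient decoding drops the invalid byte and keeps the rest.
lenient = decode_token(raw, ignore_errors=True)
```

The trade-off is that 'ignore' loses the offending bytes without any signal, which is why it makes sense as an opt-in setting rather than a default.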

facebook-github-bot commented 1 year ago

This pull request was exported from Phabricator. Differential Revision: D44315169

Nayef211 commented 1 year ago

Don't need to export diffs to GH