Improve tokenizer pretty-pretty logic + __call__ method

pytorch / captum

Model interpretability and understanding for PyTorch

https://captum.ai

BSD 3-Clause "New" or "Revised" License

4.95k stars, 499 forks

Improve tokenizer pretty-pretty logic + __call__ method #1417

Closed craymichael closed 3 weeks ago

craymichael commented 1 month ago

Summary: Use the __call__ method of tokenizers, which returns a BatchEncoding with offsets. This lets us take each token's text directly from the fully decoded string instead of making assumptions about how many tokens correspond to a single string.
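A minimal sketch of the offset-based idea, not the code from this PR: given per-token character spans, each token's display text can be sliced straight from the string. The helper name `tokens_from_offsets` and the hand-written spans are hypothetical. With a Hugging Face fast tokenizer, spans like these would typically come from `tokenizer(text, return_offsets_mapping=True)["offset_mapping"]`, where special tokens are reported as empty spans.

```python
from typing import List, Sequence, Tuple


def tokens_from_offsets(text: str, offsets: Sequence[Tuple[int, int]]) -> List[str]:
    """Slice each token's text out of `text` using (start, end) character spans.

    Hypothetical helper for illustration; it makes no assumption about how
    many tokens map to a given piece of the string.
    """
    return [text[start:end] for start, end in offsets]


# Hand-written spans for illustration only (assumed, not real tokenizer output).
# Empty spans such as (0, 0) stand in for special tokens with no source text.
text = "Hello world!"
offsets = [(0, 0), (0, 5), (5, 11), (11, 12), (0, 0)]

pieces = tokens_from_offsets(text, offsets)
print(pieces)
```

Because every piece is a slice of the same string, joining the pieces reproduces the text whenever the spans cover it without gaps or overlaps.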

Differential Revision: D64998804

facebook-github-bot commented 1 month ago

This pull request was exported from Phabricator. Differential Revision: D64998804

facebook-github-bot commented 3 weeks ago

This pull request has been merged in pytorch/captum@ad89e0bb2bb4fa061d576ff3bfc387aa3ea5939d.
