Closed ben-davidson-6 closed 3 years ago
Hello, thank you for opening this issue! Do you want to open a PR with your fix?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Environment info
transformers version: 4.9.2
Who can help
@LysandreJik
Information
The XLM-R tokenizer is not really picklable: unpickling it depends on files that must exist on disk. This breaks using the tokenizer in a Spark UDF, because Spark pickles the tokenizer and sends it to other nodes for execution, and those nodes will not have the same files on disk.
The only tokenizer I know this happens with is XLMRobertaTokenizer but I imagine there may be more.
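To illustrate the failure mode without needing transformers or a model download, here is a minimal stdlib-only sketch. `DiskBackedTokenizer` and `unpickle_after_file_removed` are hypothetical names made up for this example, and the assumption is that the real tokenizer behaves similarly, i.e. it drops its loaded model when pickled and reloads it from a stored file path when unpickled. It is not the actual XLMRobertaTokenizer code.

```python
import os
import pickle
import tempfile


class DiskBackedTokenizer:
    """Hypothetical stand-in: only the vocab file path survives pickling."""

    def __init__(self, vocab_file):
        self.vocab_file = vocab_file
        self.vocab = self._load(vocab_file)

    @staticmethod
    def _load(path):
        # Map each whitespace-separated token to its position in the file.
        with open(path, encoding="utf-8") as f:
            return {token: i for i, token in enumerate(f.read().split())}

    def __getstate__(self):
        # Drop the loaded vocabulary so the pickle holds just the path.
        state = self.__dict__.copy()
        state["vocab"] = None
        return state

    def __setstate__(self, state):
        self.__dict__ = state
        # Reload from disk: this step fails where the file does not exist.
        self.vocab = self._load(self.vocab_file)


def unpickle_after_file_removed():
    """Pickle a tokenizer, delete its vocab file, then try to unpickle it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vocab.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("hello world")
        payload = pickle.dumps(DiskBackedTokenizer(path))
        os.remove(path)  # simulate a worker node that never had the file
        try:
            pickle.loads(payload)
        except FileNotFoundError:
            return "unpickle failed: vocab file missing"
        return "unpickle succeeded"


print(unpickle_after_file_removed())
```

Pickling itself succeeds here; the error only shows up at unpickle time, which is why the problem tends to surface on the remote nodes rather than on the driver.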
To reproduce
Expected behavior
The expected behaviour would be that once the tokenizer is pickled, I should be able to unpickle it on any machine that has the prerequisite libraries, regardless of what is on disk and where.
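One way this could look, as a rough sketch only: the pickled state carries the raw bytes of the model file instead of a path, so unpickling never touches the disk. `SelfContainedTokenizer` and `roundtrip_without_file` are hypothetical names for illustration, and a plain text vocab file stands in for the real SentencePiece model. For a SentencePiece-backed tokenizer the bytes would presumably come from the serialized model proto, but that part is an assumption and is not shown here.

```python
import os
import pickle
import tempfile


class SelfContainedTokenizer:
    """Hypothetical stand-in: the pickle embeds the vocab bytes, not a path."""

    def __init__(self, vocab_file):
        # Read the file once and keep its raw bytes alongside the parsed vocab.
        with open(vocab_file, "rb") as f:
            self._vocab_bytes = f.read()
        self.vocab = self._parse(self._vocab_bytes)

    @staticmethod
    def _parse(raw):
        # Map each whitespace-separated token to its position.
        return {token: i for i, token in enumerate(raw.decode("utf-8").split())}

    def __getstate__(self):
        # Only the raw bytes are pickled; no file path is stored.
        return {"vocab_bytes": self._vocab_bytes}

    def __setstate__(self, state):
        # Rebuild the vocabulary from the embedded bytes, without disk access.
        self._vocab_bytes = state["vocab_bytes"]
        self.vocab = self._parse(self._vocab_bytes)


def roundtrip_without_file():
    """Pickle a tokenizer, remove its source directory, then unpickle it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vocab.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("hello world")
        payload = pickle.dumps(SelfContainedTokenizer(path))
    # The temporary directory and the vocab file are gone at this point.
    return pickle.loads(payload).vocab


print(roundtrip_without_file())
```

The trade-off is a larger pickle, since the model bytes travel with every serialized copy, but the object can then be shipped to a node with a different filesystem layout.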