RuntimeError: Already borrowed

use_fast=False is not really a viable option, because the "slow" tokenizers don't implement return_offsets_mapping. Parsing operates over words, while pre-trained models use subwords along with a bunch of Unicode substitution/normalization rules. The parser relies on the tokenizer to provide a mapping between subwords and character positions in the original string. "Slow" huggingface tokenizers don't implement this feature, and trying to reconstruct alignments after the fact is extremely error-prone due to all of the text normalization involved.
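To illustrate why after-the-fact alignment is fragile, here is a small sketch using only Python's standard unicodedata module. NFKC is just a stand-in here, and the input string is made up; the actual normalization rules depend on the pre-trained model's tokenizer.

```python
import unicodedata

# Hypothetical input; "\ufb01" is the single-character "fi" ligature.
original = "\ufb01ne caf\u00e9"

# NFKC expands the ligature into two characters, so the text gets longer.
normalized = unicodedata.normalize("NFKC", original)

# The same word now starts at a different character offset in each string.
start_in_original = original.find("caf\u00e9")
start_in_normalized = normalized.find("caf\u00e9")

print(len(original), len(normalized))          # 8 9
print(start_in_original, start_in_normalized)  # 4 5
```

Any offsets computed on the normalized text are shifted relative to the original string, which is why the parser wants the tokenizer itself to report the mapping.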

If you're using T5-based English parsers and want a solution just for yourself, you can probably modify the tokenization code to use the original sentencepiece library instead of huggingface. But I don't plan on adding such a solution to this repository, because it's not general-purpose and only works for a limited set of pre-trained models. You could also try hacking retokenization.py to have multiple tokenizer copies in thread-local storage.
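For the thread-local idea, a minimal sketch might look like the following. It is not taken from retokenization.py; get_thread_tokenizer and make_tokenizer are hypothetical names, and the factory is assumed to be whatever call currently constructs the tokenizer.

```python
import threading

# Each thread sees its own independent attributes on this object.
_local = threading.local()


def get_thread_tokenizer(make_tokenizer):
    """Return the calling thread's private tokenizer, creating it on first use.

    make_tokenizer is a hypothetical zero-argument factory, for example a
    lambda wrapping the existing tokenizer construction call.
    """
    tokenizer = getattr(_local, "tokenizer", None)
    if tokenizer is None:
        tokenizer = make_tokenizer()
        _local.tokenizer = tokenizer
    return tokenizer
```

Since no tokenizer instance is ever shared between threads, two threads should never try to borrow the same underlying object at once. The cost is one tokenizer copy in memory per thread.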

nikitakit / self-attentive-parser

RuntimeError: Already borrowed #82