The tokenizer call in `process_inputs` of the text generation pipeline has been running into a race condition in the tokenizer source code when receiving multiple concurrent requests. The issue is that a tokenizer pass mutates the tokenizer's state, which conflicts when multiple threads try to update the tokenizer at the same time.

Per @SageMoore, this reproduces consistently on certain machines, but fails at different points in the run.

Given that the tokenization step is relatively fast, we will first try locking the tokenizer call (see the sketch below). If the lock becomes a bottleneck, we can look into keeping multiple tokenizer instances, or into avoiding the state update altogether.
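A minimal sketch of the locking approach, assuming the tokenizer lives on the operator as `self.tokenizer`; the class shape, lock name, and tokenizer kwargs here are illustrative, not the actual DeepSparse code:

```python
import threading


class ProcessInputs:
    """Sketch of the process-inputs operator with a tokenizer lock."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer
        # one lock per pipeline: concurrent requests serialize only on the
        # (fast) tokenization step, not on the engine call
        self._tokenizer_lock = threading.Lock()

    def run(self, prompt: str):
        # set_truncation_and_padding mutates the shared tokenizer, so the
        # whole __call__ has to happen under the lock
        with self._tokenizer_lock:
            input_tokens = self.tokenizer(
                prompt,
                return_tensors="np",
                padding="max_length",
                truncation=True,
            )
        return input_tokens
```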
Error snippet:

```
  File "/home/sage/git/wand/deepsparse/src/deepsparse/pipeline.py", line 242, in run_async
    await outputs
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/sage/git/wand/deepsparse/src/deepsparse/operators/operator.py", line 98, in __call__
    run_output = self.run(
  File "/home/sage/git/wand/deepsparse/src/deepsparse/transformers/pipelines/text_generation/process_inputs.py", line 84, in run
    input_tokens = self.tokenizer(
  File "/home/sage/fifth-local-deepsparse/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2802, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
  File "/home/sage/fifth-local-deepsparse/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2908, in _call_one
    return self.encode_plus(
  File "/home/sage/fifth-local-deepsparse/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2981, in encode_plus
    return self._encode_plus(
  File "/home/sage/fifth-local-deepsparse/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 576, in _encode_plus
    batched_output = self._batch_encode_plus(
  File "/home/sage/fifth-local-deepsparse/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 496, in _batch_encode_plus
    self.set_truncation_and_padding(
  File "/home/sage/fifth-local-deepsparse/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 467, in set_truncation_and_padding
    self._tokenizer.enable_padding(**target)
RuntimeError: Already borrowed
```
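For context, `Already borrowed` comes from the Rust-backed fast tokenizer: `set_truncation_and_padding` needs a mutable borrow of the underlying tokenizer while a concurrent encode holds another borrow. A standalone sketch along these lines tends to reproduce it on affected setups (the model name, thread count, and alternating `max_length` are arbitrary choices to force repeated state updates, and exact behavior varies by transformers version):

```python
import threading
from transformers import AutoTokenizer

# "gpt2" is an arbitrary fast-tokenizer model chosen for illustration
tokenizer = AutoTokenizer.from_pretrained("gpt2", use_fast=True)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default


def encode():
    for n in range(500):
        # alternating max_length forces set_truncation_and_padding to mutate
        # the shared Rust tokenizer on every call, widening the race window
        tokenizer(
            "hello world",
            padding="max_length",
            truncation=True,
            max_length=16 + (n % 2),
        )


threads = [threading.Thread(target=encode) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# on affected machines, some thread intermittently dies with
# RuntimeError: Already borrowed
```

Since the failure depends on thread scheduling, this also matches the symptom of the error showing up at different points from run to run.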