There is an error that seems to occur only when generating the second token with streaming=True, and only in specific pipeline setups such as Gradio.
Error:
  File "/home/user/.local/lib/python3.10/site-packages/deepsparse/transformers/engines/nl_decoder_engine.py", line 194, in __call__
    with timer.time(f"EXECUTE_ENGINE_SEQ_LEN_{self.sequence_length}"):
AttributeError: 'NoneType' object has no attribute 'time'
This PR uses nullcontext in place of the timer's context manager when the timer is None, so the engine call runs untimed instead of raising the AttributeError above.
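The fallback pattern can be sketched roughly as follows. This is a minimal illustration and not the actual diff: SketchTimer and run_engine are hypothetical stand-ins for the pipeline timer and the engine's __call__, and the list comprehension is a placeholder for the real engine execution. Only contextlib.nullcontext and the stage-name format from the traceback come from the description above.

```python
from contextlib import contextmanager, nullcontext
from time import perf_counter


class SketchTimer:
    """Hypothetical stand-in for the pipeline timer (not the deepsparse class)."""

    def __init__(self):
        self.measurements = {}

    @contextmanager
    def time(self, stage):
        # Record the wall-clock duration of the wrapped block under `stage`.
        start = perf_counter()
        try:
            yield
        finally:
            self.measurements[stage] = perf_counter() - start


def run_engine(inputs, sequence_length, timer=None):
    """Hypothetical stand-in for the engine's __call__."""
    stage = f"EXECUTE_ENGINE_SEQ_LEN_{sequence_length}"
    # Fall back to a no-op context manager when no timer was provided,
    # instead of calling .time() on None.
    timer_context = timer.time(stage) if timer is not None else nullcontext()
    with timer_context:
        return [x * 2 for x in inputs]  # placeholder for the real engine call
```

With this shape, a call such as run_engine([1, 2], 128) works with no timer at all, while passing a timer still records the stage as before.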