Open MrD005 opened 6 months ago
How do I use speculative decoding? Is there any documentation for understanding it better?
Support was added in a recent update for both TensorRT-LLM and the TensorRT-LLM backend.
We're working on an example with docs now. In the meantime, there is an implementation you can reference here.
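While the official example is pending, a toy sketch of the general technique may help with the "understanding it better" part. This is not TensorRT-LLM or backend API: `speculative_decode`, `greedy_decode`, `toy_target`, and `toy_draft` are hypothetical names, and the models are plain Python functions that return the next token greedily. It shows only the greedy draft-and-verify variant, where a small draft model proposes a few tokens and the target model keeps the longest prefix it agrees with. A real engine verifies all proposed positions in one batched forward pass; the loop here stands in for that.

```python
from typing import Callable, List

# Hypothetical interface: a "model" maps a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new_tokens: int, num_draft_tokens: int = 4) -> List[int]:
    """Greedy speculative decoding: output matches greedy_decode(target, ...)."""
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        k = min(num_draft_tokens, limit - len(tokens))
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target model checks each proposed position in order.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if expected == tok:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's token and drop the rest.
                accepted.append(expected)
                break
        else:
            # Every draft token matched, so the target contributes one extra token.
            if len(tokens) + len(accepted) < limit:
                accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens


# Toy models for illustration only (assumptions, not real LLMs).
def toy_target(tokens: List[int]) -> int:
    return (tokens[-1] * 3 + 1) % 17


def toy_draft(tokens: List[int]) -> int:
    # Agrees with the target except at every fourth position.
    return 0 if len(tokens) % 4 == 0 else toy_target(tokens)


if __name__ == "__main__":
    spec = speculative_decode(toy_target, toy_draft, [1, 2, 3], max_new_tokens=10)
    base = greedy_decode(toy_target, [1, 2, 3], max_new_tokens=10)
    print(spec == base)
```

The point of the sketch is that the result is identical to decoding with the target model alone; the speedup in a real system comes from the target accepting several draft tokens per forward pass. Sampling-based variants replace the equality check with a probabilistic accept/reject step, which is not shown here.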