Closed (ccmaymay closed this issue 1 year ago)
For GPUs with compute capability >= 7 (V100, A100, etc.), NVIDIA FasterTransformer optimizes inference for the specific hardware configuration. The blog post below suggests a 2-4x speedup:
https://developer.nvidia.com/blog/accelerated-inference-for-large-transformer-models-using-nvidia-fastertransformer-and-nvidia-triton-inference-server/
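A minimal sketch of how the compute capability requirement above could be checked before enabling such a code path. The helper names (`meets_min_capability`, `local_gpu_capability`) are hypothetical and not part of this project or of FasterTransformer. The sketch assumes PyTorch is the way the GPU would be queried, and it falls back gracefully when PyTorch or a CUDA device is missing.

```python
from typing import Optional, Tuple

# Threshold quoted in the issue: "compute capability >= 7".
MIN_CAPABILITY: Tuple[int, int] = (7, 0)


def meets_min_capability(
    capability: Tuple[int, int],
    minimum: Tuple[int, int] = MIN_CAPABILITY,
) -> bool:
    """Return True if a (major, minor) compute capability meets the minimum.

    Hypothetical helper for illustration; tuples compare element by element,
    so (7, 5) >= (7, 0) and (6, 1) < (7, 0).
    """
    return tuple(capability) >= tuple(minimum)


def local_gpu_capability() -> Optional[Tuple[int, int]]:
    """Return the (major, minor) capability of GPU 0, or None if unavailable.

    Assumes PyTorch is installed; returns None otherwise, or when no CUDA
    device is visible.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return tuple(torch.cuda.get_device_capability(0))


if __name__ == "__main__":
    cap = local_gpu_capability()
    if cap is None:
        print("No CUDA device (or PyTorch) found; skipping the check.")
    elif meets_min_capability(cap):
        print(f"Compute capability {cap[0]}.{cap[1]} meets the >= 7 requirement.")
    else:
        print(f"Compute capability {cap[0]}.{cap[1]} is below 7.")
```

For reference, a V100 reports compute capability 7.0 and an A100 reports 8.0, so both pass this check, while a P100 (6.0) would not.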
subsumed by #97