hltcoe / sandle

Run a large language modeling SANDbox in your Local Environment

Triton Inference Server / FasterTransformer #79

Closed · ccmaymay closed 1 year ago

ccmaymay commented 1 year ago

For GPUs with compute capability >= 7.0 (V100, A100, etc.), FasterTransformer optimizes inference for the specific hardware configuration. NVIDIA suggests a 2-4x speedup:

https://developer.nvidia.com/blog/accelerated-inference-for-large-transformer-models-using-nvidia-fastertransformer-and-nvidia-triton-inference-server/
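If we went this route, clients would talk to Triton over its KServe v2 HTTP API. A minimal sketch of building such a request follows; the model name (`fastertransformer`), tensor names, and datatypes here are assumptions, since the real names depend on how the model repository is configured:

```python
import json

# Hypothetical Triton endpoint; host/port and model name are assumptions.
TRITON_URL = "http://localhost:8000/v2/models/fastertransformer/infer"

def build_infer_request(input_ids, request_output_len=32):
    """Construct a KServe v2 inference request body for a single sequence.

    input_ids: list of token ids for one prompt (batch size 1).
    request_output_len: number of tokens to generate (tensor name assumed).
    """
    return {
        "inputs": [
            {
                "name": "input_ids",
                "shape": [1, len(input_ids)],
                "datatype": "UINT32",
                "data": input_ids,
            },
            {
                "name": "request_output_len",
                "shape": [1, 1],
                "datatype": "UINT32",
                "data": [request_output_len],
            },
        ]
    }

payload = build_infer_request([101, 2023, 2003, 1037, 3231])
body = json.dumps(payload)
# The actual call would be an HTTP POST of `body` to TRITON_URL,
# e.g. requests.post(TRITON_URL, data=body).
```

This is only a sketch of the wire format, not a working integration; the blog post above covers the server-side setup (converting the model to FasterTransformer format and writing the Triton model config).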

ccmaymay commented 1 year ago

subsumed by #97