hltcoe / sandle

Run a large language modeling SANDbox in your Local Environment

Triton Inference Server / FasterTransformer #79

Closed · ccmaymay closed 1 year ago

ccmaymay commented 1 year ago

For GPUs with compute capability >= 7.0 (V100, A100, etc.), FasterTransformer optimizes inference for the specific hardware configuration. NVIDIA suggests a 2-4x speedup:

https://developer.nvidia.com/blog/accelerated-inference-for-large-transformer-models-using-nvidia-fastertransformer-and-nvidia-triton-inference-server/
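If we went this route, clients would talk to Triton over its KServe v2 HTTP API. A minimal sketch of building such a request follows; the model name (`fastertransformer`), tensor names, and datatypes here are assumptions, since the real names depend on how the model repository is configured:

```python
import json

# Hypothetical Triton endpoint; host/port and model name are assumptions.
TRITON_URL = "http://localhost:8000/v2/models/fastertransformer/infer"

def build_infer_request(input_ids, request_output_len=32):
    """Construct a KServe v2 inference request body for a single sequence.

    input_ids: list of token ids for one prompt (batch size 1).
    request_output_len: number of tokens to generate (tensor name assumed).
    """
    return {
        "inputs": [
            {
                "name": "input_ids",
                "shape": [1, len(input_ids)],
                "datatype": "UINT32",
                "data": input_ids,
            },
            {
                "name": "request_output_len",
                "shape": [1, 1],
                "datatype": "UINT32",
                "data": [request_output_len],
            },
        ]
    }

payload = build_infer_request([101, 2023, 2003, 1037, 3231])
body = json.dumps(payload)
# The actual call would be an HTTP POST of `body` to TRITON_URL,
# e.g. requests.post(TRITON_URL, data=body).
```

This is only a sketch of the wire format, not a working integration; the blog post above covers the server-side setup (converting the model to FasterTransformer format and writing the Triton model config).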

ccmaymay commented 1 year ago

subsumed by #97