huggingface / optimum-nvidia

Apache License 2.0
867 stars, 86 forks

Let's make sure to use the repeated heads tensor when in a non-mha scenario #48

Closed by mfuntowicz 8 months ago