@the-crypt-keeper thanks for opening an issue! I'll take a look at the limitations.
Since aphrodite seems to work fine, could you test it within Kalavai? Here's a guide to do so (note that it does not have to be a GGUF model; you can plug in whatever).
Triton does not officially support SM60 or SM61 GPUs anymore. This includes the datacenter P40, P100 and P102 cards, the Quadro P5000, and the GTX 1080 family.
https://github.com/triton-lang/triton/issues/2780
Additionally, vLLM requires some patches to play nice with the P40's SM61 architecture, while aphrodite-engine seems to work OK out of the box.
Patches for Triton and vLLM are available as wheels here: https://github.com/sasha0552/pascal-pkgs-ci
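For anyone unsure whether their card falls into this bucket, here is a minimal sketch of the check. The helper names `needs_pascal_wheels` and `sm_name` are hypothetical and not part of Triton, vLLM or the wheels repo; the sketch only assumes you can get your GPU's compute capability as a `(major, minor)` tuple.

```python
# Hypothetical helpers for illustration; not part of Triton, vLLM or pascal-pkgs-ci.

# The two architectures named in this issue: SM60 and SM61.
PASCAL_CAPABILITIES = {(6, 0), (6, 1)}


def sm_name(capability):
    """Format a (major, minor) compute capability as an SM string, e.g. (6, 1) -> 'SM61'."""
    major, minor = capability
    return f"SM{major}{minor}"


def needs_pascal_wheels(capability):
    """Return True if the compute capability is one of the SM60/SM61 targets."""
    major, minor = capability
    return (major, minor) in PASCAL_CAPABILITIES


if __name__ == "__main__":
    # Sample capabilities; replace with the value reported for your own GPU.
    for cap in [(6, 0), (6, 1), (7, 0)]:
        print(sm_name(cap), needs_pascal_wheels(cap))
```

If PyTorch is installed, `torch.cuda.get_device_capability()` returns a tuple in this shape and can be passed straight in.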