turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

Fix HIP on recent PyTorch version #224

Closed · ardfork closed 11 months ago

ardfork commented 11 months ago

With recent versions of PyTorch, hipify now correctly converts cuBLAS calls to hipBLAS instead of rocBLAS.

However, it does not convert `half` to `hipblasHalf` where needed; this patch fixes that.
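For context, the kind of change involved looks roughly like the sketch below: a thin compatibility wrapper that reinterprets CUDA-style `half` pointers as `hipblasHalf` at the hipBLAS call boundary. The wrapper name `__compat_hipblasHgemm` and the header paths are illustrative assumptions, not necessarily the exact patch.

```cpp
// Illustrative hip_compat-style shim: lets call sites keep passing
// CUDA-style `half*` pointers while hipBLAS expects `hipblasHalf*`.
#include <hip/hip_runtime.h>
#include <hip/hip_fp16.h>
#include <hipblas/hipblas.h>   // header path varies across ROCm releases

__host__ __forceinline__ hipblasStatus_t __compat_hipblasHgemm(
    hipblasHandle_t handle,
    hipblasOperation_t transA, hipblasOperation_t transB,
    int m, int n, int k,
    const half* alpha,
    const half* A, int lda,
    const half* B, int ldb,
    const half* beta,
    half* C, int ldc)
{
    // half and hipblasHalf are both 16-bit half-precision types with the
    // same layout, so a reinterpret_cast satisfies the hipBLAS signature.
    return hipblasHgemm(handle, transA, transB, m, n, k,
                        reinterpret_cast<const hipblasHalf*>(alpha),
                        reinterpret_cast<const hipblasHalf*>(A), lda,
                        reinterpret_cast<const hipblasHalf*>(B), ldb,
                        reinterpret_cast<const hipblasHalf*>(beta),
                        reinterpret_cast<hipblasHalf*>(C), ldc);
}

// Redirect existing call sites to the shim without touching them.
#define hipblasHgemm __compat_hipblasHgemm
```

The macro-redirect approach keeps the hipified kernel sources unchanged: only the compatibility header needs to know about the `hipblasHalf` type mismatch.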