hahnyuan / PB-LLM

PB-LLM: Partially Binarized Large Language Models
MIT License

What is the practical speedup? #5

Open XA23i opened 8 months ago

XA23i commented 8 months ago

Interesting work. Since the salient parameters are not binarized, I am curious about the practical speedup compared to floating-point models. Do you use a custom GPU kernel to accelerate inference?
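
For context on why the question matters: in partial binarization, most weights collapse to a scaled sign while a small salient fraction stays full precision, so any fast kernel has to combine a dense 1-bit matrix with a sparse higher-precision remainder. A minimal NumPy sketch of this split (the `salient_frac` threshold and per-tensor scaling here are illustrative assumptions, not the repository's exact scheme):

```python
import numpy as np

def partially_binarize(w, salient_frac=0.05):
    """Illustrative partial binarization: keep the largest-magnitude
    (salient) weights in full precision, binarize the rest to
    {-alpha, +alpha}, with alpha set to the mean |w| of the binarized
    group. This is a sketch of the idea, not PB-LLM's exact method."""
    w = np.asarray(w, dtype=np.float32)
    k = max(1, int(round(salient_frac * w.size)))
    # Mark the k largest-magnitude entries as salient (kept full precision).
    salient_idx = np.argsort(np.abs(w.ravel()))[-k:]
    mask = np.zeros(w.size, dtype=bool)
    mask[salient_idx] = True
    mask = mask.reshape(w.shape)
    # Per-tensor scale for the binarized group.
    alpha = np.abs(w[~mask]).mean()
    w_pb = np.where(mask, w, alpha * np.sign(w))
    return w_pb, mask

w = np.array([[0.5, -0.1], [0.02, -2.0]], dtype=np.float32)
w_pb, mask = partially_binarize(w, salient_frac=0.25)
```

Because the result mixes two storage formats (packed 1-bit signs plus a scale, and scattered full-precision outliers), a plain dense GEMM cannot exploit it; realized speedup depends on a kernel that fuses the binary and sparse parts, which is presumably what the question is probing.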