pytorch-labs / gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
BSD 3-Clause "New" or "Revised" License
5.58k stars, 508 forks

Add mixtral support #57

Open Chillee opened 9 months ago

Chillee commented 9 months ago

Some preliminary perf numbers:

TP=8, fp16, 163.69 tok/s

siriusctrl commented 9 months ago

@Chillee Hi there, did you get the fp8 version from somewhere, or are you currently working on fp8 quantization in this PR?