@krzysiekpodk Evaluating a 160B model at FP16 is beyond my available resources as a hobbyist 😢, but I can blow some of this month's cloud GPU budget on trying the 4-bit quants. Any preferences?
@krzysiekpodk Results are somewhat mixed:
- The original Mixtral-Instruct-8x22B performs quite well no matter the quant: AWQ, GPTQ, and EXL2 are all within spitting distance of each other.
- WizardLM-2 8x22B AWQ appears to be broken and scores poorly. I could not find a GPTQ. EXL2 performs well.
Thank you!! This is interesting, as it looks like good scores are still not consistent across different benchmarks for OSS models. Maybe it's time for a leaderboard of leaderboards? :D
Hey,
Not sure if you have seen this: https://prollm.toqan.ai/leaderboard
In my opinion, the most interesting takeaway is that if you filter by "advanced" in the code-recent category (unseen in training), the only model that rivals proprietary models with that selection is WizardLM-2 8x22B.
It would be really interesting to see whether your benchmark also scores it that high.