open-compass / MixtralKit

A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI
Apache License 2.0

MMLU Performance? #13

Closed bdytx5 closed 9 months ago

bdytx5 commented 9 months ago

I was able to use this implementation to benchmark on MMLU.

The scores were significantly lower than the Mixtral score of 70.9%. My evaluation script (see eval_mixtral) scored 63.9%, which is higher than Mistral 7B but still well short of that.

Do you have any thoughts or suggestions about the evaluation script I used?

https://github.com/bdytx5/mixtral8x7b_MMLU/blob/main/test/MixtralKit/tools/eval_mmlu.py

Thanks, Brett

bdytx5 commented 9 months ago

Ah, looks like my temperature may have been at 1 instead of 0. So far it's looking much better.
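
For anyone hitting the same gap, here is a minimal sketch of why temperature can matter for a multiple-choice benchmark. It assumes the answer is picked from next-token logits over the four choices A/B/C/D. The logits and the `pick_answer` helper are hypothetical and only for illustration; they are not taken from eval_mmlu.py or from MixtralKit.

```python
import math
import random


def pick_answer(logits, temperature, rng=random):
    """Pick a choice index from logits (hypothetical helper).

    temperature == 0 is treated as greedy decoding (argmax).
    Otherwise we sample from softmax(logits / temperature).
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [value / temperature for value in logits]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(value - top) for value in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Made-up logits for choices A, B, C, D; A (index 0) is the correct answer
# and also the one the model prefers.
logits = [2.0, 1.0, 0.5, 0.0]
rng = random.Random(0)
trials = 1000

greedy = [pick_answer(logits, 0.0, rng) for _ in range(trials)]
sampled = [pick_answer(logits, 1.0, rng) for _ in range(trials)]

greedy_acc = greedy.count(0) / trials
sampled_acc = sampled.count(0) / trials
print(f"temperature 0: {greedy_acc:.3f}")
print(f"temperature 1: {sampled_acc:.3f}")
```

With greedy decoding the preferred choice is returned every time, while sampling at temperature 1 sometimes returns a lower-probability choice even when the model "knows" the answer, which drags accuracy down.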