It took me a while, but this fixes an old issue with the MMLU dataset. It also replaces vLLM with Accelerate to be more consistent with the Open LLM Leaderboard's results. For convenience, it doesn't use the same version of lm-evaluation-harness, but the results look very close. For instance:
Compared with https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
Known issue: the tables summarizing the results are poorly formatted. I don't think it's too important, so I'll hopefully fix it later; I made several inconclusive attempts.
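For reference, a minimal sketch of the kind of invocation this change implies, using lm-evaluation-harness's `hf` backend launched through Accelerate instead of the `vllm` backend (the model name and batch size here are placeholders, not values from this PR):

```shell
# Evaluate MMLU with the Hugging Face backend via Accelerate
# (multi-GPU data parallelism), matching the Open LLM Leaderboard setup
# more closely than the vLLM backend.
accelerate launch -m lm_eval \
    --model hf \
    --model_args pretrained=mistralai/Mistral-7B-v0.1 \
    --tasks mmlu \
    --batch_size auto
```

The `--model hf` flag selects the transformers/Accelerate code path; swapping it for `--model vllm` is what this PR moves away from.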