mlabonne / llm-autoeval

Automatically evaluate your LLMs in Google Colab
MIT License

Why is `mmlu` benchmark commented out? #4

Closed sugatoray closed 3 months ago

sugatoray commented 5 months ago

I was going over the runpod.sh script and couldn't help but notice that the `mmlu` benchmark is commented out. I'm curious: why is that so?

Thank you for putting this repository together @mlabonne. Learned about this from your X-post 🚀!!

https://github.com/mlabonne/llm-autoeval/blob/75d952ed8062123285f2733644f110e5b5743eda/runpod.sh#L86-L94

mlabonne commented 5 months ago

Thanks @sugatoray!

It's a little embarrassing, but I realized the MMLU benchmark doesn't work reliably after publishing this repo. Unfortunately, it looks like it's an lm-evaluation-harness issue, so I can't do much about it.

I think the best workaround is simply not using vllm. I'll rename this benchmark suite to "openllm-vllm" and create another one that doesn't rely on it.
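For readers wondering what "not using vllm" would look like in practice, here is a minimal sketch of the workaround, assuming the lm-evaluation-harness CLI (`lm_eval`) and its standard task names; the model id is a placeholder, and the loop only echoes the commands (a dry run) rather than executing them:

```shell
# Hypothetical sketch: run the Open LLM suite through the plain
# Hugging Face backend (--model hf) instead of the vllm backend.
MODEL="your-org/your-model"   # placeholder model id
BENCHMARKS="arc_challenge hellaswag mmlu truthfulqa winogrande gsm8k"

for task in $BENCHMARKS; do
  # Echo instead of executing, since real runs need a GPU and model weights.
  echo lm_eval --model hf --model_args "pretrained=${MODEL}" \
       --tasks "$task" --batch_size auto
done
```

The point is simply that `mmlu` goes back into the task list once the vllm backend is out of the loop.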

sugatoray commented 5 months ago

@mlabonne Thank you for explaining.

On a separate note, do you have any repositories for LLM training/finetuning with Apple Silicon? I recently purchased a MacBook Pro Max with an M3 -- 128GB RAM and maxed-out other specs. Inference is going pretty fast with GGML formats, but I am more interested in using it for more hardcore training/experimentation purposes. Thanks anyway. :)

Some references I have:

mlabonne commented 5 months ago

Not really. There are a few of them, but I don't have Apple Silicon, so I couldn't try them myself.
