huggingface / lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
MIT License

[FT] Support llama.cpp inference #402

Open JoelNiklaus opened 3 days ago

JoelNiklaus commented 3 days ago

Issue encountered

Currently, inference with open models on my Mac is quite slow because vLLM does not support the MPS backend.
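
For context, PyTorch itself does expose an MPS backend on Apple Silicon; a minimal availability check, assuming a reasonably recent PyTorch build, looks like this:

```python
import torch

# Verify that PyTorch can see the Metal Performance Shaders (MPS) backend.
if torch.backends.mps.is_available():
    device = torch.device("mps")
    print("MPS available:", torch.ones(2, device=device))
else:
    print("MPS not available; falling back to CPU")
```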

Solution/Feature

llama.cpp does support MPS and would significantly speed up local evaluation of open models.
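
For illustration, a minimal sketch of Metal-accelerated generation through the llama-cpp-python bindings; the GGUF path below is hypothetical, and a lighteval backend would presumably wrap something like this:

```python
from llama_cpp import Llama

# Load a local GGUF model; n_gpu_layers=-1 offloads all layers
# to the GPU (Metal on macOS).
llm = Llama(
    model_path="models/example.Q4_K_M.gguf",  # hypothetical local file
    n_gpu_layers=-1,
    n_ctx=4096,
)

# Simple completion call; the result dict mirrors the OpenAI-style schema.
out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```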

Possible alternatives

Allowing the use of the MPS device in the other model-loading backends would also work.
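
As a sketch of that alternative, the plain transformers loading path can already target MPS directly (the checkpoint ID below is just an example):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"  # example checkpoint, not prescriptive

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load in half precision and move the weights onto the MPS device.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("mps")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("mps")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```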

clefourrier commented 3 days ago

Hi! Feel free to open a PR for this if you need it quickly, as our roadmap for the end of the year is full :)

JoelNiklaus commented 3 days ago

Sounds good. I might do it at some point, but for now it is not a priority for me.

julien-c commented 3 days ago

would be an awesome feature IMO! cc @gary149