Open JoelNiklaus opened 3 days ago
Issue encountered
Currently, inference of open models on my Mac is quite slow, since vllm does not support mps.
Solution/Feature
Llama.cpp does support mps and would significantly speed up local evaluation of open models.
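To illustrate what I have in mind, here is a rough sketch using the llama-cpp-python bindings with Metal offload. This is not an existing integration, just an assumption of how a llama.cpp-backed model could be run locally; the model path is a placeholder.

```python
# Sketch only: run a local GGUF model through llama-cpp-python, which uses
# Metal (the Apple GPU) when the library is built with Metal support.
from llama_cpp import Llama

llm = Llama(
    model_path="models/model.Q4_K_M.gguf",  # placeholder path to a local GGUF file
    n_gpu_layers=-1,                         # offload all layers to the Apple GPU
    n_ctx=4096,
)

out = llm("Question: What is the capital of France?\nAnswer:", max_tokens=16)
print(out["choices"][0]["text"])
```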
Possible alternatives
Allowing the mps device to be used with the other model-loading backends would also work.
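For the alternative, a minimal sketch of what I mean by running a model directly on the mps device with plain transformers (the model name and prompt are only placeholders):

```python
# Sketch only: load a causal LM with transformers and place it on the
# PyTorch "mps" device when it is available, falling back to CPU otherwise.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "mps" if torch.backends.mps.is_available() else "cpu"

name = "gpt2"  # placeholder; any causal LM from the Hub works the same way
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).to(device)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(device)
output = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```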
Hi! Feel free to open a PR for this if you need it fast, as our roadmap for EOY is full :)
Sounds good. I might do it at some point; for now it is not a priority for me.
would be an awesome feature IMO! cc @gary149