JINO-ROHIT closed this 2 days ago
Thanks for offering to create a notebook to show LM evaluation harness applied to a PEFT model. Examples like this are always welcome.
Just for my understanding, as I don't have experience with this package: What would be involved in this? Skimming their README, I could see that they already provide a script to evaluate HF models:
Probably that should also work with local models, not just models hosted on the Hub. Would this already be sufficient to evaluate PEFT models or are more steps needed, which would be useful to have in a notebook?
Yep, so it's more or less the same.
Evaluating with PEFT is covered in the advanced usage section: https://github.com/EleutherAI/lm-evaluation-harness?tab=readme-ov-file#advanced-usage-tips
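For anyone skimming the thread, here is a minimal sketch of what that section describes: the harness CLI takes the base model as `pretrained=` and the adapter as `peft=` inside `--model_args`. The helper name `build_lm_eval_command`, the base model, the adapter path `./my-lora-adapter`, and the task list are placeholders chosen for illustration, not something from the README, so swap in your own.

```python
import shlex


def build_lm_eval_command(base_model, adapter, tasks, device="cuda:0", batch_size=8):
    """Build the lm_eval argument list for a base model plus a PEFT adapter.

    Hypothetical helper for illustration; it only assembles the command
    and does not run the evaluation itself.
    """
    # `pretrained` is the base model, `peft` is the adapter (Hub ID or local path).
    model_args = f"pretrained={base_model},peft={adapter}"
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", ",".join(tasks),
        "--device", device,
        "--batch_size", str(batch_size),
    ]


if __name__ == "__main__":
    # Placeholder values: replace with your own base model, adapter, and tasks.
    cmd = build_lm_eval_command(
        base_model="facebook/opt-125m",
        adapter="./my-lora-adapter",
        tasks=["hellaswag", "arc_easy"],
    )
    # Print a shell-safe command line that can be pasted into a terminal.
    print(shlex.join(cmd))
```

The printed command can be run in a shell where `lm_eval` is installed. Keeping the command in a small helper like this makes it easy to log the exact arguments next to the results in a notebook.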
I thought it'd be nice to have an example notebook with the workflow.
Evaluating with lm-harness is quite controlled and stable, and gives consistent results. I wasn't able to find this type of workflow anywhere online for PEFT models. If this workflow falls within the purview of the repo, we could add it. WDYT?
I see, thanks for providing more details. I agree that this would be a great addition to have in PEFT.
cool, will start working on this notebook
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Resolved via #2190.
Feature request
I would like to create an example notebook for evaluating PEFT models on reproducible tasks and metrics using the lm-eval harness, if possible.
Library here - https://github.com/EleutherAI/lm-evaluation-harness
Motivation
Evaluating LLMs often involves benchmark datasets, but minor implementation details can significantly affect results, making it difficult to compare outcomes across different codebases. This repo provides a standardized method of evaluating models, but I found very limited resources and documentation on how to apply it to PEFT models.
Your contribution
I'm happy to raise a PR for an example notebook.