
tianyi-lab / Superfiltering

[ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning


Is there an evaluation of non API models? Such as LLama 7B, GPT xl, etc. #2

Closed sev777 closed 5 months ago

sev777 commented 5 months ago

Is there an evaluation of non-API models, such as LLaMA 7B, GPT xl, etc.?

MingLiiii commented 5 months ago

Thanks for your interest in our work, but I am not quite sure what you mean.

If you are asking for an evaluation that does not use an LLM as the judge: the Open LLM Leaderboard is the one that does not use an LLM as the judge.
If you are asking about using other non-API models as the judge: I think you can directly replace the API call with normal inference on the non-API model. However, I don't think using non-API models as the judge is widely accepted, because of their relatively weak capability.
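
For the second case, here is a minimal sketch of what swapping the API call for local inference might look like. It is not this repository's evaluation code: the prompt template and the names `judge_pair`, `parse_verdict`, and `make_local_generate` are all hypothetical. The judge takes any `generate(prompt) -> str` callable, so the backend can be an API client or a local model. The local backend assumes the Hugging Face `transformers` text-generation pipeline is installed.

```python
import re
from typing import Callable, Optional

# Hypothetical pairwise-judge prompt; not the template used by any real benchmark.
JUDGE_PROMPT = (
    "You are comparing two responses to the same instruction.\n"
    "Instruction: {instruction}\n"
    "Response A: {a}\n"
    "Response B: {b}\n"
    "Answer with exactly one letter, A or B, for the better response.\n"
    "Answer:"
)


def parse_verdict(text: str) -> Optional[str]:
    """Return the first standalone 'A' or 'B' in the judge output, or None."""
    match = re.search(r"\b([AB])\b", text)
    return match.group(1) if match else None


def judge_pair(
    generate: Callable[[str], str], instruction: str, a: str, b: str
) -> Optional[str]:
    """Ask the judge (any text-in, text-out callable) which response is better."""
    prompt = JUDGE_PROMPT.format(instruction=instruction, a=a, b=b)
    return parse_verdict(generate(prompt))


def make_local_generate(model_name: str, max_new_tokens: int = 8) -> Callable[[str], str]:
    """Build a generate() backed by a local model instead of an API call.

    Assumes `transformers` is installed and `model_name` is a causal LM that
    fits on the local machine. The import is lazy so the rest of the sketch
    runs without it.
    """
    from transformers import pipeline

    pipe = pipeline("text-generation", model=model_name)

    def generate(prompt: str) -> str:
        out = pipe(
            prompt,
            max_new_tokens=max_new_tokens,
            do_sample=False,         # greedy decoding for a repeatable verdict
            return_full_text=False,  # return only the continuation, not the prompt
        )
        return out[0]["generated_text"]

    return generate


if __name__ == "__main__":
    # Stub backend standing in for a real model, so the sketch runs offline.
    print(judge_pair(lambda prompt: " B", "Say hello.", "hi", "Hello there!"))
```

To try a real local judge, pass `make_local_generate("<model name>")` as the `generate` argument. A weaker model may ignore the requested output format, in which case `parse_verdict` returns `None`, which is one practical face of the capability concern above.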

Let me know if you have any further questions!