Hi!
Thanks for releasing the code. I have one question about the evaluation: it seems the current version of the code only evaluates perplexity? For example, the metric in Table 1 of the paper should be accuracy for most of the QA tasks, but the current eval_harness.py only seems to compute perplexity.
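To make my question concrete, this is roughly what I mean by the difference between the two metrics, as a minimal sketch (not the repo's code): for multiple-choice QA, accuracy picks the answer choice with the highest log-likelihood, whereas perplexity just exponentiates the average negative log-likelihood per token. The `loglikelihood(context, continuation)` callable here is a hypothetical scoring function, not something from eval_harness.py.

```python
import math

def qa_accuracy(examples, loglikelihood):
    """Accuracy: choose the answer option with the highest log-likelihood."""
    correct = 0
    for ex in examples:
        # Score each candidate answer conditioned on the question.
        scores = [loglikelihood(ex["question"], choice) for choice in ex["choices"]]
        if scores.index(max(scores)) == ex["label"]:
            correct += 1
    return correct / len(examples)

def perplexity(token_logprobs):
    """Perplexity: exp of the average negative log-likelihood per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))
```

Is there a flag or a separate script for reproducing the accuracy numbers in Table 1, or should I extend eval_harness.py along these lines myself?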