Closed tbozhong closed 3 months ago
Unfortunately, we are unable to release the scripts for evaluating specific downstream tasks. For your evaluation needs, please consider using open-source LLM evaluation frameworks or refer to scripts provided by the developers of the respective downstream datasets.
Thanks for your response!
Could you please provide more detailed information regarding the composition of the general dataset? For instance, I would like to know which data ranges, such as indices 1 to 1000, correspond to specific subsets like MMLU.
Apologies for any confusion. The general set is sampled from the retain set of the LLM's pre-training dataset. It is used to evaluate the unlearned model's perplexity on the retain set. It is not associated with any specific downstream task, such as MMLU.
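For reference, perplexity on a held-out set is usually computed as the exponential of the average negative log-likelihood per token. Here is a minimal sketch of that formula, assuming the per-token log-probabilities (natural log) have already been obtained from the unlearned model. The function name `corpus_perplexity` and the input format are hypothetical and are not taken from this repository's code.

```python
import math
from typing import Iterable, Sequence


def corpus_perplexity(token_logprobs: Iterable[Sequence[float]]) -> float:
    """Token-level perplexity: exp(-(sum of log-probs) / (number of tokens)).

    `token_logprobs` holds one sequence of natural-log token probabilities
    per document (a hypothetical input format, chosen for illustration).
    """
    total_logprob = 0.0
    total_tokens = 0
    for seq in token_logprobs:
        total_logprob += sum(seq)
        total_tokens += len(seq)
    if total_tokens == 0:
        raise ValueError("need at least one token to compute perplexity")
    return math.exp(-total_logprob / total_tokens)
```

Averaging over all tokens in the set, rather than averaging per-document perplexities, keeps long and short documents weighted by their token counts. Which of the two the authors use is not stated in this thread.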
Thanks for your timely response. I have no more questions.
Hi there 👋
I've noticed that currently there is only a general dataset available for evaluating overall performance on downstream tasks. Are there any scripts that evaluate the specific performance of individual downstream tasks? I appreciate your assistance and look forward to your response!