JefferyChen453 opened this issue 2 months ago
I've tried adding the param `add_special_tokens=True` in the config file, but the last token is still missing.
Using the latest repo (commit_id = 7261d80d5679cd91c5c20cf2a7823f092ff66251), I evaluated the same 5 checkpoints again (red line in the figure). The results are still below the official ones, and when examining the prepared_batch, the last token still appears to be missing.
My command:
```bash
accelerate launch --num_processes=1 -m \
    lighteval accelerate \
    --model_args="pretrained=/mnt/data/user/tc_agi/caijie/fineweb_models/ablation-model-fineweb-v1_5000,trust_remote_code=True" \
    --override_batch_size 128 \
    --custom_tasks "/data/fineweb-pipeline/lighteval-main/lighteval_tasks.py" \
    --output_dir "/data/fineweb-pipeline/lighteval-main/evals/" \
    --tasks "custom|mmlu:abstract_algebra|0|1"
```
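For context on what I was checking: below is a minimal sketch (not lighteval's code; "gpt2" and the prompt strings are placeholders) of how one might verify whether the last continuation token survives the encoding step before the model input is built.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder; the ablation checkpoint would be used instead

context = "Question: ...\nAnswer:"   # placeholder MMLU-style prompt
continuation = " B"                  # placeholder gold choice

ctx_ids = tokenizer(context, add_special_tokens=True)["input_ids"]
full_ids = tokenizer(context + continuation, add_special_tokens=True)["input_ids"]
cont_ids = full_ids[len(ctx_ids):]   # continuation tokens as the model would see them

# For loglikelihood scoring a causal LM is typically fed full_ids[:-1]: the logit at
# the second-to-last position already predicts the last continuation token, so the
# "missing" last token in prepared_batch does not by itself mean it goes unscored.
model_input = full_ids[:-1]
print("continuation ids:", cont_ids)
print("last token fed to the model:", model_input[-1])
```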
Thanks for the report, we'll investigate! cc @hynky1999 and @guipenedo for the fineweb aspect
The bug I ran into is similar to #203. I'm trying to reproduce the evaluation results of the ablation model trained on FineWeb, using LightEval at commit_id=a98210fd3a2d1e8bface1c32b72ebd5017173a4c.
The MMLU results for step-5000/10000/15000/19000/24000 (i.e., 5 checkpoints from the first 50B consumed tokens) are as follows:
I don't know what causes this gap. While debugging, I discovered that the last token of the prepared_batch is missing. Does this mean the evaluation results in the FineWeb blogpost are inaccurate?
But when I delete the `[:-1]` in https://github.com/huggingface/lighteval/blob/aaa8bbf705b6f090fb07ad36503f39b5e922a6df/src/lighteval/models/base_model.py#L851, the evaluation results become totally random guesses for all checkpoints. I suppose there are more lines to modify, or something else is causing the gap in my reproduction results.
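For what it's worth, here is a hedged sketch of standard causal-LM loglikelihood scoring (not lighteval's exact code): the input is cut with `[:-1]` because the logit produced at position i scores the token at position i + 1, so removing the slice without also changing how the target positions are gathered would shift logits and targets out of alignment.

```python
import torch

def continuation_logprob(model, full_ids, cont_len):
    """full_ids: context + continuation token ids; cont_len: number of continuation tokens."""
    input_ids = torch.tensor([full_ids[:-1]])   # the last token is never fed to the model
    targets = torch.tensor([full_ids[1:]])      # targets shifted by one position
    with torch.no_grad():
        logits = model(input_ids).logits        # shape [1, len(full_ids) - 1, vocab]
    logprobs = torch.log_softmax(logits, dim=-1)
    # The last cont_len target positions are exactly the continuation tokens.
    cont_logprobs = logprobs[0, -cont_len:, :].gather(
        -1, targets[0, -cont_len:].unsqueeze(-1)
    )
    return cont_logprobs.sum().item()
```

If that alignment is what lighteval relies on, dropping only the `[:-1]` would make every choice's score come from the wrong positions, which would be consistent with the near-random accuracy I observed after the change.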