ch-shin closed this issue 2 months ago.
I think heavy.yml is missing HumanEval, which is part of the extended suite, no? cc @Vaishaal
@ch-shin heavy.yaml is the correct one to run. But isn't it missing HumanEval, @afang-story?
@Muennighoff HumanEval can be found in heavy_code.yaml, which includes heavy.yaml as well as additional code evaluations.
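For context, a HumanEval entry in heavy_code.yaml would look roughly like the following. This is a sketch assuming llm-foundry's `code_evaluation` icl_task_type; the `label`, `dataset_uri` path, and `pass_at_k` value shown here are illustrative, not copied from the actual config:

```yaml
icl_tasks:
- label: human_eval
  # Path is illustrative; point this at the HumanEval JSONL used by your setup.
  dataset_uri: eval/local_data/programming/human_eval.jsonl
  num_fewshot: [0]
  icl_task_type: code_evaluation
  pass_at_k: 1
```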
The error was resolved by updating llm-foundry (0.2.0 --> 0.7.0), but I then hit another error:
```
Map: 100%|██████████| 373/373 [00:00<00:00, 608.93 examples/s]
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/ubuntu/research_nfs/dclm/eval/eval_openlm_ckpt.py", line 551, in <module>
[rank0]:     main()
[rank0]:   File "/home/ubuntu/research_nfs/dclm/eval/eval_openlm_ckpt.py", line 514, in main
[rank0]:     icl_results = evaluate(eval_model, tokenizer, eval_cfg)
[rank0]:   File "/home/ubuntu/miniconda3/envs/dclm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/ubuntu/research_nfs/dclm/eval/eval_openlm_ckpt.py", line 148, in evaluate
[rank0]:     evaluators, logger_keys = build_icl_evaluators(
[rank0]:   File "/home/ubuntu/miniconda3/envs/dclm/lib/python3.10/site-packages/llmfoundry/utils/builders.py", line 576, in build_icl_evaluators
[rank0]:     _validate_cfg(icl_cfg)
[rank0]:   File "/home/ubuntu/miniconda3/envs/dclm/lib/python3.10/site-packages/llmfoundry/utils/builders.py", line 544, in _validate_cfg
[rank0]:     raise ValueError(
[rank0]: ValueError: No metric_names defined, unable to build default metrics for icl_task_type=question_answering.
```
This was fixed by changing icl_task_type=question_answering to icl_task_type=generation_task_with_answers in heavy.yml (newer llm-foundry versions renamed this task type).
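Concretely, an affected entry changes roughly like this. This is a sketch assuming the standard llm-foundry icl_tasks schema; the `label` and `dataset_uri` are illustrative, and the same one-line change applies to every entry that used the old type:

```yaml
icl_tasks:
- label: triviaqa  # illustrative task; apply the fix to each entry using the old type
  dataset_uri: eval/local_data/world_knowledge/triviaqa.jsonl
  num_fewshot: [0]
  # icl_task_type: question_answering   # old value, rejected by llm-foundry 0.7.0
  icl_task_type: generation_task_with_answers
```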
Updating llm-foundry to 0.8.0 may also fix this. Anyway, glad things seem to work now. Marking as closed.