huggingface / lighteval

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.
MIT License

[BUG] community_tasks not working or example is broken #277

Open PhilipMay opened 3 weeks ago

PhilipMay commented 3 weeks ago

Describe the bug

The README contains an example for running community_tasks:

lighteval accelerate \
    --model_args "pretrained=HuggingFaceH4/zephyr-7b-beta" \
    --use_chat_template \ # optional, if you want to run the evaluation with the chat template
    --tasks "community|arabic_mmlu:abstract_algebra|5|1" \
    --custom_tasks "community_tasks/arabic_evals" \
    --output_dir "./evals"

When I execute this exact command (only modified to use a smaller LLM):

lighteval accelerate \
    --model_args "pretrained=TinyLlama/TinyLlama-1.1B-Chat-v1.0" \
    --use_chat_template \
    --tasks "community|arabic_mmlu:abstract_algebra|5|1" \
    --custom_tasks "community_tasks/arabic_evals" \
    --output_dir "./evals"

I get this error:

~/code/git/lighteval$ lighteval accelerate \
    --model_args "pretrained=TinyLlama/TinyLlama-1.1B-Chat-v1.0" \
    --use_chat_template \
    --tasks "community|arabic_mmlu:abstract_algebra|5|1" \
    --custom_tasks "community_tasks/arabic_evals" \
    --output_dir "./evals"
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
WARNING:lighteval.logging.hierarchical_logger:main: (0, Namespace(subcommand='accelerate', model_config_path=None, model_args='pretrained=TinyLlama/TinyLlama-1.1B-Chat-v1.0', max_samples=None, override_batch_size=-1, job_id='', output_dir='./evals', push_results_to_hub=False, save_details=False, push_details_to_hub=False, push_results_to_tensorboard=False, public_run=False, results_org=None, use_chat_template=True, system_prompt=None, dataset_loading_processes=1, custom_tasks='community_tasks/arabic_evals', tasks='community|arabic_mmlu:abstract_algebra|5|1', cache_dir=None, num_fewshot_seeds=1)),  {
WARNING:lighteval.logging.hierarchical_logger:  Test all gather {
WARNING:lighteval.logging.hierarchical_logger:    Test gather tensor
WARNING:lighteval.logging.hierarchical_logger:    gathered_tensor tensor([0], device='cuda:0'), should be [0]
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:00.228144]
WARNING:lighteval.logging.hierarchical_logger:  Model loading {
WARNING:lighteval.logging.hierarchical_logger:    Tokenizer truncation and padding size set to the left side.
WARNING:lighteval.logging.hierarchical_logger:    We are not in a distributed setting. Setting model_parallel to False.
WARNING:lighteval.logging.hierarchical_logger:    Model parallel was set to False, max memory set to None and device map to None
WARNING:lighteval.logging.hierarchical_logger:    Using Data Parallelism, putting model on device cuda
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:01.475935]
WARNING:lighteval.logging.hierarchical_logger:  Tasks loading {
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:00.000243]
WARNING:lighteval.logging.hierarchical_logger:} [0:00:01.706154]
Traceback (most recent call last):
  File "/users/philip/miniconda3/envs/lighteval-git/bin/lighteval", line 8, in <module>
    sys.exit(cli_evaluate())
  File "/users/philip/miniconda3/envs/lighteval-git/lib/python3.10/site-packages/lighteval/__main__.py", line 58, in cli_evaluate
    main_accelerate(args)
  File "/users/philip/miniconda3/envs/lighteval-git/lib/python3.10/site-packages/lighteval/logging/hierarchical_logger.py", line 175, in wrapper
    return fn(*args, **kwargs)
  File "/users/philip/miniconda3/envs/lighteval-git/lib/python3.10/site-packages/lighteval/main_accelerate.py", line 78, in main
    pipeline = Pipeline(
  File "/users/philip/miniconda3/envs/lighteval-git/lib/python3.10/site-packages/lighteval/pipeline.py", line 126, in __init__
    self._init_tasks_and_requests(tasks=tasks)
  File "/users/philip/miniconda3/envs/lighteval-git/lib/python3.10/site-packages/lighteval/pipeline.py", line 175, in _init_tasks_and_requests
    _, tasks_groups_dict = get_custom_tasks(custom_tasks)
  File "/users/philip/miniconda3/envs/lighteval-git/lib/python3.10/site-packages/lighteval/tasks/registry.py", line 195, in get_custom_tasks
    custom_tasks_module = create_custom_tasks_module(custom_tasks=custom_tasks)
  File "/users/philip/miniconda3/envs/lighteval-git/lib/python3.10/site-packages/lighteval/tasks/registry.py", line 185, in create_custom_tasks_module
    return importlib.import_module(custom_tasks)
  File "/users/philip/miniconda3/envs/lighteval-git/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'community_tasks/arabic_evals'
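For context, the failure matches Python's import semantics rather than anything lighteval-specific: importlib.import_module expects a dotted module name, so a slash-separated path like community_tasks/arabic_evals can never resolve, while a .py file path has to be loaded through importlib.util instead. A minimal sketch (load_custom_tasks is a hypothetical helper, not lighteval's actual code) of the distinction:

```python
import importlib
import importlib.util
from pathlib import Path


def load_custom_tasks(custom_tasks: str):
    """Load a custom-tasks module from a .py file path or a dotted module name."""
    if custom_tasks.endswith(".py"):
        # File path, e.g. "community_tasks/arabic_evals.py":
        # build a module spec from the file location and execute it.
        path = Path(custom_tasks)
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module
    # Dotted module name, e.g. "community_tasks.arabic_evals".
    # A slash-separated path passed here raises ModuleNotFoundError,
    # exactly as in the traceback above.
    return importlib.import_module(custom_tasks)
```

This is why appending .py to the --custom_tasks argument changes which code path handles it.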

To Reproduce

See above. I have the same issue with my own german_rag_evals.txt tests. Could you please check and provide a working example of how to use community tasks? Something seems to be broken at the moment.

Expected behavior

The evaluation should start.

Version info

Linux, latest lighteval commit e6b599a1448a8b06141cb4f678866ae15b0c5863 (Aug 21, 2024)

PhilipMay commented 3 weeks ago

This command seems to work:

lighteval accelerate \
    --model_args "pretrained=TinyLlama/TinyLlama-1.1B-Chat-v1.0,trust_remote_code=True" \
    --use_chat_template \
    --tasks "community|arabic_mmlu:abstract_algebra|5|1" \
    --custom_tasks "community_tasks/arabic_evals.py" \
    --output_dir "./evals"

I added the missing .py extension and had to add trust_remote_code=True to the model args. So it is just a small issue in the README example.

NathanHB commented 3 days ago

Thanks for the issue, should be fixed in #300 :)