huggingface / lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

[BUG] community_tasks not working or example is broken #277

Closed: PhilipMay closed this issue 2 months ago

PhilipMay commented 3 months ago

Describe the bug

The README includes the following example for running community tasks:

lighteval accelerate \
    --model_args "pretrained=HuggingFaceH4/zephyr-7b-beta" \
    --use_chat_template \ # optional, if you want to run the evaluation with the chat template
    --tasks "community|arabic_mmlu:abstract_algebra|5|1" \
    --custom_tasks "community_tasks/arabic_evals" \
    --output_dir "./evals"
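
For reference, the task string follows the suite|task|num_few_shot|truncate pattern described elsewhere in the README (my reading of the docs; the breakdown below is only illustrative):

# Illustrative split of the task specifier, assuming the README's
# {suite}|{task}|{num_few_shot}|{truncate_few_shots} convention:
suite, task, few_shot, truncate = "community|arabic_mmlu:abstract_algebra|5|1".split("|")
print(suite)     # community: the task suite
print(task)      # arabic_mmlu:abstract_algebra: task name and subset
print(few_shot)  # 5: number of few-shot examples
print(truncate)  # 1: allow reducing few-shots if the prompt is too long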

When I run this command, slightly modified to use a smaller LLM (and with the inline comment removed):

lighteval accelerate \
    --model_args "pretrained=TinyLlama/TinyLlama-1.1B-Chat-v1.0" \
    --use_chat_template \
    --tasks "community|arabic_mmlu:abstract_algebra|5|1" \
    --custom_tasks "community_tasks/arabic_evals" \
    --output_dir "./evals"

I get this error:

~/code/git/lighteval$ lighteval accelerate \
    --model_args "pretrained=TinyLlama/TinyLlama-1.1B-Chat-v1.0" \
    --use_chat_template \
    --tasks "community|arabic_mmlu:abstract_algebra|5|1" \
    --custom_tasks "community_tasks/arabic_evals" \
    --output_dir "./evals"
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
WARNING:lighteval.logging.hierarchical_logger:main: (0, Namespace(subcommand='accelerate', model_config_path=None, model_args='pretrained=TinyLlama/TinyLlama-1.1B-Chat-v1.0', max_samples=None, override_batch_size=-1, job_id='', output_dir='./evals', push_results_to_hub=False, save_details=False, push_details_to_hub=False, push_results_to_tensorboard=False, public_run=False, results_org=None, use_chat_template=True, system_prompt=None, dataset_loading_processes=1, custom_tasks='community_tasks/arabic_evals', tasks='community|arabic_mmlu:abstract_algebra|5|1', cache_dir=None, num_fewshot_seeds=1)),  {
WARNING:lighteval.logging.hierarchical_logger:  Test all gather {
WARNING:lighteval.logging.hierarchical_logger:    Test gather tensor
WARNING:lighteval.logging.hierarchical_logger:    gathered_tensor tensor([0], device='cuda:0'), should be [0]
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:00.228144]
WARNING:lighteval.logging.hierarchical_logger:  Model loading {
WARNING:lighteval.logging.hierarchical_logger:    Tokenizer truncation and padding size set to the left side.
WARNING:lighteval.logging.hierarchical_logger:    We are not in a distributed setting. Setting model_parallel to False.
WARNING:lighteval.logging.hierarchical_logger:    Model parallel was set to False, max memory set to None and device map to None
WARNING:lighteval.logging.hierarchical_logger:    Using Data Parallelism, putting model on device cuda
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:01.475935]
WARNING:lighteval.logging.hierarchical_logger:  Tasks loading {
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:00.000243]
WARNING:lighteval.logging.hierarchical_logger:} [0:00:01.706154]
Traceback (most recent call last):
  File "/users/philip/miniconda3/envs/lighteval-git/bin/lighteval", line 8, in <module>
    sys.exit(cli_evaluate())
  File "/users/philip/miniconda3/envs/lighteval-git/lib/python3.10/site-packages/lighteval/__main__.py", line 58, in cli_evaluate
    main_accelerate(args)
  File "/users/philip/miniconda3/envs/lighteval-git/lib/python3.10/site-packages/lighteval/logging/hierarchical_logger.py", line 175, in wrapper
    return fn(*args, **kwargs)
  File "/users/philip/miniconda3/envs/lighteval-git/lib/python3.10/site-packages/lighteval/main_accelerate.py", line 78, in main
    pipeline = Pipeline(
  File "/users/philip/miniconda3/envs/lighteval-git/lib/python3.10/site-packages/lighteval/pipeline.py", line 126, in __init__
    self._init_tasks_and_requests(tasks=tasks)
  File "/users/philip/miniconda3/envs/lighteval-git/lib/python3.10/site-packages/lighteval/pipeline.py", line 175, in _init_tasks_and_requests
    _, tasks_groups_dict = get_custom_tasks(custom_tasks)
  File "/users/philip/miniconda3/envs/lighteval-git/lib/python3.10/site-packages/lighteval/tasks/registry.py", line 195, in get_custom_tasks
    custom_tasks_module = create_custom_tasks_module(custom_tasks=custom_tasks)
  File "/users/philip/miniconda3/envs/lighteval-git/lib/python3.10/site-packages/lighteval/tasks/registry.py", line 185, in create_custom_tasks_module
    return importlib.import_module(custom_tasks)
  File "/users/philip/miniconda3/envs/lighteval-git/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'community_tasks/arabic_evals'
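
Looking at the traceback, registry.py passes the --custom_tasks string straight to importlib.import_module, which expects a dotted module name, so a path containing a slash can never resolve. A minimal sketch of the two loading modes the traceback implies (the helper name is mine, not lighteval's actual API):

import importlib
import importlib.util
from pathlib import Path

def load_custom_tasks(custom_tasks: str):
    # Hypothetical helper mirroring what the registry appears to do:
    # load from a file when given a .py path, otherwise treat the
    # string as an importable module name.
    if custom_tasks.endswith(".py"):
        path = Path(custom_tasks)
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module
    # "community_tasks/arabic_evals" takes this branch, and import_module
    # rejects it because slashes are not valid in module names, hence the
    # ModuleNotFoundError above.
    return importlib.import_module(custom_tasks)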

To Reproduce

See above. I have the same issue with my own german_rag_evals.txt tasks. Could you please check and provide a working example of how to use community tasks? Something seems to be broken at the moment.

Expected behavior

The evaluation should start.

Version info

Linux, last lighteval commit e6b599a1448a8b06141cb4f678866ae15b0c5863 from Aug 21, 2024

PhilipMay commented 3 months ago

This command seems to work:

lighteval accelerate \
    --model_args "pretrained=TinyLlama/TinyLlama-1.1B-Chat-v1.0,trust_remote_code=True" \
    --use_chat_template \
    --tasks "community|arabic_mmlu:abstract_algebra|5|1" \
    --custom_tasks "community_tasks/arabic_evals.py" \
    --output_dir "./evals"

I added the missing .py extension and had to add trust_remote_code=True to the model args. So it is just a small error in the README example.
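
For anyone hitting this before the README is fixed, a quick sanity check that the custom tasks file loads on its own (a sketch; TASKS_TABLE is the attribute community task modules define by convention, if I read the docs right):

import importlib.util

# Load the tasks file directly from its .py path.
spec = importlib.util.spec_from_file_location(
    "arabic_evals", "community_tasks/arabic_evals.py"
)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
# Community task modules expose their task configs via TASKS_TABLE.
print(len(module.TASKS_TABLE), "tasks found")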

NathanHB commented 2 months ago

Thanks for the issue, should be fixed in #300 :)