huggingface / lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

Dataset loading issue for german_rag_evals on Windows #211

Open Pommel4711 opened 3 months ago

Pommel4711 commented 3 months ago

Hello, I don't know what I'm doing wrong. I received the following error, as indicated in the title.

My input was as shown on this page: Hugging Face - Ger-RAG-eval.

python run_evals_accelerate.py ^
  --model_args "pretrained=DiscoResearch/DiscoLM_German_7b_v1" ^
  --tasks "./examples/tasks/all_german_rag_evals.txt" ^
  --override_batch_size 1 ^
  --use_chat_template ^
  --custom_tasks "community_tasks/german_rag_evals.py" ^
  --output_dir "./evals/"

The output was as follows:

INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
WARNING:bitsandbytes.cextension:The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
WARNING:lighteval.logging.hierarchical_logger:main: (0, Namespace(model_config_path=None, model_args='pretrained=DiscoResearch/DiscoLM_German_7b_v1', max_samples=None, override_batch_size=1, job_id='', output_dir='./evals/', push_results_to_hub=False, save_details=False, push_details_to_hub=False, public_run=False, cache_dir=None, results_org=None, use_chat_template=True, system_prompt=None, dataset_loading_processes=1, custom_tasks='community_tasks/german_rag_evals.py', tasks='./examples/tasks/all_german_rag_evals.txt', num_fewshot_seeds=1)),  {
WARNING:lighteval.logging.hierarchical_logger:  Test all gather {
WARNING:lighteval.logging.hierarchical_logger:    Test gather tensor
WARNING:lighteval.logging.hierarchical_logger:    gathered_tensor tensor([0]), should be [0]
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:00.010932]
WARNING:lighteval.logging.hierarchical_logger:  Creating model configuration {
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:00]
WARNING:lighteval.logging.hierarchical_logger:  Model loading {
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING:lighteval.logging.hierarchical_logger:    Tokenizer truncation and padding size set to the left side.
WARNING:lighteval.logging.hierarchical_logger:    We are not in a distributed setting. Setting model_parallel to False.
WARNING:lighteval.logging.hierarchical_logger:    Model parallel was set to False, max memory set to None and device map to None
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:31<00:00, 10.60s/it]
WARNING:lighteval.logging.hierarchical_logger:    Using Data Parallelism, putting model on device cpu
WARNING:lighteval.logging.hierarchical_logger:    Model info: ModelInfo(model_name='DiscoResearch/DiscoLM_German_7b_v1', model_sha='560f972f9f735fc9289584b3aa8d75d0e539c44e', model_dtype='torch.bfloat16', model_size='13.49 GB')
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:33.371562]
WARNING:lighteval.logging.hierarchical_logger:  Tasks loading {
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:01.405496]
WARNING:lighteval.logging.hierarchical_logger:} [0:00:34.806011]
Traceback (most recent call last):
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 117, in resolve_trust_remote_code
    signal.signal(signal.SIGALRM, _raise_timeout_error)
AttributeError: module 'signal' has no attribute 'SIGALRM'. Did you mean: 'SIGABRT'?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\Arbeit\AIUI\RAG Telecom Dataset\lighteval\run_evals_accelerate.py", line 82, in <module>
    main(args)
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\logging\hierarchical_logger.py", line 166, in wrapper
    return fn(*args, **kwargs)
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\main_accelerate.py", line 83, in main
    task_dict = Registry(cache_dir=env_config.cache_dir).get_task_dict(
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\tasks\registry.py", line 135, in get_task_dict
    custom_tasks_module.append(create_custom_tasks_module(custom_tasks=custom_tasks))
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\tasks\registry.py", line 170, in create_custom_tasks_module
    dataset_module = dataset_module_factory(str(custom_tasks))
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 1814, in dataset_module_factory
    ).get_module()
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 962, in get_module
    trust_remote_code = resolve_trust_remote_code(self.trust_remote_code, self.name)
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 133, in resolve_trust_remote_code
    raise ValueError(
ValueError: The repository for german_rag_evals contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/german_rag_evals.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.

I discovered that the argument trust_remote_code=True must be passed as part of the model_args parameter. To fix the issue, I tried the following command, but unfortunately the error persisted.

python run_evals_accelerate.py ^
  --model_args "pretrained=DiscoResearch/DiscoLM_German_7b_v1,trust_remote_code=True" ^
  --tasks "./examples/tasks/all_german_rag_evals.txt" ^
  --override_batch_size 1 ^
  --use_chat_template ^
  --custom_tasks "community_tasks/german_rag_evals.py" ^
  --output_dir "./evals/"

Maybe this can help.

When I entered the command accelerate env, I received the following output:

Copy-and-paste the text below in your GitHub issue

clefourrier commented 3 months ago

Hi! The trust_remote_code=True message that you get is about the dataset loading, not the model. @PhilipMay, iirc you were the one who added this dataset, can you change it so it does not require trust_remote_code=True?
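
For clarity, a minimal sketch of where each flag actually lands (this is not how lighteval wires it internally; the dataset id is taken from the usage link below, and a config name may also be required):

    from datasets import load_dataset
    from transformers import AutoModelForCausalLM

    # The flag inside --model_args is forwarded to the *model* loader:
    model = AutoModelForCausalLM.from_pretrained(
        "DiscoResearch/DiscoLM_German_7b_v1", trust_remote_code=True
    )

    # The ValueError in the traceback comes from the *dataset* side, which
    # has its own, separate flag:
    ds = load_dataset("deutsche-telekom/Ger-RAG-eval", trust_remote_code=True)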

PhilipMay commented 3 months ago

Yes, I can do that @clefourrier. The problem is that I see no reason why the code thinks it needs to execute custom code to load the dataset. Everything is "just parquet"...

@Pommel4711 here is the command I use for the evaluation: https://huggingface.co/datasets/deutsche-telekom/Ger-RAG-eval#usage

It works (worked) without trust_remote_code for me.

PhilipMay commented 3 months ago

Here is a Colab with code that shows that the dataset can be loaded without setting trust_remote_code: https://colab.research.google.com/drive/1BUORL2_VxORGdIko6SMPqJqZIMUmtR-3?usp=sharing
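
The check in the Colab boils down to something like this ("task1" is a placeholder config name; the dataset card lists the actual ones):

    from datasets import load_dataset

    # A plain-Parquet repository involves no custom code, so this should
    # load without passing trust_remote_code at all.
    ds = load_dataset("deutsche-telekom/Ger-RAG-eval", "task1")
    print(ds)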

clefourrier commented 3 months ago

Interesting, thanks a lot!

PhilipMay commented 3 months ago

@clefourrier and @Pommel4711 I think the root issue is this and not the dataset itself:

File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 117, in resolve_trust_remote_code
    signal.signal(signal.SIGALRM, _raise_timeout_error)
AttributeError: module 'signal' has no attribute 'SIGALRM'. Did you mean: 'SIGABRT'?

During handling of the above exception, another exception occurred:

Can you please check that?
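
For anyone who wants to verify: SIGALRM is a POSIX-only signal, so the attribute simply does not exist on Windows, which is exactly what the AttributeError reports.

    import signal
    import sys

    print(sys.platform)                # e.g. 'win32' on Windows
    print(hasattr(signal, "SIGALRM"))  # False on Windows, True on Linux/macOS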

Pommel4711 commented 3 months ago

@PhilipMay Hey, I'm running this on Windows. Do you use Linux, or do you know how I can fix this problem? I came across this Stack Overflow post that might be related: Python Standard Lib Signal AttributeError: module 'signal' has no attribute 'SIGALRM'.

For reference, I'm running on this commit: a98210fd3a2d1e8bface1c32b.

Thanks for your help!

clefourrier commented 3 months ago

Hm, I'm going to ping @lhoestq on this then because it seems like a datasets issue. Good job spotting this, @PhilipMay!

PhilipMay commented 3 months ago

Hm, I'm going to ping @lhoestq on this then because it seems like a datasets issue.

Good idea. Thanks.

lhoestq commented 2 months ago

OSes that don't support SIGALRM are supported thanks to a try/except; not sure how you managed to get the error related to SIGALRM? (see https://github.com/huggingface/datasets/blob/689447f8c86f777829a4db9ccc5d8133c12ec84c/src/datasets/load.py#L113-L134)
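
For reference, the guard in question follows roughly this pattern (a simplified sketch, not the verbatim datasets source; see the link above for the real code):

    import signal

    def _raise_timeout_error(signum, frame):
        raise ValueError("Timed out waiting for an answer.")

    try:
        # POSIX only: give the user a limited time to answer the prompt.
        signal.signal(signal.SIGALRM, _raise_timeout_error)
        signal.alarm(15)
        answer = input("Do you trust the remote code in this repo? [y/N] ")
        signal.alarm(0)
    except AttributeError:
        # Windows has no SIGALRM, so fall back to a prompt with no timeout.
        answer = input("Do you trust the remote code in this repo? [y/N] ")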

Anyway, feel free to update datasets and try again, just in case.

clefourrier commented 2 months ago

No problem for the transfer if needed

Pommel4711 commented 2 months ago

I copied the dataset code from this URL and now I get this error:

  (lighteval) D:\Arbeit\AIUI\RAG Telecom Dataset\lighteval>python run_evals_accelerate.py ^
    --model_args "pretrained=DiscoResearch/DiscoLM_German_7b_v1" ^
    --tasks "./examples/tasks/all_german_rag_evals.txt" ^
    --override_batch_size 1 ^
    --use_chat_template ^
    --custom_tasks "community_tasks/german_rag_evals.py" ^
    --output_dir "./evals/"
  Traceback (most recent call last):
    File "D:\Arbeit\AIUI\RAG Telecom Dataset\lighteval\run_evals_accelerate.py", line 30, in <module>
      from lighteval.main_accelerate import CACHE_DIR, main
    File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\main_accelerate.py", line 31, in <module>
      from lighteval.evaluator import evaluate, make_results_table
    File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\evaluator.py", line 32, in <module>
      from lighteval.logging.evaluation_tracker import EvaluationTracker
    File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\logging\evaluation_tracker.py", line 32, in <module>
      from datasets import Dataset, load_dataset
    File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\__init__.py", line 26, in <module>
      from .inspect import (
    File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\inspect.py", line 32, in <module>
      from .load import (
  ImportError: cannot import name 'metric_module_factory' from 'datasets.load' (C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py)

clefourrier commented 2 months ago

Hi @Pommel4711, did you try to update datasets first as @lhoestq suggested?

Pommel4711 commented 2 months ago

Hi @Pommel4711, did you try to update datasets first as @lhoestq suggested?

Yes, I did update datasets as @lhoestq suggested.

OSes that don't support SIGALRM are supported thanks to a try/except; not sure how you managed to get the error related to SIGALRM? (see https://github.com/huggingface/datasets/blob/689447f8c86f777829a4db9ccc5d8133c12ec84c/src/datasets/load.py#L113-L134)

Anyway, feel free to update datasets and try again, just in case.

Despite updating datasets, I get a new error. Any further suggestions would be greatly appreciated.

Thank you!

clefourrier commented 2 months ago

Just to be sure, how did you update the package, and what is the current version you are running?
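
For reference, the installed version can be printed from the same environment like this:

    import datasets

    # Compare against the latest release on PyPI to rule out a stale install.
    print(datasets.__version__)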

Pommel4711 commented 2 months ago

Issue with lighteval Evaluation Script

Description

I completely removed the Conda environment lighteval and updated the repository using the following commands:

git pull
git checkout main

I checked out the main branch (commit ID 4651531e4716911f99).

Then, I reinstalled the environment as follows:

conda create -n lighteval python=3.10 && conda activate lighteval
pip install .
pip install '.[accelerate,quantization,adapters]'

After that, I ran the evaluation script:

python run_evals_accelerate.py ^
  --model_args "pretrained=DiscoResearch/DiscoLM_German_7b_v1" ^
  --tasks "./examples/tasks/all_german_rag_evals.txt" ^
  --override_batch_size 1 ^
  --use_chat_template ^
  --custom_tasks "community_tasks/german_rag_evals.py" ^
  --output_dir "./evals/"

I encountered the following error:

File "D:\Arbeit\AIUI\RAG Telecom Dataset\lighteval\run_evals_accelerate.py", line 30, in <module>
    from lighteval.main_accelerate import CACHE_DIR, main
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\main_accelerate.py", line 31, in <module>
    from lighteval.evaluator import evaluate, make_results_table
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\evaluator.py", line 32, in <module>
    from lighteval.logging.evaluation_tracker import EvaluationTracker
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\logging\evaluation_tracker.py", line 37, in <module>
    from lighteval.logging.info_loggers import (
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\logging\info_loggers.py", line 34, in <module>
    from lighteval.metrics import MetricCategory
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\metrics\__init__.py", line 25, in <module>
    from lighteval.metrics.metrics import MetricCategory, Metrics
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\metrics\metrics.py", line 75, in <module>
    class Metrics(Enum):
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\metrics\metrics.py", line 235, in Metrics
    sample_level_fn=JudgeLLM(
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\metrics\metrics_sample.py", line 634, in __init__
    self.judge = JudgeOpenAI(
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\metrics\llm_as_judge.py", line 80, in __init__
    with open(templates_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\apps\\entwicklungsumgebung\\anaconda3\\envs\\lighteval\\lib\\site-packages\\lighteval\\metrics\\judge_prompts.jsonl'

To resolve this, I downloaded judge_prompts.jsonl from this link and placed it in the directory where the error occurred.
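
The expected location can be derived from the traceback; a small check, assuming the path next to the installed metrics package is what lighteval opens:

    import os
    import lighteval.metrics

    # The traceback shows lighteval opening judge_prompts.jsonl next to the
    # metrics package, so the expected path can be reconstructed like this:
    expected = os.path.join(
        os.path.dirname(lighteval.metrics.__file__), "judge_prompts.jsonl"
    )
    print(expected, "exists:", os.path.exists(expected))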

I ran the script again, which resulted in the following output:

INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
WARNING:bitsandbytes.cextension:The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
WARNING:lighteval.logging.hierarchical_logger:main: (0, Namespace(model_config_path=None, model_args='pretrained=DiscoResearch/DiscoLM_German_7b_v1', max_samples=None, override_batch_size=1, job_id='', output_dir='./evals/', push_results_to_hub=False, save_details=False, push_details_to_hub=False, push_results_to_tensorboard=False, public_run=False, cache_dir=None, results_org=None, use_chat_template=True, system_prompt=None, dataset_loading_processes=1, custom_tasks='community_tasks/german_rag_evals.py', tasks='./examples/tasks/all_german_rag_evals.txt', num_fewshot_seeds=1)),  {
WARNING:lighteval.logging.hierarchical_logger:  Test all gather {
WARNING:lighteval.logging.hierarchical_logger:    Test gather tensor
WARNING:lighteval.logging.hierarchical_logger:    gathered_tensor tensor([0]), should be [0]
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:00.006101]
WARNING:lighteval.logging.hierarchical_logger:  Creating model configuration {
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:00]
WARNING:lighteval.logging.hierarchical_logger:  Model loading {
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING:lighteval.logging.hierarchical_logger:    Tokenizer truncation and padding size set to the left side.
WARNING:lighteval.logging.hierarchical_logger:    We are not in a distributed setting. Setting model_parallel to False.
WARNING:lighteval.logging.hierarchical_logger:    Model parallel was set to False, max memory set to None and device map to None
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [01:01<00:00, 20.60s/it]
WARNING:lighteval.logging.hierarchical_logger:    Using Data Parallelism, putting model on device cpu
WARNING:lighteval.logging.hierarchical_logger:    Model info: ModelInfo(model_name='DiscoResearch/DiscoLM_German_7b_v1', model_sha='560f972f9f735fc9289584b3aa8d75d0e539c44e', model_dtype='torch.bfloat16', model_size='13.49 GB')
WARNING:lighteval.logging.hierarchical_logger:  } [0:01:04.212504]
WARNING:lighteval.logging.hierarchical_logger:  Tasks loading {
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:00.061989]
WARNING:lighteval.logging.hierarchical_logger:} [0:01:04.289455]
Traceback (most recent call last):
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 117, in resolve_trust_remote_code
    signal.signal(signal.SIGALRM, _raise_timeout_error)
AttributeError: module 'signal' has no attribute 'SIGALRM'. Did you mean: 'SIGABRT'?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Arbeit\AIUI\RAG Telecom Dataset\lighteval\run_evals_accelerate.py", line 89, in <module>
    main(args)
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\logging\hierarchical_logger.py", line 166, in wrapper
    return fn(*args, **kwargs)
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\main_accelerate.py", line 91, in main
    task_dict = Registry(cache_dir=env_config.cache_dir).get_task_dict(
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\tasks\registry.py", line 133, in get_task_dict
    custom_tasks_module.append(create_custom_tasks_module(custom_tasks=custom_tasks))
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\tasks\registry.py", line 168, in create_custom_tasks_module
    dataset_module = dataset_module_factory(str(custom_tasks))
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 1814, in dataset_module_factory
    ).get_module()
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 962, in get_module
    trust_remote_code = resolve_trust_remote_code(self.trust_remote_code, self.name)
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 133, in resolve_trust_remote_code
    raise ValueError(
ValueError: The repository for german_rag_evals contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/german_rag_evals.
Please pass the argument trust_remote_code=True to allow custom code to be run.

I deleted the dataset and replaced it with this version.

Upon running the script again, I encountered this error:

File "D:\Arbeit\AIUI\RAG Telecom Dataset\lighteval\run_evals_accelerate.py", line 30, in <module>
    from lighteval.main_accelerate import CACHE_DIR, main
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\main_accelerate.py", line 31, in <module>
    from lighteval.evaluator import evaluate, make_results_table
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\evaluator.py", line 32, in <module>
    from lighteval.logging.evaluation_tracker import EvaluationTracker
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\logging\evaluation_tracker.py", line 32, in <module>
    from datasets import Dataset, load_dataset
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\__init__.py", line 26, in <module>
    from .inspect import (
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\inspect.py", line 32, in <module>
    from .load import (
ImportError: cannot import name 'metric_module_factory' from 'datasets.load' (C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py)

This outlines the steps I took, the errors I encountered, and the troubleshooting steps I followed.

clefourrier commented 2 months ago

Thanks a lot for the detailed steps! I think you should just run pip install -U datasets to upgrade datasets instead of manually editing files.

Pommel4711 commented 2 months ago

I tried running pip install -U datasets to upgrade datasets as you suggested, instead of manually editing the files. Unfortunately, this error still persists.

ImportError: cannot import name 'metric_module_factory' from 'datasets.load' (C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py)

Do you have any other suggestions on how to resolve this issue?

Thank you!

clefourrier commented 2 months ago

cc @lhoestq this sounds like a datasets issue, you can transfer the issue to your lib if needed :)

NathanHB commented 2 months ago

I was unable to reproduce the issue, even when following the steps. I think it is indeed a datasets issue. I am however going to fix the missing file issue :)

Pommel4711 commented 2 months ago

Maybe I found the problem with the dataset.

I followed the steps mentioned in this comment to resolve the issue, but without deleting the file C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py and replacing it with the version from this link.

Instead, I tried upgrading the datasets library using the following command:

pip install -U datasets

However, after the upgrade, I noticed that the load.py file remains unchanged and is not the same as the one from this link.


But then I am still left with this error:

(lighteval) D:\Arbeit\AIUI\RAG Telecom Dataset\lighteval>python run_evals_accelerate.py ^
  --model_args "pretrained=DiscoResearch/DiscoLM_German_7b_v1" ^
  --tasks "./examples/tasks/all_german_rag_evals.txt" ^
  --override_batch_size 1 ^
  --use_chat_template ^
  --custom_tasks "community_tasks/german_rag_evals.py" ^
  --output_dir "./evals/"
Using either accelerate or text-generation to run this script is advised.
main: (0, Namespace(model_config_path=None, model_args='pretrained=DiscoResearch/DiscoLM_German_7b_v1', max_samples=None, override_batch_size=1, job_id='', output_dir='./evals/', push_results_to_hub=False, save_details=False, push_details_to_hub=False, push_results_to_tensorboard=False, public_run=False, cache_dir=None, results_org=None, use_chat_template=True, system_prompt=None, dataset_loading_processes=1, custom_tasks='community_tasks/german_rag_evals.py', tasks='./examples/tasks/all_german_rag_evals.txt', num_fewshot_seeds=1)),  {
  Test all gather {
    Not running in a parallel setup, nothing to test
  } [0:00:00.001000]
  Creating model configuration {
  } [0:00:00]
  Model loading {
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
    Tokenizer truncation and padding size set to the left side.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:21<00:00,  7.09s/it]
    Using Data Parallelism, putting model on device cpu
    Model info: ModelInfo(model_name='DiscoResearch/DiscoLM_German_7b_v1', model_sha='560f972f9f735fc9289584b3aa8d75d0e539c44e', model_dtype='torch.bfloat16', model_size=-1)
  } [0:00:23.565683]
  Tasks loading {
  } [0:00:00.061002]
} [0:00:23.641685]
Traceback (most recent call last):
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 117, in resolve_trust_remote_code
    signal.signal(signal.SIGALRM, _raise_timeout_error)
AttributeError: module 'signal' has no attribute 'SIGALRM'. Did you mean: 'SIGABRT'?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Arbeit\AIUI\RAG Telecom Dataset\lighteval\run_evals_accelerate.py", line 89, in <module>
    main(args)
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\logging\hierarchical_logger.py", line 166, in wrapper
    return fn(*args, **kwargs)
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\main_accelerate.py", line 91, in main
    task_dict = Registry(cache_dir=env_config.cache_dir).get_task_dict(
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\tasks\registry.py", line 133, in get_task_dict
    custom_tasks_module.append(create_custom_tasks_module(custom_tasks=custom_tasks))
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\tasks\registry.py", line 168, in create_custom_tasks_module
    dataset_module = dataset_module_factory(str(custom_tasks))
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 1814, in dataset_module_factory
    ).get_module()
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 962, in get_module
    trust_remote_code = resolve_trust_remote_code(self.trust_remote_code, self.name)
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 133, in resolve_trust_remote_code
    raise ValueError(
ValueError: The repository for german_rag_evals contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/german_rag_evals.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.

nouf01 commented 1 month ago

Are you trying to run the evaluation in offline mode? I got the same error, but I am running offline and have replaced the HF links with local paths, yet the same trust_remote_code error keeps arising.

Pommel4711 commented 1 month ago

Are you trying to run the evaluation in offline mode? I got the same error, but I am running offline and have replaced the HF links with local paths, yet the same trust_remote_code error keeps arising.

I always run this with an internet connection, but I don't know what the problem is. I switched to Linux and it worked.

PhilipMay commented 1 month ago

@Pommel4711 now I also have the same issue. I am on Linux, so this should not be the root cause of the problem.

PhilipMay commented 1 month ago

@Pommel4711 I found a solution that works for me. See here: #278

The fix is adding export HF_DATASETS_TRUST_REMOTE_CODE=TRUE.

But this should not be required; IMO this should be considered a bug in lighteval.
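
On Windows cmd the equivalent is set HF_DATASETS_TRUST_REMOTE_CODE=TRUE. From Python, setting the variable before datasets is imported should have the same effect; a sketch:

    import os

    # Must be set before datasets reads its config, i.e. before the first import.
    os.environ["HF_DATASETS_TRUST_REMOTE_CODE"] = "TRUE"

    import datasets  # imported only after the variable is in place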

lhoestq commented 1 month ago

can you try uninstalling and reinstalling datasets?

PhilipMay commented 1 month ago

can you try uninstalling and reinstalling datasets?

You mean a pip install -U datasets might not be enough? @lhoestq

lhoestq commented 1 month ago

I double-checked, and actually the 'SIGALRM' error is not important (it just shows for Windows users in addition to the trust_remote_code error, which is the actual error).

Anyway, there seems to be a dataset called german_rag_evals that is based on a Python script requiring remote code to be executed. Passing trust_remote_code=True (or setting it via the environment variable) is required to access it.

I couldn't find this dataset on HF though; is it a local dataset of yours?

lhoestq commented 1 month ago

Ah, it's community_tasks/german_rag_evals.py apparently? Well, maybe you should point to a dataset on HF with data e.g. in Parquet files instead (and remove this script from lighteval?).

PhilipMay commented 1 month ago

Ah, it's community_tasks/german_rag_evals.py apparently? Well, maybe you should point to a dataset on HF with data e.g. in Parquet files instead (and remove this script from lighteval?).

I think this is not how lighteval is supposed to work. What do you think @clefourrier ? What I did is written here: #278

lhoestq commented 1 month ago

german_rag_evals.py is not actually a dataset script; datasets can't read it.

So it looks like lighteval uses datasets' dataset_module_factory() function to open this file; maybe lighteval should have its own function to do that.
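
Something like a plain importlib load would sidestep the trust_remote_code machinery entirely; a hypothetical sketch (load_custom_tasks is not an existing lighteval function):

    import importlib.util

    def load_custom_tasks(path: str):
        """Hypothetical replacement for routing a local tasks file through
        datasets' dataset_module_factory(): import the file directly."""
        spec = importlib.util.spec_from_file_location("custom_tasks", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module

    # e.g. tasks = load_custom_tasks("community_tasks/german_rag_evals.py")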

PhilipMay commented 1 month ago

german_rag_evals.py is not actually a dataset script; datasets can't read it.

So it looks like lighteval uses datasets' dataset_module_factory() function to open this file; maybe lighteval should have its own function to do that.

This may be the case and may be the cause of this issue. @clefourrier

PhilipMay commented 3 weeks ago

@NathanHB we have new insights into this issue; see my comments above. Can you please have a look?

clefourrier commented 2 weeks ago

So it looks like lighteval uses datasets' dataset_module_factory() function to open this file; maybe lighteval should have its own function to do that.

Interesting, I'll take a look this week