mlabonne / llm-autoeval

Automatically evaluate your LLMs in Google Colab

Novel errors when running notebook #29

Open ann-brown opened 2 months ago

ann-brown commented 2 months ago

(Edit: Somehow submitted this before it was ready; give me a moment to pin the bug down.)

I'm getting some unfamiliar errors when running the notebook. I tested lighteval with a single task and with eq-bench, tried switching the image to the one suggested in the previous image-switching issue, and re-ran a model that worked yesterday.

Container error logs from debug, lighteval:

2024-05-03T22:44:09.217084005Z Traceback (most recent call last):
2024-05-03T22:44:09.217097561Z   File "/lighteval/run_evals_accelerate.py", line 29, in <module>
2024-05-03T22:44:09.217099188Z     from lighteval.main_accelerate import CACHE_DIR, main
2024-05-03T22:44:09.217100293Z   File "/usr/local/lib/python3.10/dist-packages/lighteval/main_accelerate.py", line 31, in <module>
2024-05-03T22:44:09.217112611Z     from lighteval.evaluator import evaluate, make_results_table
2024-05-03T22:44:09.217113371Z   File "/usr/local/lib/python3.10/dist-packages/lighteval/evaluator.py", line 32, in <module>
2024-05-03T22:44:09.217117633Z     from lighteval.logging.evaluation_tracker import EvaluationTracker
2024-05-03T22:44:09.217118437Z   File "/usr/local/lib/python3.10/dist-packages/lighteval/logging/evaluation_tracker.py", line 37, in <module>
2024-05-03T22:44:09.217145402Z     from lighteval.logging.info_loggers import (
2024-05-03T22:44:09.217146831Z   File "/usr/local/lib/python3.10/dist-packages/lighteval/logging/info_loggers.py", line 34, in <module>
2024-05-03T22:44:09.217290883Z     from lighteval.metrics import MetricCategory
2024-05-03T22:44:09.217297198Z   File "/usr/local/lib/python3.10/dist-packages/lighteval/metrics/__init__.py", line 25, in <module>
2024-05-03T22:44:09.217298154Z     from lighteval.metrics.metrics import MetricCategory, Metrics
2024-05-03T22:44:09.217299180Z   File "/usr/local/lib/python3.10/dist-packages/lighteval/metrics/metrics.py", line 34, in <module>
2024-05-03T22:44:09.217299880Z     from lighteval.metrics.metrics_sample import (
2024-05-03T22:44:09.217301038Z   File "/usr/local/lib/python3.10/dist-packages/lighteval/metrics/metrics_sample.py", line 42, in <module>
2024-05-03T22:44:09.217370955Z     from lighteval.metrics.llm_as_judge import JudgeOpenAI
2024-05-03T22:44:09.217374502Z   File "/usr/local/lib/python3.10/dist-packages/lighteval/metrics/llm_as_judge.py", line 30, in <module>
2024-05-03T22:44:09.217375365Z     from openai import OpenAI
2024-05-03T22:44:09.217376370Z ModuleNotFoundError: No module named 'openai'
2024-05-03T22:44:09.724293220Z Traceback (most recent call last):
2024-05-03T22:44:09.724322586Z   File "/usr/local/bin/accelerate", line 8, in <module>
2024-05-03T22:44:09.724335712Z     sys.exit(main())
2024-05-03T22:44:09.724337033Z   File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 46, in main
2024-05-03T22:44:09.724338223Z     args.func(args)
2024-05-03T22:44:09.724339038Z   File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1082, in launch_command
2024-05-03T22:44:09.724457425Z     simple_launcher(args)
2024-05-03T22:44:09.724458705Z   File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 688, in simple_launcher
2024-05-03T22:44:09.724545203Z     raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
2024-05-03T22:44:09.724558533Z subprocess.CalledProcessError: Command '['/usr/bin/python', 'run_evals_accelerate.py', '--model_args', 'pretrained=<model name>, '--use_chat_template', '--tasks', 'helm|commonsenseqa|0|0', '--output_dir=./evals/']' returned non-zero exit status 1.
2024-05-03T22:44:10.119077403Z Traceback (most recent call last):
2024-05-03T22:44:10.119092033Z   File "/lighteval/../llm-autoeval/main.py", line 129, in <module>
2024-05-03T22:44:10.119093509Z     raise ValueError(f"The directory {args.directory} does not exist.")
2024-05-03T22:44:10.119094286Z ValueError: The directory ./evals/results does not exist.
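For this run, the root cause looks like lighteval's import chain reaching `from openai import OpenAI` (in lighteval/metrics/llm_as_judge.py) on a container image that doesn't ship the `openai` package; the later `ValueError: The directory ./evals/results does not exist.` is just downstream fallout, since the crashed eval never wrote any results. A minimal workaround sketch, assuming it runs as a setup step before the eval command (the placement is my assumption, not something the repo defines):

```python
# Hedged workaround sketch: ensure the `openai` package exists in the container
# before lighteval is imported. `openai` is the real PyPI package name; running
# this as a pre-eval setup step is an assumption.
import importlib.util
import subprocess
import sys

if importlib.util.find_spec("openai") is None:
    # lighteval/metrics/llm_as_judge.py imports `from openai import OpenAI`
    # unconditionally, so the package must be installed even if no
    # OpenAI-judge metric is actually used.
    subprocess.check_call([sys.executable, "-m", "pip", "install", "openai"])
```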

Container error logs from debug, eq-bench:

2024-05-04T00:49:11.418739844+02:00 Traceback (most recent call last):
2024-05-04T00:49:11.418762136+02:00   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
2024-05-04T00:49:11.418772005+02:00     return _run_code(code, main_globals, None,
2024-05-04T00:49:11.418774500+02:00   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
2024-05-04T00:49:11.418784579+02:00     exec(code, run_globals)
2024-05-04T00:49:11.418787154+02:00   File "/lm-evaluation-harness/lm_eval/__main__.py", line 401, in <module>
2024-05-04T00:49:11.418867446+02:00     cli_evaluate()
2024-05-04T00:49:11.418877675+02:00   File "/lm-evaluation-harness/lm_eval/__main__.py", line 333, in cli_evaluate
2024-05-04T00:49:11.418913252+02:00     results = evaluator.simple_evaluate(
2024-05-04T00:49:11.418924233+02:00   File "/lm-evaluation-harness/lm_eval/utils.py", line 316, in _wrapper
2024-05-04T00:49:11.418957566+02:00     return fn(*args, **kwargs)
2024-05-04T00:49:11.418964680+02:00   File "/lm-evaluation-harness/lm_eval/evaluator.py", line 258, in simple_evaluate
2024-05-04T00:49:11.419002892+02:00     results = evaluate(
2024-05-04T00:49:11.419008102+02:00   File "/lm-evaluation-harness/lm_eval/utils.py", line 316, in _wrapper
2024-05-04T00:49:11.419049490+02:00     return fn(*args, **kwargs)
2024-05-04T00:49:11.419054069+02:00   File "/lm-evaluation-harness/lm_eval/evaluator.py", line 592, in evaluate
2024-05-04T00:49:11.419127468+02:00     "n-samples": {
2024-05-04T00:49:11.419131345+02:00   File "/lm-evaluation-harness/lm_eval/evaluator.py", line 595, in <dictcomp>
2024-05-04T00:49:11.419181300+02:00     "effective": min(limit, len(task_output.task.eval_docs)),
2024-05-04T00:49:11.419183995+02:00 TypeError: '<' not supported between instances of 'int' and 'NoneType'
2024-05-04T00:49:11.423606509+02:00 Passed argument batch_size = auto. Detecting largest batch size
2024-05-04T00:49:11.423615215+02:00 Determined Largest batch size: 4
2024-05-04T00:49:12.049730263+02:00 Traceback (most recent call last):
2024-05-04T00:49:12.049745021+02:00   File "/usr/local/bin/accelerate", line 8, in <module>
2024-05-04T00:49:12.049811316+02:00     sys.exit(main())
2024-05-04T00:49:12.049814512+02:00   File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 46, in main
2024-05-04T00:49:12.049956972+02:00     args.func(args)
2024-05-04T00:49:12.049966009+02:00   File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1082, in launch_command
2024-05-04T00:49:12.050293249+02:00     simple_launcher(args)
2024-05-04T00:49:12.050303087+02:00   File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 688, in simple_launcher
2024-05-04T00:49:12.050515349+02:00     raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
2024-05-04T00:49:12.050519176+02:00 subprocess.CalledProcessError: Command '['/usr/bin/python', '-m', 'lm_eval', '--model', 'hf', '--model_args', 'pretrained=<model name>,dtype=auto,trust_remote_code=False', '--tasks', 'eq_bench', '--num_fewshot', '0', '--batch_size', 'auto', '--output_path', './evals/eq-bench.json']' returned non-zero exit status 1.
2024-05-04T00:49:12.484988425+02:00 Traceback (most recent call last):
2024-05-04T00:49:12.485006620+02:00   File "/lm-evaluation-harness/../llm-autoeval/main.py", line 129, in <module>
2024-05-04T00:49:12.485009545+02:00     raise ValueError(f"The directory {args.directory} does not exist.")
2024-05-04T00:49:12.485011008+02:00 ValueError: The directory ./evals does not exist.
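The eq-bench failure is different: lm-evaluation-harness evaluates `min(limit, len(task_output.task.eval_docs))` while `limit` is still `None` (no `--limit` flag is passed), and comparing an int with `None` raises the `TypeError` shown above. As with the lighteval run, the final `ValueError: The directory ./evals does not exist.` only appears because the harness crashed before writing any output. A tiny sketch of the failing comparison and the kind of guard that avoids it (the document count is illustrative, not taken from the log):

```python
# Reproduce the TypeError from the eq-bench log and show a defensive guard.
limit = None   # no --limit flag was passed, so the harness leaves this as None
n_docs = 171   # illustrative number of eval docs (assumption, not from the log)

try:
    effective = min(limit, n_docs)   # roughly what evaluator.py does here
except TypeError as exc:
    print(exc)  # '<' not supported between instances of 'int' and 'NoneType'

# One way to guard it: only apply the limit when it is actually set.
effective = n_docs if limit is None else min(limit, n_docs)
print(effective)  # 171
```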