Describe the bug
Trying to run the evaluation using a quantized model / an inference endpoint, because running the full model on the GPU causes a CUDA out-of-memory error. However, I am unable to use the model_config_path argument: a ValueError is returned.
To Reproduce
Git clone the project
pip install . in a virtual environment
Modified YAML files:
endpoint_model.yaml (this file is stored in examples/model_configs)
model:
  type: "endpoint" # can be base, tgi, or endpoint
  base_params:
    endpoint_name: "llama-3-storm" # needs to be lower case without special characters
    model: "akjindal53244/Llama-3.1-Storm-8B"
    revision: "main"
    dtype: "bfloat16" # can be any of "awq", "eetq", "gptq", "4bit" or "8bit" (will use bitsandbytes), "bfloat16" or "float16"
    reuse_existing: false # if true, ignore all params in instance, and don't delete the endpoint after evaluation
  instance:
    accelerator: "gpu"
    region: "eu-west-1"
    vendor: "aws"
    instance_size: "medium"
    instance_type: "g5.2xlarge"
    framework: "pytorch"
    endpoint_type: "protected"
    namespace: null # The namespace under which to launch the endpoint. Defaults to the current user's namespace
    image_url: null # Optionally specify the docker image to use when launching the endpoint model, e.g. a later release of the TGI container with support for newer models.
    env_vars: null # Optional environment variables to include when launching the endpoint, e.g. `MAX_INPUT_LENGTH: 2048`
  generation:
    add_special_tokens: true
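As a quick way to rule out a malformed config file, the YAML above can be loaded directly (a minimal sketch using PyYAML, run from the repository root; the key names come from the file itself):

import yaml

# Sanity check: the endpoint config should parse and expose the
# top-level "model" mapping that lighteval reads.
with open("examples/model_configs/endpoint_model.yaml") as f:
    config = yaml.safe_load(f)

print(config["model"]["type"])         # expected: "endpoint"
print(config["model"]["base_params"])  # endpoint name, model id, dtype, ...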
quantized_model.yaml (this file is stored in the root folder of lighteval)
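The contents of this file were not captured in this report. For illustration only, a config along the lines of the quantized example shipped in examples/model_configs (keys assumed from that example; the model id is a placeholder) might look like:

model:
  type: "base"
  base_params:
    model_args: "pretrained=akjindal53244/Llama-3.1-Storm-8B,revision=main" # placeholder model id
    dtype: "4bit" # "4bit" or "8bit" loads the model via bitsandbytes
  generation:
    multichoice_continuations_start_space: null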
Full log:
2024-09-10 17:14:50.679383: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-09-10 17:14:50.853234: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-10 17:14:50.918535: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-10 17:14:50.937413: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-10 17:14:51.061512: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-10 17:14:51.917501: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
WARNING:lighteval.logging.hierarchical_logger:main: (0, Namespace(subcommand='accelerate', model_config_path='C:\\Users\\CSOC\\Documents\\lighteval\\lighteval\\examples\\model_configs\\endpoint_model.yaml', model_args=None, max_samples=None, override_batch_size=-1, job_id='', output_dir='output_dir', save_details=False, push_to_hub=False, push_to_tensorboard=False, public_run=False, results_org=None, use_chat_template=False, system_prompt=None, dataset_loading_processes=1, custom_tasks=None, tasks='bigbench|fact_checker|0|0', cache_dir=None, num_fewshot_seeds=1)), {
WARNING:lighteval.logging.hierarchical_logger:} [0:00:00.035451]
Traceback (most recent call last):
File "/home/project/anaconda3/envs/lighteval/bin/lighteval", line 8, in <module>
sys.exit(cli_evaluate())
File "/home/project/anaconda3/envs/lighteval/lib/python3.10/site-packages/lighteval/__main__.py", line 58, in cli_evaluate
main_accelerate(args)
File "/home/project/anaconda3/envs/lighteval/lib/python3.10/site-packages/lighteval/logging/hierarchical_logger.py", line 175, in wrapper
return fn(*args, **kwargs)
File "/home/project/anaconda3/envs/lighteval/lib/python3.10/site-packages/lighteval/main_accelerate.py", line 70, in main
model_config = create_model_config(
File "/home/project/anaconda3/envs/lighteval/lib/python3.10/site-packages/lighteval/models/model_config.py", line 327, in create_model_config
raise ValueError("You can't create a model without either a list of model_args or a model_config_path.")
ValueError: You can't create a model without either a list of model_args or a model_config_path.
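The exact commands for the inference endpoint and quantization methods were not captured, but the endpoint invocation can be reconstructed from the parsed Namespace in the log above (the flags below mirror the Namespace fields and are assumed to map one-to-one):

lighteval accelerate \
    --model_config_path "C:\Users\CSOC\Documents\lighteval\lighteval\examples\model_configs\endpoint_model.yaml" \
    --tasks "bigbench|fact_checker|0|0" \
    --output_dir output_dir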
Expected behavior
The evaluation should run using the inference endpoint / quantization method.
Version info
OS: Ubuntu (WSL)
Python: 3.10
CUDA: 12.2
Commit: 7261d80