huggingface / lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
MIT License

[BUG] Error when using TGI endpoint. #299

Closed Vanessa-Taing closed 1 month ago

Vanessa-Taing commented 1 month ago

Describe the bug

A rate-limit error occurs when running lighteval against a TGI endpoint. I am wondering whether this is an API problem (see the error log from my self-debugging attempt below). Also, can the inference endpoint be set to a Space?

To Reproduce

  1. Modified tgi_model.yaml (see the note on inference_server_auth after this list):

     model:
       type: "tgi" # can be base, tgi, or endpoint
       instance:
         inference_server_address: "https://api-inference.huggingface.co/models/akjindal53244/Llama-3.1-Storm-8B"
         inference_server_auth: null
         model_id: null # Optional, only required if the TGI container was launched with model_id pointing to a local directory

  2. Run the command:

     lighteval accelerate \
       --model_config_path="/mnt/c/Users/CSOC/Documents/lighteval/lighteval/examples/model_configs/tgi_model.yaml" \
       --tasks "bigbench|fact_checker|0|0" \
       --output_dir output_dir
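Note that inference_server_auth was left as null above. Since the error below complains about an anonymous rate limit, one first step worth trying is to fill that field with an HF access token; a sketch of the modified config, assuming lighteval forwards the value as an authorization header (hf_xxx is a placeholder, not a real token):

    model:
      type: "tgi"
      instance:
        inference_server_address: "https://api-inference.huggingface.co/models/akjindal53244/Llama-3.1-Storm-8B"
        inference_server_auth: "hf_xxx" # placeholder; assumption: sent as a Bearer token
        model_id: null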

Full log:

2024-09-11 11:39:43.249055: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-09-11 11:39:43.410645: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-11 11:39:43.471807: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-11 11:39:43.489384: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-11 11:39:43.608876: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-11 11:39:44.329542: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
/home/deepfake_detection/anaconda3/envs/lighteval/lib/python3.10/site-packages/pydantic/_internal/_fields.py:132: UserWarning: Field "model_id" in DeployedModel has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
WARNING:lighteval.logging.hierarchical_logger:main: (0, Namespace(subcommand='accelerate', model_config_path='/mnt/c/Users/CSOC/Documents/lighteval/lighteval/examples/model_configs/tgi_model.yaml', model_args=None, max_samples=None, override_batch_size=-1, job_id='', output_dir='output_dir', save_details=False, push_to_hub=False, push_to_tensorboard=False, public_run=False, results_org=None, use_chat_template=False, system_prompt=None, dataset_loading_processes=1, custom_tasks=None, tasks='bigbench|fact_checker|0|0', cache_dir=None, num_fewshot_seeds=1)),  { 
WARNING:lighteval.logging.hierarchical_logger:  Test all gather {
WARNING:lighteval.logging.hierarchical_logger:    Test gather tensor
WARNING:lighteval.logging.hierarchical_logger:    gathered_tensor tensor([0], device='cuda:0'), should be [0]
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:01.864774]
WARNING:lighteval.logging.hierarchical_logger:  Model loading {
WARNING:lighteval.logging.hierarchical_logger:    Load model from inference server: https://api-inference.huggingface.co/models/akjindal53244/Llama-3.1-Storm-8B
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:00.935378]
WARNING:lighteval.logging.hierarchical_logger:} [0:00:02.834408]
Traceback (most recent call last):
  File "/home/project/anaconda3/envs/lighteval/bin/lighteval", line 8, in <module>
    sys.exit(cli_evaluate())
  File "/home/project/anaconda3/envs/lighteval/lib/python3.10/site-packages/lighteval/__main__.py", line 58, in cli_evaluate
    main_accelerate(args)
  File "/home/project/anaconda3/envs/lighteval/lib/python3.10/site-packages/lighteval/logging/hierarchical_logger.py", line 175, in wrapper
    return fn(*args, **kwargs)
  File "/home/project/anaconda3/envs/lighteval/lib/python3.10/site-packages/lighteval/main_accelerate.py", line 78, in main
    pipeline = Pipeline(
  File "/home/project/anaconda3/envs/lighteval/lib/python3.10/site-packages/lighteval/pipeline.py", line 122, in __init__
    self.model = self._init_model(model_config, model)
  File "/home/project/anaconda3/envs/lighteval/lib/python3.10/site-packages/lighteval/pipeline.py", line 166, in _init_model
    return load_model(config=model_config, env_config=self.pipeline_parameters.env_config)
  File "/home/project/anaconda3/envs/lighteval/lib/python3.10/site-packages/lighteval/models/model_loader.py", line 76, in load_model
    return load_model_with_tgi(config)
  File "/home/project/anaconda3/envs/lighteval/lib/python3.10/site-packages/lighteval/models/model_loader.py", line 96, in load_model_with_tgi
    model = ModelClient(
  File "/home/project/anaconda3/envs/lighteval/lib/python3.10/site-packages/lighteval/models/tgi_model.py", line 61, in __init__
    raise ValueError("Error occured when fetching info: " + str(self.model_info))
ValueError: Error occured when fetching info: {'error': 'Rate limit reached. Please log in or use a HF access token'}
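The traceback shows ModelClient failing while fetching server info at startup. Assuming it queries the standard TGI /info route of the configured address, the failing request can be reproduced outside lighteval with a short sketch:

    import requests

    address = "https://api-inference.huggingface.co/models/akjindal53244/Llama-3.1-Storm-8B"
    # An unauthenticated GET here returns the same rate-limit JSON instead of TGI model info
    print(requests.get(f"{address}/info").json())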
Self-debugging attempts:

  • Checked huggingface-cli whoami; it returned my username, indicating a successful login.
  • Performed the API call separately (sketched in Python below), which returned:

    403 Forbidden: None.
    Cannot access content at: https://api-inference.huggingface.co/models/akjindal53244/Llama-3.1-Storm-8B/v1/chat/completions.
    If you are trying to create or update content, make sure you have a token with the `write` role.
    The model akjindal53244/Llama-3.1-Storm-8B is too large to be loaded automatically (16GB > 10GB). Please use Spaces (https://huggingface.co/spaces) or Inference Endpoints (https://huggingface.co/inference-endpoints).
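For reference, the separate API call can be made from Python as follows; a minimal sketch, assuming the token is exported as HF_TOKEN and that the endpoint accepts the OpenAI-style chat-completions payload implied by the URL above:

    import os
    import requests

    url = "https://api-inference.huggingface.co/models/akjindal53244/Llama-3.1-Storm-8B/v1/chat/completions"
    headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}  # assumes HF_TOKEN is set
    payload = {
        "model": "akjindal53244/Llama-3.1-Storm-8B",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 16,
    }
    response = requests.post(url, headers=headers, json=payload)
    print(response.status_code, response.text)  # this returned the 403 quoted above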

Expected behavior

The evaluation runs on the Inference API.

Version info

OS: Ubuntu (WSL)
Python: 3.10
CUDA: 12.2
Commit: c83daef

Vanessa-Taing commented 1 month ago

Update on the log:

ValueError: Error occured when fetching info: {'error': 'The model akjindal53244/Llama-3.1-Storm-8B is too large to be loaded automatically (16GB > 10GB). Please use Spaces (https://huggingface.co/spaces) or Inference Endpoints (https://huggingface.co/inference-endpoints).'}

I got this message on the second run. I assume there is a limit on the model size for inference?
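(For what it's worth, the 16GB figure can be checked with huggingface_hub; a sketch, assuming file sizes are populated for the repo:)

    from huggingface_hub import HfApi

    info = HfApi().model_info("akjindal53244/Llama-3.1-Storm-8B", files_metadata=True)
    size_gb = sum(f.size or 0 for f in info.siblings) / 1e9
    print(f"{size_gb:.1f} GB of repo files")  # roughly 16 GB of weights for an 8B model in bf16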

NathanHB commented 1 month ago

Hi! There is a limit, but it's set by TGI, not lighteval. You could try using Inference Endpoints to serve the model.
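If you have local GPUs, self-hosting TGI and pointing inference_server_address at it also sidesteps the hosted limit; a sketch following the TGI README (adjust the image tag and flags for your hardware):

    docker run --gpus all --shm-size 1g -p 8080:80 \
        -v $PWD/data:/data \
        ghcr.io/huggingface/text-generation-inference:latest \
        --model-id akjindal53244/Llama-3.1-Storm-8B

Then set inference_server_address: "http://localhost:8080" in tgi_model.yaml.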

Vanessa-Taing commented 1 month ago

I see, thanks!