ray-project / ray-llm

RayLLM - LLMs on Ray
https://aviary.anyscale.com
Apache License 2.0

Error loading model from local filesystem #47

Open jcushman opened 1 year ago

jcushman commented 1 year ago

models/README.md says "For loading a model from file system, set engine_config.hf_model_id to an absolute filesystem path accessible from every node in the cluster."

I ran:

sudo docker run -it --gpus all --shm-size 1g -p 8000:8000 -e HF_HOME=/data -v "$(pwd)"/data:/data -v "$(pwd)"/models:/models anyscale/aviary:latest bash
aviary run --model /models/myconfig.yaml

with /models/myconfig.yaml setting hf_model_id: /models/llama-2-13b-chat.ggmlv3.q4_1.bin
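
For reference, the relevant part of /models/myconfig.yaml looks roughly like this (other fields omitted; the model_id is inferred from the replica name in the logs below):

engine_config:
  model_id: meta-llama/Llama-2-13b-chat-hf
  hf_model_id: /models/llama-2-13b-chat.ggmlv3.q4_1.bin
  ...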

The output was:

(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)   File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/serve/_private/replica.py", line 417, in initialize_and_get_metadata [repeated 4x across cluster]
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143) Traceback (most recent call last):
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)   File "/home/ray/anaconda3/lib/python3.9/concurrent/futures/_base.py", line 439, in result
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)     return self.__get_result()
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)   File "/home/ray/anaconda3/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)     raise self._exception
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)     raise RuntimeError(traceback.format_exc()) from None
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143) RuntimeError: Traceback (most recent call last):
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)     await self.replica.update_user_config(
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)   File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/serve/_private/replica.py", line 688, in update_user_config
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)     await reconfigure_method(user_config)
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)   File "/home/ray/anaconda3/lib/python3.9/site-packages/aviary/backend/server/routers/model_app.py", line 82, in reconfigure
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)     await self.engine.start()
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)   File "/home/ray/anaconda3/lib/python3.9/site-packages/aviary/backend/llm/engine/tgi.py", line 165, in start
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)     self.new_worker_group = await self._create_worker_group(
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)   File "/home/ray/anaconda3/lib/python3.9/site-packages/aviary/backend/observability/fn_call_metrics.py", line 126, in async_wrapper
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)     return await wrapped(*args, **kwargs)
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)   File "/home/ray/anaconda3/lib/python3.9/site-packages/aviary/backend/llm/engine/tgi.py", line 364, in _create_worker_group
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)     _ = AutoTokenizer.from_pretrained(llm_config.actual_hf_model_id)
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)   File "/home/ray/anaconda3/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 677, in from_pretrained
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)     tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)   File "/home/ray/anaconda3/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 510, in get_tokenizer_config
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)     resolved_config_file = cached_file(
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)   File "/home/ray/anaconda3/lib/python3.9/site-packages/transformers/utils/hub.py", line 428, in cached_file
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)     resolved_file = hf_hub_download(
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)   File "/home/ray/anaconda3/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)     validate_repo_id(arg_value)
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)   File "/home/ray/anaconda3/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143)     raise HFValidationError(
(ServeReplica:meta-llama--Llama-2-13b-chat-hf_meta-llama--Llama-2-13b-chat-hf pid=17143) huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/models/llama-2-13b-chat.ggmlv3.q4_1.bin'

It looks like hf_model_id is being validated as a Hugging Face repo id, so it can't be an absolute path.
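
The same failure can be reproduced directly with huggingface_hub, since validate_repo_id (the last frame in the traceback) rejects anything that is not a bare repo id. A minimal sketch, assuming the same huggingface_hub version that ships in the container:

from huggingface_hub.utils._validators import HFValidationError, validate_repo_id

try:
    # An absolute filesystem path is not of the form 'repo_name' or 'namespace/repo_name'.
    validate_repo_id("/models/llama-2-13b-chat.ggmlv3.q4_1.bin")
except HFValidationError as err:
    print(err)

As far as I can tell, transformers only skips the Hub lookup when the given path is an existing directory, so a path to a single .bin file falls through to hf_hub_download and hits this check.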

Yard1 commented 1 year ago

For loading from the file system, hf_model_id should be set to a path pointing to a directory that contains a model in Hugging Face format (config.json, the .bin weight files, etc.), not to the .bin file itself. Can you try that?
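
E.g. something like this, present on every node (the file names are just illustrative of the usual Hugging Face layout, not an exhaustive list):

/models/llama-2-13b-chat-hf/
  config.json
  generation_config.json
  tokenizer.json
  tokenizer_config.json
  special_tokens_map.json
  pytorch_model-00001-of-00003.bin   # plus the remaining shards and the index file

and then in the model YAML:

engine_config:
  hf_model_id: /models/llama-2-13b-chat-hf  # directory, not a single .bin file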

rifkybujana commented 9 months ago

Hi, I got the same error when trying to load a model from S3. I've followed the instructions in models/README.md. Here is my engine config:

engine_config:
  model_id: meta/llama-2-7b
  s3_mirror_config:
    bucket_uri: s3://bucket_name/llama-2-7b
  type: VLLMEngine
...

and inside the bucket there are:

- config.json
- generation_config.json
- model.safetensors
- quant_config.json
- special_tokens_map.json
- tokenizer_config.json
- tokenizer.json
- tokenizer.model