jmorenobl opened 4 months ago
What's the error you are seeing? And the logs?
Just by executing this:

```shell
model=mistralai/Mistral-7B-Instruct-v0.1
volume=$PWD/data

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/predibase/lorax:main --model-id $model
```
I get the following error:
2024-05-27T11:40:55.235184Z INFO lorax_launcher: Args { model_id: "mistralai/Mistral-7B-Instruct-v0.1", adapter_id: None, source: "hub", default_adapter_source: None, adapter_source: "hub", revision: None, validation_workers: 2, sharded: None, embedding_model: None, num_shard: None, quantize: None, compile: false, speculative_tokens: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_active_adapters: 1024, adapter_cycle_time_s: 2, adapter_memory_fraction: 0.1, hostname: "252cfb445bd6", port: 80, shard_uds_path: "/tmp/lorax-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], cors_allow_header: [], cors_expose_header: [], cors_allow_method: [], cors_allow_credentials: None, watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false, download_only: false }
2024-05-27T11:40:55.235284Z INFO download: lorax_launcher: Starting download process.
2024-05-27T11:40:58.738448Z ERROR download: lorax_launcher: Download encountered an error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 270, in hf_raise_for_status
response.raise_for_status()
File "/opt/conda/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/mistralai/Mistral-7B-Instruct-v0.1
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/bin/lorax-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 124, in download_weights
_download_weights(model_id, revision, extension, auto_convert, source, api_token)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/weights.py", line 447, in download_weights
model_source.weight_files()
File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/sources/hub.py", line 179, in weight_files
return weight_files(self.model_id, self.revision, extension, self.api_token)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/sources/hub.py", line 69, in weight_files
filenames = weight_hub_files(model_id, revision, extension, api_token)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/sources/hub.py", line 34, in weight_hub_files
info = api.model_info(model_id, revision=revision)
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 1922, in model_info
hf_raise_for_status(r)
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 286, in hf_raise_for_status
raise GatedRepoError(message, response) from e
huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-6654714a-7fc4308c45580b7328298ca4;876ae9f0-b94e-4384-a4a9-fd3139261aa7)
Cannot access gated repo for url https://huggingface.co/api/models/mistralai/Mistral-7B-Instruct-v0.1.
Access to model mistralai/Mistral-7B-Instruct-v0.1 is restricted. You must be authenticated to access it.
Error: DownloadError
If I add my token and execute it this way:

```shell
model=mistralai/Mistral-7B-Instruct-v0.1
volume=$PWD/data

docker run --gpus all --shm-size 1g -p 8080:80 \
    -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
    -v $volume:/data \
    ghcr.io/predibase/lorax:main --model-id $model
```
I get the following error:
2024-05-27T11:45:16.081927Z INFO lorax_launcher: Args { model_id: "mistralai/Mistral-7B-Instruct-v0.1", adapter_id: None, source: "hub", default_adapter_source: None, adapter_source: "hub", revision: None, validation_workers: 2, sharded: None, embedding_model: None, num_shard: None, quantize: None, compile: false, speculative_tokens: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_active_adapters: 1024, adapter_cycle_time_s: 2, adapter_memory_fraction: 0.1, hostname: "8b4a73dd40ee", port: 80, shard_uds_path: "/tmp/lorax-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], cors_allow_header: [], cors_expose_header: [], cors_allow_method: [], cors_allow_credentials: None, watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false, download_only: false }
2024-05-27T11:45:16.082040Z INFO download: lorax_launcher: Starting download process.
Error: DownloadError
2024-05-27T11:45:18.784725Z ERROR download: lorax_launcher: Download encountered an error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 270, in hf_raise_for_status
response.raise_for_status()
File "/opt/conda/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/models/mistralai/Mistral-7B-Instruct-v0.1
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/bin/lorax-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 124, in download_weights
_download_weights(model_id, revision, extension, auto_convert, source, api_token)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/weights.py", line 447, in download_weights
model_source.weight_files()
File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/sources/hub.py", line 179, in weight_files
return weight_files(self.model_id, self.revision, extension, self.api_token)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/sources/hub.py", line 69, in weight_files
filenames = weight_hub_files(model_id, revision, extension, api_token)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/sources/hub.py", line 34, in weight_hub_files
info = api.model_info(model_id, revision=revision)
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 1922, in model_info
hf_raise_for_status(r)
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 286, in hf_raise_for_status
raise GatedRepoError(message, response) from e
huggingface_hub.utils._errors.GatedRepoError: 403 Client Error. (Request ID: Root=1-6654724e-68506e57290cd75960e9177c;090b4284-8d6f-441f-916f-2fa3e5ab57c3)
Cannot access gated repo for url https://huggingface.co/api/models/mistralai/Mistral-7B-Instruct-v0.1.
Access to model mistralai/Mistral-7B-Instruct-v0.1 is restricted and you are not in the authorized list. Visit https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1 to ask for access.
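For context, the two failures above are different auth states: the first run got a 401 (no credentials sent at all), while this run got a 403 (the token is valid, but the account has not been granted access to the gated repo). An illustrative helper capturing that distinction (a hypothetical function, not part of huggingface_hub's API):

```python
# Illustrative mapping of the Hub's gated-repo HTTP status codes to their
# meaning (hypothetical helper, not huggingface_hub's actual code).
def classify_gated_error(status_code: int) -> str:
    if status_code == 401:
        # No valid credentials were sent with the request.
        return "unauthenticated: pass a Hugging Face token (e.g. HUGGING_FACE_HUB_TOKEN)"
    if status_code == 403:
        # Credentials were valid, but this account has not been granted
        # access to the gated repo.
        return "unauthorized: request access on the model page"
    return "not a gated-repo auth error"
```

So fixing the 403 requires accepting the model's terms on its Hugging Face page, not just supplying a token.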
That's fine, since I don't have access to that model, but I would use another example for the quickstart. I tried Phi-3, since it's not a gated model, but it didn't work either:
```shell
export model=microsoft/Phi-3-small-8k-instruct
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/predibase/lorax:main --model-id $model
```
2024-05-27T11:52:11.862867Z INFO lorax_launcher: Args { model_id: "microsoft/Phi-3-small-8k-instruct", adapter_id: None, source: "hub", default_adapter_source: None, adapter_source: "hub", revision: None, validation_workers: 2, sharded: None, embedding_model: None, num_shard: None, quantize: None, compile: false, speculative_tokens: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_active_adapters: 1024, adapter_cycle_time_s: 2, adapter_memory_fraction: 0.1, hostname: "99cd7f793e22", port: 80, shard_uds_path: "/tmp/lorax-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], cors_allow_header: [], cors_expose_header: [], cors_allow_method: [], cors_allow_credentials: None, watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false, download_only: false }
2024-05-27T11:52:11.862989Z INFO download: lorax_launcher: Starting download process.
2024-05-27T11:52:14.407194Z INFO lorax_launcher: hub.py:121 Download file: model-00001-of-00004.safetensors
2024-05-27T11:52:31.144230Z INFO lorax_launcher: hub.py:130 Downloaded /data/models--microsoft--Phi-3-small-8k-instruct/snapshots/1adb635233ffce9e13385862a4111606d4382762/model-00001-of-00004.safetensors in 0:00:16.
2024-05-27T11:52:31.144324Z INFO lorax_launcher: hub.py:150 Download: [1/4] -- ETA: 0:00:48
2024-05-27T11:52:31.156145Z INFO lorax_launcher: hub.py:121 Download file: model-00002-of-00004.safetensors
2024-05-27T11:53:19.520065Z INFO lorax_launcher: hub.py:130 Downloaded /data/models--microsoft--Phi-3-small-8k-instruct/snapshots/1adb635233ffce9e13385862a4111606d4382762/model-00002-of-00004.safetensors in 0:00:48.
2024-05-27T11:53:19.520245Z INFO lorax_launcher: hub.py:150 Download: [2/4] -- ETA: 0:01:05
2024-05-27T11:53:19.520607Z INFO lorax_launcher: hub.py:121 Download file: model-00003-of-00004.safetensors
2024-05-27T11:54:38.927900Z INFO lorax_launcher: hub.py:130 Downloaded /data/models--microsoft--Phi-3-small-8k-instruct/snapshots/1adb635233ffce9e13385862a4111606d4382762/model-00003-of-00004.safetensors in 0:01:19.
2024-05-27T11:54:38.928020Z INFO lorax_launcher: hub.py:150 Download: [3/4] -- ETA: 0:00:48
2024-05-27T11:54:38.928307Z INFO lorax_launcher: hub.py:121 Download file: model-00004-of-00004.safetensors
2024-05-27T11:54:46.533325Z INFO lorax_launcher: hub.py:130 Downloaded /data/models--microsoft--Phi-3-small-8k-instruct/snapshots/1adb635233ffce9e13385862a4111606d4382762/model-00004-of-00004.safetensors in 0:00:07.
2024-05-27T11:54:46.533440Z INFO lorax_launcher: hub.py:150 Download: [4/4] -- ETA: 0
2024-05-27T11:54:47.004372Z INFO download: lorax_launcher: Successfully downloaded weights.
2024-05-27T11:54:47.004674Z INFO shard-manager: lorax_launcher: Starting shard rank=0
2024-05-27T11:54:52.749530Z ERROR lorax_launcher: server.py:265 Error when initializing model
Traceback (most recent call last):
File "/opt/conda/bin/lorax-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 84, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 318, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 252, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/__init__.py", line 328, in get_model
raise ValueError(f"Unsupported model type {model_type}")
ValueError: Unsupported model type phi3small
2024-05-27T11:54:53.511126Z ERROR shard-manager: lorax_launcher: Shard complete standard error output:
Traceback (most recent call last):
File "/opt/conda/bin/lorax-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 84, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 318, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 252, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/__init__.py", line 328, in get_model
raise ValueError(f"Unsupported model type {model_type}")
ValueError: Unsupported model type phi3small
rank=0
2024-05-27T11:54:53.609061Z ERROR lorax_launcher: Shard 0 failed to start
2024-05-27T11:54:53.609080Z INFO lorax_launcher: Shutting down shards
Error: ShardCannotStart
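This failure is different from the earlier auth errors: the weights downloaded fine, but the server did not recognize the `model_type` declared in the model's config.json. A toy sketch of that dispatch pattern, consistent with the `Unsupported model type phi3small` traceback above (the supported-type set and function name here are illustrative, not LoRAX's actual registry):

```python
# Toy sketch of dispatching on config.json's `model_type`, as the
# "Unsupported model type phi3small" error suggests the server does.
# The supported set below is illustrative only, not LoRAX's real list.
SUPPORTED_MODEL_TYPES = {"llama", "mistral", "mixtral", "qwen2"}

def get_model(config: dict) -> str:
    model_type = config.get("model_type")
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"Unsupported model type {model_type}")
    return f"<{model_type} model>"
```

In other words, swapping in an arbitrary Hub model only works if its architecture is on the server's supported list; `phi3small` was not.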
Is there any model I could use to start playing around with LoRAX?
@jmorenobl You just need to log in to the Hugging Face Hub, open the model page, accept the EULA, and then run LoRAX, passing your own HUGGING_FACE_HUB_TOKEN env variable.
### System Info
Pre-built Docker image on g4dn.xlarge with Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.2.0 (Amazon Linux 2)
### Reproduction

Execute the example on the home page:

```shell
model=mistralai/Mistral-7B-Instruct-v0.1
volume=$PWD/data

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/predibase/lorax:main --model-id $model
```
### Expected behavior

A server starts successfully, serving Mistral.