ray-project / ray-llm

RayLLM - LLMs on Ray
https://aviary.anyscale.com
Apache License 2.0

Error running TheBloke--Llama-2-70B-chat-GPTQ #32

Closed mahaddad closed 1 year ago

mahaddad commented 1 year ago

Hi Aviary Team,

I'm testing out the new update and hitting the following error.

I am using the default YAML file with the docker image "anyscale/aviary:latest-tgi" and the model TheBloke--Llama-2-70B-chat-GPTQ.

Attaching the serve controller log file: serve_controller_502.log

```
ERROR 2023-07-28 22:38:24,144 controller 502 deployment_state.py:567 - Exception in replica 'TheBloke--Llama-2-70B-chat-GPTQ_TheBloke--Llama-2-70B-chat-GPTQ#CtvupC', the replica will be stopped.
Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/serve/_private/deployment_state.py", line 565, in check_ready
    _, self._version = ray.get(self._ready_obj_ref)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/worker.py", line 2520, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::ServeReplica:TheBloke--Llama-2-70B-chat-GPTQ_TheBloke--Llama-2-70B-chat-GPTQ.initialize_and_get_metadata() (pid=19572, ip=172.31.18.182, actor_id=d3085fba2f6070e53474e80601000000, repr=<ray.serve._private.replica.ServeReplica:TheBloke--Llama-2-70B-chat-GPTQ_TheBloke--Llama-2-70B-chat-GPTQ object at 0x7f6c63ad8be0>)
  File "/home/ray/anaconda3/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/home/ray/anaconda3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 413, in initialize_and_get_metadata
    raise RuntimeError(traceback.format_exc()) from None
RuntimeError: Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 403, in initialize_and_get_metadata
    await self.replica.update_user_config(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 638, in update_user_config
    await reconfigure_method(user_config)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/server/app.py", line 93, in reconfigure
    await self.predictor.rollover(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 374, in rollover
    self.new_worker_group = await self._create_worker_group(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/predictor/continuous_batching_predictor.py", line 297, in _create_worker_group
    worker_group = await super()._create_worker_group(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 484, in _create_worker_group
    worker_group = await self._start_prediction_workers(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/predictor/continuous_batching_predictor.py", line 273, in _start_prediction_workers
    worker_group = await super()._start_prediction_workers(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 409, in _start_prediction_workers
    await asyncio.gather(
  File "/home/ray/anaconda3/lib/python3.10/asyncio/tasks.py", line 650, in _wrap_awaitable
    return (yield from awaitable.__await__())
ray.exceptions.RayTaskError(AttributeError): ray::ContinuousBatchingPredictionWorker.init_model() (pid=19871, ip=172.31.18.182, actor_id=22ce6fec997530eb25d9abdc01000000, repr=ContinuousBatchingPredictionWorker:TheBloke/Llama-2-70B-chat-GPTQ)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 131, in init_model
    self.generator = init_model(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/utils.py", line 90, in inner
    ret = func(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 73, in init_model
    pipeline = get_pipeline_cls_by_name(pipeline_name).from_initializer(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/pipelines/_base.py", line 43, in from_initializer
    model, tokenizer = initializer.load(model_id)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/initializers/hf_transformers/base.py", line 57, in load
    model = self.load_model(model_id)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/initializers/tgi.py", line 51, in load_model
    return TGIInferenceWorker(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/continuous/tgi/tgi_worker.py", line 452, in __init__
    with patch(
  File "/home/ray/anaconda3/lib/python3.10/unittest/mock.py", line 1437, in __enter__
    original, local = self.get_original()
  File "/home/ray/anaconda3/lib/python3.10/unittest/mock.py", line 1410, in get_original
    raise AttributeError(
AttributeError: <class 'text_generation_server.utils.weights.Weights'> does not have the attribute '_set_gptq_params'
INFO 2023-07-28 22:38:24,144 controller 502 deployment_state.py:887 - Stopping replica TheBloke--Llama-2-70B-chat-GPTQ_TheBloke--Llama-2-70B-chat-GPTQ#CtvupC for deployment TheBloke--Llama-2-70B-chat-GPTQ_TheBloke--Llama-2-70B-chat-GPTQ.
INFO 2023-07-28 22:39:16,555 controller 502 deployment_state.py:1586 - Adding 1 replica to deployment TheBloke--Llama-2-70B-chat-GPTQ_TheBloke--Llama-2-70B-chat-GPTQ.
INFO 2023-07-28 22:39:16,555 controller 502 deployment_state.py:331 - Starting replica TheBloke--Llama-2-70B-chat-GPTQ_TheBloke--Llama-2-70B-chat-GPTQ#QPfLlB for deployment TheBloke--Llama-2-70B-chat-GPTQ_TheBloke--Llama-2-70B-chat-GPTQ.
```

The controller then starts replacement replicas `#QPfLlB` (pid=20115, worker pid=20477) and `#mgcjuf` (pid=20720, worker pid=21022), and each fails with the identical `AttributeError: <class 'text_generation_server.utils.weights.Weights'> does not have the attribute '_set_gptq_params'` traceback before being stopped in turn (at 22:39:27,663 and 22:40:31,807).
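The root cause is visible in the final frames: the TGI worker uses `unittest.mock.patch` to monkey-patch `Weights._set_gptq_params`, and `patch` raises `AttributeError` on entry when the target attribute does not exist (here, because the installed `text_generation_server` predates that method). A minimal sketch of that standard-library behavior, using a stand-in class rather than the real TGI `Weights`:

```python
from unittest.mock import patch

class Weights:
    """Stand-in for text_generation_server.utils.weights.Weights
    in a version that lacks _set_gptq_params."""

# patch()/patch.object() look up the original attribute when the
# context manager is entered; if it is missing (and create=True was
# not passed), get_original() raises AttributeError -- the exact
# failure seen in the replica logs above.
try:
    with patch.object(Weights, "_set_gptq_params", lambda self: None):
        pass
except AttributeError as exc:
    print(f"patch failed: {exc}")
```

This is why the error points at a version mismatch between the Aviary image and the bundled `text_generation_server`, which a rebuilt image resolves.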

Yard1 commented 1 year ago

Hi @mahaddad, can you pull the latest image and try again?

mahaddad commented 1 year ago

Thanks for the weekend support @Yard1!! It's now fixed, and I have successfully deployed Llama 70B across 4 GPUs on a g5.12xlarge.

Nothing urgent, but was this fix also applied to https://hub.docker.com/layers/anyscale/aviary/0.1.2-56ab8352bcd4adf65c2eb8387982a85931374e27-tgi/images/sha256-143afa517d575d98deb1789959d1e91f5b73a5a55a95462a402ffbdd61f7fab6?context=explore? I'd like to use static images wherever possible.

Yard1 commented 1 year ago

This is the image with the fix: https://hub.docker.com/layers/anyscale/aviary/0.1.2-7938b708d6c8cae614cf9712a98c40db3b737725-tgi/images/sha256-834d4b8ea4bb1db40a9dc51c43962594fd8e84456dd28041632182700dbf165e?context=explore