ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Serve] ValueError: Message ray.serve.ReplicaConfig exceeds maximum protobuf size of 2GB #32049

Open jamm1985 opened 1 year ago

jamm1985 commented 1 year ago

What happened + What you expected to happen

Deployment fails when large objects, such as big numpy arrays, are passed as deployment init args: serializing the ReplicaConfig to protobuf exceeds the 2GB message limit.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [14], in <module>
----> 1 serve.run(MyModelDeployment.bind("test", ray.get(weights_ref)))

File /root/conda/envs/recommender/lib/python3.9/site-packages/ray/serve/api.py:536, in run(target, _blocking, host, port)
    523     deployment_parameters = {
    524         "name": deployment._name,
    525         "func_or_class": deployment._func_or_class,
   (...)
    533         "is_driver_deployment": deployment._is_driver_deployment,
    534     }
    535     parameter_group.append(deployment_parameters)
--> 536 client.deploy_group(
    537     parameter_group, _blocking=_blocking, remove_past_deployments=True
    538 )
    540 if ingress is not None:
    541     return ingress._get_handle()

File /root/conda/envs/recommender/lib/python3.9/site-packages/ray/serve/_private/client.py:37, in _ensure_connected.<locals>.check(self, *args, **kwargs)
     35 if self._shutdown:
     36     raise RayServeException("Client has already been shut down.")
---> 37 return f(self, *args, **kwargs)

File /root/conda/envs/recommender/lib/python3.9/site-packages/ray/serve/_private/client.py:251, in ServeControllerClient.deploy_group(self, deployments, _blocking, remove_past_deployments)
    248 deployment_args_list = []
    249 for deployment in deployments:
    250     deployment_args_list.append(
--> 251         self.get_deploy_args(
    252             deployment["name"],
    253             deployment["func_or_class"],
    254             deployment["init_args"],
    255             deployment["init_kwargs"],
    256             ray_actor_options=deployment["ray_actor_options"],
    257             config=deployment["config"],
    258             version=deployment["version"],
    259             route_prefix=deployment["route_prefix"],
    260             is_driver_deployment=deployment["is_driver_deployment"],
    261         )
    262     )
    264 updating_list = ray.get(
    265     self._controller.deploy_group.remote(deployment_args_list)
    266 )
    268 tags = []

File /root/conda/envs/recommender/lib/python3.9/site-packages/ray/serve/_private/client.py:37, in _ensure_connected.<locals>.check(self, *args, **kwargs)
     35 if self._shutdown:
     36     raise RayServeException("Client has already been shut down.")
---> 37 return f(self, *args, **kwargs)

File /root/conda/envs/recommender/lib/python3.9/site-packages/ray/serve/_private/client.py:485, in ServeControllerClient.get_deploy_args(self, name, deployment_def, init_args, init_kwargs, ray_actor_options, config, version, route_prefix, is_driver_deployment)
    471 if (
    472     deployment_config.autoscaling_config is not None
    473     and deployment_config.max_concurrent_queries
    474     < deployment_config.autoscaling_config.target_num_ongoing_requests_per_replica  # noqa: E501
    475 ):
    476     logger.warning(
    477         "Autoscaling will never happen, "
    478         "because 'max_concurrent_queries' is less than "
    479         "'target_num_ongoing_requests_per_replica' now."
    480     )
    482 controller_deploy_args = {
    483     "name": name,
    484     "deployment_config_proto_bytes": deployment_config.to_proto_bytes(),
--> 485     "replica_config_proto_bytes": replica_config.to_proto_bytes(),
    486     "route_prefix": route_prefix,
    487     "deployer_job_id": ray.get_runtime_context().job_id,
    488     "is_driver_deployment": is_driver_deployment,
    489 }
    491 return controller_deploy_args

File /root/conda/envs/recommender/lib/python3.9/site-packages/ray/serve/config.py:501, in ReplicaConfig.to_proto_bytes(self)
    500 def to_proto_bytes(self):
--> 501     return self.to_proto().SerializeToString()

ValueError: Message ray.serve.ReplicaConfig exceeds maximum protobuf size of 2GB: 3200001162

Versions / Dependencies

Linux 4cb379e01e77 3.10.0-1160.15.2.el7.x86_64 #1 SMP Wed Feb 3 15:06:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Python 3.9.10
ray-air                   2.2.0            py39hf3d152e_2    conda-forge
ray-all                   2.2.0            py39hf3d152e_2    conda-forge
ray-core                  2.2.0            py39h4d85f9a_2    conda-forge
ray-dashboard             2.2.0            py39h9f3bf79_2    conda-forge
ray-data                  2.2.0            py39hf3d152e_2    conda-forge
ray-default               2.2.0            py39hf3d152e_2    conda-forge
ray-k8s                   2.2.0            py39hf3d152e_2    conda-forge
ray-rllib                 2.2.0            py39hf3d152e_2    conda-forge
ray-serve                 2.2.0            py39hf3d152e_2    conda-forge
ray-train                 2.2.0            py39hf3d152e_2    conda-forge
ray-tune                  2.2.0            py39hf3d152e_2    conda-forge
grpc-cpp                  1.43.2               h9e046d8_3    conda-forge
grpcio                    1.46.3           py39h0f497a6_0    conda-forge

Reproduction script

import numpy as np
from typing import Dict

import ray
from ray import serve
from starlette.requests import Request

ray.init()

weights = np.ones((20000, 20000))
weights_ref = ray.put(weights)

@serve.deployment(route_prefix="/")
class MyModelDeployment:
    def __init__(self, msg: str, weights: np.ndarray):
        # Initialize model state: could be very large neural net weights.
        self._msg = msg
        self._weights = weights

    def __call__(self, request: Request) -> Dict:
        return {"result": self._msg}

serve.run(MyModelDeployment.bind("test", ray.get(weights_ref)))
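The reported size in the error (3200001162 bytes) lines up with the raw array payload plus a small amount of protobuf framing. A quick back-of-the-envelope check, using only the shape from the script above:

```python
# The reproduction script binds a 20000 x 20000 float64 array directly
# into the deployment's init args, so the whole payload ends up inside
# the ReplicaConfig protobuf message.
rows, cols, bytes_per_float64 = 20000, 20000, 8
payload = rows * cols * bytes_per_float64

print(payload)          # 3200000000 bytes, i.e. ~3.2 GB of raw data
print(payload > 2**31)  # True: well over the 2 GiB protobuf hard limit
```

The error's 3200001162 is this payload plus roughly a kilobyte of protobuf overhead, so any init arg whose serialized size approaches 2 GiB will hit the same ValueError.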

Issue Severity

High: It blocks me from completing my task.

sharlec commented 1 year ago

I am experiencing the same problem, did you solve it?

jamm1985 commented 1 year ago

Right now, I'm not pushing the big weights from the object store through Serve deployment init args. Instead, the weights are loaded inside the deployment process itself, after the deployment graph is running in the cluster.
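A minimal sketch of that pattern, stripped of Ray so it runs standalone: only a small path string would travel through the deployment's init args, and the large array is loaded inside the replica process. The file name and the shown deployment shape are illustrative assumptions, not code from this issue.

```python
import os
import tempfile
import numpy as np

# Persist the weights ahead of time (a tiny array here just to keep the
# demo cheap; in practice this would live on shared storage).
weights = np.ones((4, 4))
path = os.path.join(tempfile.mkdtemp(), "weights.npy")
np.save(path, weights)

# Inside the replica, __init__ would receive only `path` and do the load
# itself, e.g. (hypothetical deployment, mirroring the one above):
#
#   @serve.deployment(route_prefix="/")
#   class MyModelDeployment:
#       def __init__(self, msg: str, weights_path: str):
#           self._msg = msg
#           self._weights = np.load(weights_path)  # loaded in-process
#
#   serve.run(MyModelDeployment.bind("test", path))
#
# Which at its core is just:
loaded = np.load(path)
print(loaded.shape)  # (4, 4)
```

The deploy-time protobuf then carries only the short string, never the multi-gigabyte array, so the 2GB limit is never approached.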