Closed davidberenstein1957 closed 1 year ago
Ohh sorry, I see, you seem to be working on updates currently. Please send a request if you are looking for more context.
Hey @davidberenstein1957 you're right that we're in the midst of some big updates! Could you say more about the problems you ran into on the nightly wheels (2.0.0.dev0)?
So, I was trying to serve my FastAPI application via /serve/tutorials/web-server-integration.html via 2.0.0.dev0. I really love the new interface compared to the old one. However, for FastAPI specifically, the requests are already formatted via the FastAPI decorators, so request.body() doesn't need to be there. Similarly, the asynchronous FastAPI endpoints don't need to be awaited anymore. Also, I seem to be required to use ray.get(response) to obtain the response from the Ray Serve handle.
Furthermore, the FastAPI app.on_event('startup') doesn't log/report errors, so if anything goes wrong it is very difficult to debug and people might blame Ray for this flaw. I think you could add a try-except statement to the example to avoid this.
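A minimal sketch of the try-except idea suggested above, using only the standard library. The decorator name `log_startup_errors` and the logger name are hypothetical, and the FastAPI usage in the trailing comment is an assumption about how it would be applied, not part of the Ray tutorial:

```python
import functools
import logging

logger = logging.getLogger("startup")


def log_startup_errors(func):
    """Wrap an async startup hook so failures are logged with a traceback."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        try:
            return await func(*args, **kwargs)
        except Exception:
            # logger.exception records the full traceback at ERROR level.
            logger.exception("Startup hook %s failed", func.__name__)
            raise  # re-raise so the application still fails visibly

    return wrapper


# Hypothetical usage with FastAPI (assumed, not executed here):
#
# @app.on_event("startup")
# @log_startup_errors
# async def startup():
#     GPT2.deploy()
```

Re-raising after logging keeps the original failure behaviour while making the cause visible in the application logs.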
I am currently experiencing something similar to the following issues: https://github.com/ray-project/ray/issues/8419 https://github.com/ray-project/ray/issues/3116
However, I am initializing the @serve.deployment on a class which is rather complex and inherits some stuff from other classes, i.e. it is inconvenient to have to create a setup.py file as suggested in 3116. Also, adding the import (8419) within the Ray remote is not an option because I use the @serve.deployment on a class that inherits from another class.
Also, when using the main approach suggested here, it does connect, but I get the error No module named 'transformers', i.e. it is unable to find the transformers package.
This is really great feedback @davidberenstein1957 :) do you think you could provide a short code sample of exactly what you're trying to do? Off the top of my head, one thing you could do is make the deployment class a pretty simple wrapper of the actual underlying class that you're serving:
@serve.deployment
class Wrapper:
    def __init__(self, *args):
        from my_module import MyActualImplementationClass
        self._wrapped = MyActualImplementationClass(*args)

    def handle_request(self):
        return self._wrapped.handle_request()
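The same deferred-import idea can be sketched generically with the standard library. This is an illustration only: `LazyWrapper` is a hypothetical name, it is not a Ray Serve API, and it would still need the `@serve.deployment` decorator to be used as above. The point is that the heavy module is imported when the wrapper is constructed, not when the file defining it is imported:

```python
import importlib


class LazyWrapper:
    """Thin wrapper that imports the real implementation only on construction."""

    def __init__(self, module_name, class_name, *args, **kwargs):
        # The import happens in whichever process constructs the wrapper.
        module = importlib.import_module(module_name)
        self._wrapped = getattr(module, class_name)(*args, **kwargs)

    def __getattr__(self, name):
        # Guard against infinite recursion if _wrapped was never set.
        if name == "_wrapped":
            raise AttributeError(name)
        # Delegate everything else to the wrapped object.
        return getattr(self._wrapped, name)
```

For example, `LazyWrapper("collections", "Counter", "aab").most_common(1)` delegates to a `Counter` that was imported only at construction time.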
Awesome, I will try this tomorrow during less ungodly work hours. But here you can find my deployed cluster (ai-dev_cluster.yaml - mostly the same as example_cluster.yaml) that is deployed within my Kubernetes cluster within a namespace. Within that same cluster and namespace, I am trying to deploy a FastAPI application that offloads the heavy stuff to Ray Serve deployed transformers (main.py + requirements.txt). For convenience, I have now replaced my SentimentAnalyzers with the standard GPT2 class from the example of FastAPI deployment on your website. During application startup, the application fails when calling GPT2.deploy() on line 73. I get the error No module named 'transformers'. Running a local Ray cluster via ray start --head or via from ray.cluster_utils import Cluster doesn't cause any issues; within the Kubernetes cluster, however, it does.
requirements.txt
main.py.txt
ai-dev_cluster.yaml.txt
By the way, I am using tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim-2020-12-19 as Docker image to deploy my API to Kubernetes.
(pid=672, ip=10.240.0.185)   File "python/ray/_raylet.pyx", line 500, in ray._raylet.execute_task
(pid=672, ip=10.240.0.185)   File "python/ray/_raylet.pyx", line 447, in ray._raylet.execute_task.function_executor
(pid=672, ip=10.240.0.185)   File "python/ray/_raylet.pyx", line 1657, in ray._raylet.CoreWorker.run_async_func_in_event_loop
(pid=672, ip=10.240.0.185)   File "/home/ray/anaconda3/lib/python3.8/concurrent/futures/_base.py", line 432, in result
(pid=672, ip=10.240.0.185)     return self.__get_result()
(pid=672, ip=10.240.0.185)   File "/home/ray/anaconda3/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
(pid=672, ip=10.240.0.185)     raise self._exception
(pid=672, ip=10.240.0.185)   File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/serve/backend_worker.py", line 71, in __init__
(pid=672, ip=10.240.0.185)     await sync_to_async(_callable.__init__)(*init_args)
(pid=672, ip=10.240.0.185)   File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/async_compat.py", line 29, in wrapper
(pid=672, ip=10.240.0.185)     return func(*args, **kwargs)
(pid=672, ip=10.240.0.185)   File "./main.py", line 52, in __init__
(pid=672, ip=10.240.0.185) ModuleNotFoundError: No module named 'functional'
I feel that the issues might have something to do with https://docs.ray.io/en/master/cluster/commands.html#synchronizing-files-from-the-cluster-ray-rsync-up-down, however, I would expect this to happen automatically when serving a specific application. Also, calling os.system('pip install transformers') within the wrapper fixed a ModuleNotFoundError: No module named 'transformers', but I don't think this is the intended way of fixing this issue?
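To narrow down which packages are actually missing in a given process, a small standard-library check like the following may help. It is a sketch: `missing_modules` is a hypothetical helper, and running it inside a Ray task or actor on the cluster (rather than locally) is an assumption about how it would be used:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this process."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules(["transformers", "functional"])` in the failing environment would list whichever of the two cannot be imported there.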
Within https://github.com/ray-project/ray/tree/master/python/ray/autoscaler I can see some examples of .yml pipeline files that actually include setup_commands and file sharing interfaces, but ideally I want to upload files from my separate FastAPI pods.
@davidberenstein1957 we're currently working on improving support for specifying dependencies for a given Serve deployment, but right now the best practice would be to build all of the requirements into the Docker image you use for the Ray cluster. That way the packages are available to all of the worker processes on your cluster. Would that work for you?
I could make this work for me; however, this does not fix my relative import issues, which means I would also have to add my entire application to the Ray Docker image. Ideally, something like the CommandRunner would be great, where I would first call ray.init() and afterwards could execute commands on the cluster like ray rsync-up ./functional ./functional and ray exec pip install transformers. But I will also check if I might be able to initialize the class and add it to shared storage via ray.put() and initialize it using the wrapper and ray.get().
We are currently working on better supporting dynamic environments (RFC here), but for now one thing you could do is use the named conda env support.
The workflow here would be to install a conda env on the cluster with a specific name using ray exec or some other means, then in your Serve deployment you specify that env as the one for the actors to run in:
@serve.deployment(ray_actor_options={"runtime_env": {"conda": "my_conda_env_name"}})
class Deployment:
    ...
Actually one option for installing the env would be to do it using Ray tasks!
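import subprocess, tempfile

@ray.remote
def install_env(env_yaml):
    # Write env_yaml to a temp file, then create the conda env from that file.
    with tempfile.NamedTemporaryFile("w", suffix=".yml") as f:
        f.write(env_yaml)
        f.flush()
        subprocess.check_output(["conda", "env", "create", "-f", f.name])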
You could schedule this to run on every node doing something like this:
refs = []
for node in ray.nodes():
    node_id = node["NodeManagerAddress"]
    node_resource = f"node:{node_id}"
    refs.append(install_env.options(resources={node_resource: 0.001}).remote(env_yaml))
ray.get(refs)
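The per-node resource requests built in the loop above can be sketched as a pure function, which also skips nodes that are no longer alive. This is an illustration under assumptions: `node_resource_requests` is a hypothetical helper, and it assumes each entry from `ray.nodes()` carries the "Alive" and "NodeManagerAddress" keys:

```python
def node_resource_requests(nodes, amount=0.001):
    """Build one {"node:<ip>": amount} resource request per alive node."""
    requests = []
    for node in nodes:
        # Skip entries for nodes that have left the cluster.
        if not node.get("Alive", False):
            continue
        requests.append({f"node:{node['NodeManagerAddress']}": amount})
    return requests
```

Each returned dict could then be passed as `resources=` to `install_env.options(...)`, so that dead nodes do not leave the task pending forever.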
I just had everything up and running with the separate Docker image approach, but after 15 minutes I got the following error message:
ray.util.client.dataclient - INFO - Server disconnected from data channel
Also, the cluster doesn't seem to autoscale when it needs additional resources, even though the config allows it to scale to more workers.
@AmeerHajAli is the above the same issue that we recently addressed by adding gRPC keepalives to the Ray client?
Also, @davidberenstein1957 could you share the logs from the Serve controller (it should print some messages saying that deployments are pending startup) and the autoscaler logs? That should help diagnose the issue!
Hello, thanks for the great support by the way! I hope this is enough context.
My set-up: Kubernetes cluster -> Ray operator cluster. Note, the operator was not able to use my custom Docker image, so it is running an image without the installation of transformers and pytorch. ray.cluster.yml.txt ray.operator.yml.txt
4 ML FastAPI Microservices from complex to simple (Spacy, Classification, Wordembeddings, Sentiment). I am working my way to ray integration from the simplest service starting with the Sentiment service, which offers some transfer-learned huggingface Transformers. I tried deploying 3 Dutch versions and 1 English version via the following FastAPI set-up using 0.5 CPU per deployment. deployment.py.txt
The process starts working and my dashboard shows how the models are loaded on the head node and redistributed over the worker nodes. However, the head node clogs up and ends up using too many resources, and then the FastAPI connection dies due to the scaling not working. Also, when deploying fewer models, the models end up remaining on the head node, which seems a bit weird to me. I would expect them to be moved to the less crucial worker nodes. head_fail_logs.txt operator_logs.txt
@edoakes, @AmeerHajAli I found some insights into the issue, but it might be a difficult fix from your side, i.e., it seems like a Kubernetes and/or config issue from my side.
So, our kubernetes cluster is using X CPU and 2 GPU. Within my pipeline.yml for the deployment of the Ray cluster, and within my pipeline.yml for the deployment of my sentiment Microservice, I did specifically assign GPU resources via taint. So, when deploying the BERT transformer models with "@serve.deployment(num_replicas=2)", they were still initialized on the GPUs within our cluster, meaning that the Ray autoscaler seemed to be in a 'mismatch' with the actual GPU/CPU-usage, resulting in an error. Or could this be fixed by using the rayproject/ray:nightly-gpu image? It does not give an error after deploying the models with "@serve.deployment(num_replicas=2, ray_actor_options={"num_gpus": 0.25})".
cc @DmitriGekhtman @ijrsvt
I am getting an autoscaler error within my app.py file when resources are limited, even though the config allows it to scale. autoscaler-error-log.txt
Thanks for the logs. That's interesting... evidently, the autoscaling process got a SIGTERM signal.
Or is it actually the main operator process that got cut off?
I missed some of the above context, but is it correct that this is happening when the Ray head gets overloaded?
When this happens does the operator deployment indicate that there were restarts?
kubectl -n <namespace> get deployment ray-operator
If not, could you share the operator pod's logs?
kubectl -n <namespace> logs ray-operator-xxxx
My interpretation is that the Ray head pod is getting overloaded (don't know why that is happening). Then perhaps the head gets killed by Kubernetes leading to an (expected) autoscaler failure and then an (unexpected) operator failure.
I tested 3 set-ups with different images for the operator pods and workers/heads (nightly and release==1.3.0). I used the same Ray version for the API as I did for the worker/head when connecting. Also, I used the release notation for the Serve deployment and not the @serve.deployment() one.
Custom Docker image as suggested by @edoakes: DockerfileCPU.txt
Deployment YAML: ray.cluster.yml.txt ray.dashboard.yml.txt ray.operator.yml.txt
API startup main.py.txt
The Ray head does get overloaded when using the @serve.deployment decorator by the way, but I have only found time to test with the old notation because I also wanted to test the release images.
Also, when using the @serve.deployment decorator along with the nightly image for the operator, head and workers, the deployment failures end up in an infinite loop, resulting in the logs shown underneath. With the serve.create_backend() and serve.create_endpoint() approach, this does not happen.
2021-05-28 05:23:06,394 INFO backend_state.py:773 -- Adding 1 replicas to backend 'DutchSentiment'.
2021-05-28 05:23:06,625 ERROR controller.py:121 -- Exception updating backend state: Failed to look up actor with name 'HSzhnk:SERVE_CONTROLLER_ACTOR:DutchSentiment#gCitXs'. You are either trying to look up a named actor you didn't create, the named actor died, or the actor hasn't been created because named actor creation is asynchronous.
2021-05-28 05:23:06,833 WARNING backend_state.py:864 -- Replica DutchSentiment#gCitXs of backend DutchSentiment failed health check, stopping it.
Stale, reopen if still an issue
These instructions do not seem to work for ray[serve] or ray[default] @1.3.0. Also, ray==2.0.0.dev0 has problems.