Closed fryz closed 3 months ago
Thanks for reaching out!
The FastAPI app you have set up here is probably not needed for this particular use case of deploying and calling a module (but do let us know if there's something specific you had in mind). Behind the scenes, Runhouse spins up a FastAPI server on the cluster, which allows you to call the module directly via an HTTP request. What you're seeing are the logs of the Runhouse server, stored on the cluster at `~/.rh/server.log`.
The below snippet should likely be all you need to deploy and call this module:
```python
import runhouse as rh
import numpy as np
from scipy.special import softmax
from transformers import AutoModelForSequenceClassification, AutoConfig
from transformers import AutoTokenizer


class SentimentAnalysis:
    def __init__(self, model_name="cardiffnlp/twitter-roberta-base-sentiment-latest"):
        self.model_name = model_name
        self.model = None
        self.config = None
        self.tokenizer = None

    @staticmethod
    def preprocess(text):
        """Preprocess text (username and link placeholders)"""
        new_text = []
        for t in text.split(" "):
            t = '@user' if t.startswith('@') and len(t) > 1 else t
            t = 'http' if t.startswith('http') else t
            new_text.append(t)
        return " ".join(new_text)

    def predict(self, text):
        # Lazily load the model, config, and tokenizer on the first call
        if self.model is None:
            self.model = AutoModelForSequenceClassification.from_pretrained(self.model_name)
        if self.config is None:
            self.config = AutoConfig.from_pretrained(self.model_name)
        if self.tokenizer is None:
            self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
        text = SentimentAnalysis.preprocess(text)
        encoded_input = self.tokenizer(text, return_tensors='pt')
        output = self.model(**encoded_input)
        scores = output[0][0].detach().numpy()
        scores = softmax(scores)
        ranking = np.argsort(scores)[::-1]
        l_scores = {}
        for i in range(scores.shape[0]):
            label = self.config.id2label[ranking[i]]
            l_scores[label] = np.round(float(scores[ranking[i]]), 4)
        return l_scores


cluster = rh.ondemand_cluster(
    name="fastapi-runhouse-example",
    instance_type="CPU:2+",
    provider="aws",
    region="us-east-1",
).up_if_not()

# Set up an env using the requirements specified in the working dir
my_env = rh.env(name="scorer_env", working_dir="./")

# Send the module and its associated env to the cluster (or reload it if it
# already exists on the cluster)
RemoteScorer = rh.module(SentimentAnalysis, name="remote-scorer").to(cluster, env=my_env)

# Generate a URL that can be used to call the module's "predict" method from anywhere
base_url = f"{RemoteScorer.endpoint()}/predict"
print(base_url)
```
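As a quick local sanity check (no cluster or transformers install needed), the preprocessing and score-ranking logic inside `predict` can be exercised on their own; the `softmax` below is a pure-Python stand-in for `scipy.special.softmax`, and the input strings are made up for illustration:

```python
import math

def preprocess(text):
    # Same placeholder logic as SentimentAnalysis.preprocess:
    # mask usernames and links before tokenization.
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)

def softmax(xs):
    # Pure-Python stand-in for scipy.special.softmax
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(preprocess("@bob check https://example.com"))
# -> @user check http

# Ranking mirrors predict(): sort label indices by descending score
logits = [2.0, 0.5, -1.0]
scores = softmax(logits)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)
# -> [0, 1, 2]
```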
For example, to then call the predict method via cURL:

```shell
curl "http://<CLUSTER-IP>:32300/remote-scorer/predict?text=Some text"
```
A couple other things to note based on the above snippet:

- You can use `get_or_to` instead of `to` when sending the module to the cluster. This allows you to load the existing module by its name if it was already previously saved on the cluster.
- You can SSH into the cluster with `ssh fastapi-runhouse-example` and restart the Runhouse server anytime by running `runhouse restart` on the cluster.
- By caching the model, config, and tokenizer as attributes of the `SentimentAnalysis` module, we can prevent unnecessary reloading when hitting the `predict` endpoint.

Thanks for following up @jlewitt1
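The caching point about the `SentimentAnalysis` module can be sketched in isolation; `LazyScorer` and its methods below are toy, hypothetical names standing in for the real model loading:

```python
class LazyScorer:
    """Toy stand-in for SentimentAnalysis: heavy objects are loaded
    once on the first call and cached as instance attributes."""
    def __init__(self):
        self.model = None
        self.load_count = 0  # track how many times we "load" the model

    def _load_model(self):
        self.load_count += 1
        return "heavy-model"

    def predict(self, text):
        if self.model is None:  # only pay the loading cost once
            self.model = self._load_model()
        return f"{self.model} scored: {text}"

scorer = LazyScorer()
scorer.predict("first call loads the model")
scorer.predict("second call reuses it")
print(scorer.load_count)
# -> 1
```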
I probably should have filled in some additional context - I caught up with Donny in our office earlier this week to discuss what we were planning on doing, but didn't fill in details in this issue.
High level, what I'm looking to accomplish is to manage the GPU cluster within the business logic of an application we're developing. Specifically, the goal is to run inference on a handful of ML models on GPUs while running the service itself on CPUs. Our service is implemented with FastAPI, and clients interact with it through the service's API. I want the GPU interaction to be completely opaque to the end user, since it's an implementation detail: they shouldn't know where the inference is running.
One thing I like about Runhouse is that it looks like I can tie the lifecycle of the cluster to the lifecycle of the application, e.g. start the cluster when the app starts up and terminate it when the app spins down. I also talked with Donny about how to support autoscaling and service discovery mechanics as well. It seems like I can manage the infrastructure through our application rather than having to build and support these services through our deployment artifacts.
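Tying the cluster lifecycle to the app lifecycle is the shape of FastAPI's lifespan hook. Here's a minimal sketch of that pattern using only the standard library, with stubs (`bring_up_cluster`, `teardown_cluster` are hypothetical names) in place of the real Runhouse launch and teardown calls; FastAPI would drive the context manager itself via `FastAPI(lifespan=lifespan)`, so we drive it by hand here:

```python
import asyncio
from contextlib import asynccontextmanager

events = []

def bring_up_cluster():
    # Stub for something like rh.ondemand_cluster(...).up_if_not()
    events.append("cluster up")

def teardown_cluster():
    # Stub for tearing the cluster back down
    events.append("cluster down")

@asynccontextmanager
async def lifespan(app):
    bring_up_cluster()   # runs once at startup
    yield                # app serves requests while suspended here
    teardown_cluster()   # runs once at shutdown

async def serve():
    async with lifespan(None):
        events.append("serving")

asyncio.run(serve())
print(events)
# -> ['cluster up', 'serving', 'cluster down']
```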
The intent in opening this issue was to highlight a bug in the `Cluster.up_if_not()` method: it seems like the process/thread that initializes the cluster never returns control to the main thread. So when I boot up my service and Runhouse brings up the GPU cluster for the first time, the service hangs and requires a restart.
Does this make sense?
You can see what I mean if you run the app in my example code. The first time you run it, it will bring up the cluster, but the process will hang and FastAPI won't serve requests (e.g. the docs page at localhost:8000/docs or the API at localhost:8000/ping). But if you terminate the process and relaunch, it will detect that the cluster is already up and FastAPI will serve its API.
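As a general aside (separate from whatever the underlying bug turns out to be), one way to keep a long blocking launch from freezing an async server's event loop is to push it onto a worker thread with `asyncio.to_thread`; a sketch with a short sleep standing in for the real `up_if_not()` call:

```python
import asyncio
import time

def up_if_not():
    # Stand-in for the blocking cluster launch
    time.sleep(0.1)
    return "cluster-up"

async def startup():
    # Run the blocking call off the event loop so the server
    # (e.g. its /docs and /ping routes) can keep responding.
    status = await asyncio.to_thread(up_if_not)
    return status

print(asyncio.run(startup()))
# -> cluster-up
```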
Hi Zach! Thanks for raising this and the detailed repro. I've been offline for a couple of days for the Jewish holidays and didn't get a chance to share the context with Josh, so thanks for the detail here. I've reproduced your error and figured out that it's a minor bug which we fixed on main but haven't released yet, and for some reason it wasn't being surfaced through FastAPI's lifespan feature (it also wasn't being surfaced when I ran with `uvicorn app:app`). I've confirmed that the launch works properly in your script on runhouse@main, and we're planning to release within the next couple of days (note that if you try upgrading to `main`, be sure to upgrade SkyPilot too, because we also bumped the SkyPilot version to 0.6.0 in the latest release; you may want to take down any running clusters before doing that). I'll update you here when we release the fix.
As an aside, I also noticed that the code doesn't complete because the working_dir isn't being recognized from that requirements.txt; the .git root one directory above is taking precedence. I think we want to change that behavior soon (and some other working_dir things as well), but in the meantime you can explicitly set the working_dir in an `rh.env`, or simply move the requirements.txt one directory higher so it's recognized and installed on the cluster (I also confirmed that if you do this, your repro will run through in full, see below):
```shell
curl -X POST "http://127.0.0.1:8000/score?text=Good"
{"positive":0.6844,"neutral":0.2628,"negative":0.0527}

curl -X POST "http://127.0.0.1:8000/score?text=This\restaurant\is\bad"
{"negative":0.951,"neutral":0.0434,"positive":0.0056}
```
Rad - thanks for the update. I'll watch this issue and let you know if it works after your next release.
Hey Zach - we released yesterday and I've confirmed this is fixed (though I'm still moving the requirements.txt to the git root directory). Let me know if you still face any breakage with it.
Describe the bug
Example code: https://github.com/fryz/funhouse/tree/zf/fastapi/fastapi
When using FastAPI's lifespan events (`asynccontextmanager`) to bring a cluster up, the cluster comes up but then hangs without returning to the server initialization logic.
Terminating the FastAPI process and bringing it back online recognizes the cluster and works.
Versions
Additional context
Logs from the startup: