Closed: jayanthnair closed this issue 1 year ago.
Hi @jayanthnair, thanks for submitting this! This is a known issue that should be fixed by #36324. Could you retry your experiment with the Ray nightly and check if it still fails?
Hi @shrekris-anyscale, I tried the following two things:
1. Installed the nightly on my local machine with pip install -U "ray[rllib,data,serve] @ https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp310-cp310-win_amd64.whl"
2. Tried serving the old checkpoint created with a previous Ray version, and it gave me the same error.
Anything else I should be trying?
@jayanthnair I tried with the exact tutorial code
# serve_agent.py
import ray
import ray.rllib.algorithms.ppo as ppo
from ray import serve
from starlette.requests import Request


def train_ppo_model():
    # Configure our PPO algorithm.
    config = (
        ppo.PPOConfig()
        .environment("CartPole-v1")
        .framework("torch")
        .rollouts(num_rollout_workers=0)
    )
    # Create a `PPO` instance from the config.
    algo = config.build()
    # Train for one iteration.
    algo.train()
    # Save state of the trained Algorithm in a checkpoint.
    checkpoint_dir = algo.save("/tmp/rllib_checkpoint")
    return checkpoint_dir


checkpoint_path = train_ppo_model()


@serve.deployment
class ServePPOModel:
    def __init__(self, checkpoint_path) -> None:
        # Re-create the originally used config.
        config = (
            ppo.PPOConfig()
            .framework("torch")
            .rollouts(num_rollout_workers=0)
        )
        # Build the Algorithm instance using the config, on the same env used for training.
        self.algorithm = config.build(env="CartPole-v1")
        # Restore the algo's state from the checkpoint.
        self.algorithm.restore(checkpoint_path)

    async def __call__(self, request: Request):
        json_input = await request.json()
        obs = json_input["observation"]
        action = self.algorithm.compute_single_action(obs)
        return {"action": int(action)}


ppo_model = ServePPOModel.bind(checkpoint_path)
serve.run(ppo_model)
and start it with serve run serve_agent:ppo_model
I was able to query it like so:
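A minimal client sketch along these lines (the localhost:8000 address is Serve's HTTP default and the observation is just a dummy CartPole value; both are assumptions, not details taken from the thread):

# query_agent.py -- illustrative client sketch only
import requests

# A CartPole observation is four floats; this is a dummy value for testing.
observation = [0.0, 0.1, 0.0, -0.1]

resp = requests.post(
    "http://localhost:8000/",
    json={"observation": observation},
)
print(resp.json())  # e.g. {"action": 0} or {"action": 1}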
I also just tried with the very simple app file structure like
- hello_serve.py
- utils/
  - test.py
and the file content like
# hello_serve.py
import time

from ray import serve
from starlette.requests import Request

from utils.test import hello


@serve.deployment
class HelloModel:
    def __init__(self):
        hello()

    async def __call__(self, starlette_request: Request) -> str:
        hello()
        return f"{hello()}, {time.time()}"


model = HelloModel.bind()
# test.py
def hello():
    text = "hello_from_utils"
    print(text)
    return text
When I run serve run hello_serve:model, everything still works as expected and the app returns the text and the timestamp. Can you try the above examples and see if you get the same utils import error?
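For reference, a quick way to hit the hello app once serve run is up (again assuming Serve's default localhost:8000 address; this sketch is not taken from the thread):

# check_hello.py -- illustrative sketch only
import requests

resp = requests.get("http://localhost:8000/")
print(resp.text)  # expected to contain "hello_from_utils" plus a timestamp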
Also, just to double check, can you run ray --version to confirm the installed version is the latest nightly? You may need to run pip uninstall -y ray before installing from the wheel if you already have Ray installed. Something like pip uninstall -y ray && pip install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp310-cp310-manylinux2014_x86_64.whl might help.
@GeneDer Thanks for the response. I think I might have finally figured this out.
When I have a folder structure like this
- deploy/
  - serve_agent.py
  - inference_checkpoints/
    - checkpoint_00010/
      - checkpoint files
- utils/
  - some_util.py
and I run the serve run command from the deploy folder, I get the No module named 'utils' error. However, when I copy the serve_agent script and move it up a level, i.e. into the same base folder as the utils folder, the error goes away. Curiously, none of the scripts saved in the utils folder are needed for deployment. And if I change the name of the utils folder, I get the same error again. So it seems like Serve is looking for a module named utils in the same working folder as the deployment script. Is this intended?
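As an aside, one hypothetical workaround for the layout above, not suggested anywhere in this thread: if serve_agent.py has to stay inside deploy/ while something during checkpoint restore still expects a top-level utils package, the base folder can be put on sys.path at the top of the script. A sketch only, with paths inferred from the tree above:

# deploy/serve_agent.py (top of file) -- hypothetical workaround sketch
import os
import sys

# Parent of deploy/, i.e. the base folder that contains utils/.
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
if BASE_DIR not in sys.path:
    sys.path.insert(0, BASE_DIR)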
Hi @jayanthnair Yes, custom modules should live in the same directory as the deployment script (or a subdirectory of it). There is some Ray code that adds the deployment script's directory to the Python import path so those modules become importable. The bug we fixed previously was that the custom utils collided with Ray's own utils and Ray's utils took import precedence. With the latest Ray, the user's utils takes precedence as long as it lives in the same directory as the deployment script :)
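A quick way to see which utils Python actually resolves from inside the deployment (a diagnostic sketch using only the standard library, nothing Ray-specific):

# Diagnostic sketch: print where `utils` is imported from, e.g. inside the
# deployment's __init__, to confirm the package next to the deployment
# script wins over any other module of the same name.
import importlib.util

spec = importlib.util.find_spec("utils")
if spec is None:
    print("utils is not importable from here")
else:
    print("utils resolved from:", spec.origin or list(spec.submodule_search_locations))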
Thanks for confirming your issue is fixed! Feel free to let me know if you have other questions!
What happened + What you expected to happen
I have used Ray RLlib and Ray on AML to train an RL agent on AzureML. I've downloaded the checkpoints to my local machine and created a script to serve the agent using the template provided here. When I try to run the script from the command line using the serve API, I get an error saying "ModuleNotFoundError: No module named 'utils'". However, if I copy and paste the same code into a Jupyter notebook, it runs fine. Console logs below:
Issue Severity
High: It blocks me from completing my task.