wired-mind opened 1 year ago
Aviary requires a Ray Cluster to run. You can set up an on-premise Ray Cluster (https://docs.ray.io/en/latest/cluster/vms/user-guides/launching-clusters/on-premises.html). Because Aviary uses Ray Custom Resources to ensure that each model is scheduled on an intended GPU type, you will need to set those in both the Ray cluster configuration and Aviary model yamls.
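To illustrate, the custom resource name has to match between the two files. A hypothetical sketch (the node type name, field values, and the exact `scaling_config` field names here are illustrative, not taken from this thread — check your actual Aviary model YAML for the real schema):

```yaml
# Ray cluster config: advertise the custom resource on a node type
available_node_types:
  gpu_worker:
    resources: {"accelerator_type_a100": 1}

# Aviary model YAML: request the same resource name in scaling_config
scaling_config:
  num_workers: 1
  resources_per_worker:
    accelerator_type_a100: 0.01
```

If the names differ, Ray will never find a node satisfying the model's resource request and the deployment will stay pending.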
You can edit the EC2 config to use on-prem instead with your desired node type.
Alternatively, if you just want to experiment, you can do the following:

1. Install Aviary with both extras:

   ```shell
   pip install -e ".[backend, frontend]"
   ```

2. Edit the `scaling_config` section in the model configuration and change the `accelerator_type_[TYPE]` entry to `accelerator_type_a100`.

3. Start a head node with the matching custom resource (the actual number of GPUs will be detected automatically):

   ```shell
   ray start --head --resources '{"accelerator_type_a100": 1}'
   ```

4. Run Aviary against the edited model configuration:

   ```shell
   aviary run --model model_yaml_with_edited_scaling_config.yaml
   ```
This will start a Ray cluster composed of just this single node.
Perfect, thank you. Got it all working, along with the frontend in a Docker container. One problem I encountered was that both the frontend and backend default to port 8000, so the frontend needed to be started like this: `serve run --host 0.0.0.0 --port 7860 aviary.frontend.app:app`
@Yard1 what do you think about making the frontend run on port 7860 by default to be consistent with normal Gradio and not cause this problem?
I think that's a good idea!
I would like to run a single on-premise machine, but I am not able to get the models to load: it keeps looking for actor/worker resource nodes that don't exist. Do you have an example config for a single on-premise machine?