Open kyegomez opened 2 months ago
Hi @kyegomez, I just tried on commit https://github.com/skypilot-org/skypilot/commit/1e4e871398e121708d3e9809c0a98b905bf9f212 and it also failed for me (not frozen):
I 04-19 14:50:51 provisioner.py:553] Successfully provisioned cluster: sky-serve-controller-8a3968f2
...
E 04-19 14:52:16 subprocess_utils.py:84] ValueError: Failed to register service 'sky-service-1f06' on the SkyServe controller. Reason:
E 04-19 14:52:16 subprocess_utils.py:84] Traceback (most recent call last):
E 04-19 14:52:16 subprocess_utils.py:84] File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
E 04-19 14:52:16 subprocess_utils.py:84] return _run_code(code, main_globals, None,
E 04-19 14:52:16 subprocess_utils.py:84] File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
E 04-19 14:52:16 subprocess_utils.py:84] exec(code, run_globals)
E 04-19 14:52:16 subprocess_utils.py:84] File "/opt/conda/lib/python3.10/site-packages/sky/serve/service.py", line 260, in <module>
E 04-19 14:52:16 subprocess_utils.py:84] _start(args.service_name, args.task_yaml, args.job_id)
E 04-19 14:52:16 subprocess_utils.py:84] File "/opt/conda/lib/python3.10/site-packages/sky/serve/service.py", line 147, in _start
E 04-19 14:52:16 subprocess_utils.py:84] success = serve_state.add_service(
E 04-19 14:52:16 subprocess_utils.py:84] File "/opt/conda/lib/python3.10/site-packages/sky/serve/serve_state.py", line 225, in add_service
E 04-19 14:52:16 subprocess_utils.py:84] _DB.conn.commit()
E 04-19 14:52:16 subprocess_utils.py:84] sqlite3.OperationalError: database is locked
E 04-19 14:52:16 subprocess_utils.py:84]
E 04-19 14:52:16 subprocess_utils.py:84]
RuntimeError: Failed to spin up the service. Please check the logs above for more details.
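For context, sqlite raises `database is locked` when another process holds the write lock longer than the connection's busy timeout, which is what the traceback's `_DB.conn.commit()` hit. A minimal retry sketch (a hypothetical helper, not SkyPilot's actual code) showing the two usual mitigations, a `timeout` on `connect()` plus a backoff retry around `commit()`:

```python
import sqlite3
import time

def commit_with_retry(conn: sqlite3.Connection,
                      retries: int = 5,
                      delay: float = 0.2) -> None:
    """Retry conn.commit() when another writer holds the sqlite lock."""
    for attempt in range(retries):
        try:
            conn.commit()
            return
        except sqlite3.OperationalError as e:
            # Re-raise anything that isn't transient lock contention.
            if 'database is locked' not in str(e) or attempt == retries - 1:
                raise
            time.sleep(delay * (attempt + 1))  # linear backoff

# A connection-level busy timeout (in seconds) already avoids most of this.
conn = sqlite3.connect(':memory:', timeout=10)
conn.execute('CREATE TABLE services (name TEXT)')
conn.execute("INSERT INTO services VALUES ('sky-service-1f06')")
commit_with_retry(conn)
print(conn.execute('SELECT name FROM services').fetchone()[0])
```

This only papers over contention between concurrent writers; the actual fix landed in SkyPilot itself, per the nightly suggestion below.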
Could you install the latest nightly? pip uninstall -y skypilot; pip install "skypilot-nightly[..your clouds..]"
On today's main branch commit 24fcb44e7 it worked for me.
Yeah, now I'm getting this error with the current llama3 file. It should be accepting it; maybe I don't have the right clouds enabled, but man, it's been error after error.
Service from YAML spec: sky_serve.yaml
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
Service Spec:
Readiness probe method: POST /v1/chat/completions {"model": "meta-llama/Meta-Llama-3-70B-Instruct", "messages": [{"role": "user", "content": "Hello! What is your name?"}], "max_tokens": 1}
Readiness initial delay seconds: 1200
Replica autoscaling policy: Fixed 2 replicas
Spot Policy: No spot policy
Each replica will use the following resources (estimated):
I 04-19 17:59:22 optimizer.py:1208] No resource satisfying <Cloud>({'L40': 1}, ports=['8081']) on [AWS, Azure, RunPod].
I 04-19 17:59:22 optimizer.py:1208] No resource satisfying <Cloud>({'A40': 1}, ports=['8081']) on [AWS, Azure, RunPod].
I 04-19 17:59:22 optimizer.py:1212] Did you mean: ['A100-80GB:8']
I 04-19 17:59:22 optimizer.py:1208] No resource satisfying <Cloud>({'A100': 1}, ports=['8081']) on [AWS, Azure, RunPod].
I 04-19 17:59:22 optimizer.py:1212] Did you mean: ['A100-80GB:1', 'A100-80GB:2', 'A100-80GB:4', 'A100-80GB:8', 'A100:8', 'A10G:1', 'A10G:4', 'A10G:8']
I 04-19 17:59:22 optimizer.py:693] == Optimizer ==
I 04-19 17:59:22 optimizer.py:704] Target: minimizing cost
I 04-19 17:59:22 optimizer.py:716] Estimated cost: $0.5 / hour
I 04-19 17:59:22 optimizer.py:716]
I 04-19 17:59:22 optimizer.py:839] Considered resources (1 node):
I 04-19 17:59:22 optimizer.py:909] -------------------------------------------------------------------------------------------------------
I 04-19 17:59:22 optimizer.py:909] CLOUD INSTANCE vCPUs Mem(GB) ACCELERATORS REGION/ZONE COST ($) CHOSEN
I 04-19 17:59:22 optimizer.py:909] -------------------------------------------------------------------------------------------------------
I 04-19 17:59:22 optimizer.py:909] Azure Standard_NV6ads_A10_v5 6 55 A10:1 eastus 0.45 ✔
I 04-19 17:59:22 optimizer.py:909] AWS g6.xlarge 4 16 L4:1 us-east-1 0.80
I 04-19 17:59:22 optimizer.py:909] AWS g5.xlarge 4 16 A10G:1 us-east-1 1.01
I 04-19 17:59:22 optimizer.py:909] Azure Standard_NC24ads_A100_v4 24 220 A100-80GB:1 eastus 3.67
I 04-19 17:59:22 optimizer.py:909] -------------------------------------------------------------------------------------------------------
I 04-19 17:59:22 optimizer.py:909]
I 04-19 17:59:22 optimizer.py:927] Multiple Azure instances satisfy A10:1. The cheapest Azure(Standard_NV6ads_A10_v5, {'A10': 1}, ports=['8081']) is considered among:
I 04-19 17:59:22 optimizer.py:927] ['Standard_NV6ads_A10_v5', 'Standard_NV12ads_A10_v5', 'Standard_NV18ads_A10_v5', 'Standard_NV36ads_A10_v5', 'Standard_NV36adms_A10_v5'].
I 04-19 17:59:22 optimizer.py:927]
I 04-19 17:59:22 optimizer.py:927] Multiple AWS instances satisfy A10:1. The cheapest AWS(g5.xlarge, {'A10G': 1}, ports=['8081']) is considered among:
I 04-19 17:59:22 optimizer.py:927] ['g5.xlarge', 'g5.2xlarge', 'g5.4xlarge', 'g5.8xlarge', 'g5.16xlarge'].
I 04-19 17:59:22 optimizer.py:927]
I 04-19 17:59:22 optimizer.py:933] To list more details, run 'sky show-gpus A10'.
Launching a new service 'sky-service-c37d'. Proceed? [Y/n]: Y
Launching controller for 'sky-service-c37d'...
sky.exceptions.ResourcesUnavailableError: Catalog does not contain any instances satisfying the request:
Task<name=sky-service-c37d>(run='# Start sky serve se...')
resources: default instances.
To fix: relax or change the resource requirements.
Hint: sky show-gpus to list available accelerators.
sky check to check the enabled clouds.
# Serving Meta Llama-3 on your own infra.
#
# Usage:
#
# HF_TOKEN=xxx sky launch llama3.yaml -c llama3 --env HF_TOKEN
#
# curl /v1/chat/completions:
#
# ENDPOINT=$(sky status --endpoint 8081 llama3)
#
# # We need to manually specify the stop_token_ids to make sure the model finish
# # on <|eot_id|>.
# curl http://$ENDPOINT/v1/chat/completions \
# -H "Content-Type: application/json" \
# -d '{
# "model": "meta-llama/Meta-Llama-3-8B-Instruct",
# "messages": [
# {
# "role": "system",
# "content": "You are a helpful assistant."
# },
# {
# "role": "user",
# "content": "Who are you?"
# }
# ],
# "stop_token_ids": [128009, 128001]
# }'
#
# Chat with model with Gradio UI:
#
# Running on local URL: http://127.0.0.1:8811
# Running on public URL: https://<hash>.gradio.live
#
# Scale up with SkyServe:
# HF_TOKEN=xxx sky serve up llama3.yaml -n llama3 --env HF_TOKEN
#
# curl /v1/chat/completions:
#
# ENDPOINT=$(sky serve status --endpoint llama3)
# curl -L $ENDPOINT/v1/models
# curl -L http://$ENDPOINT/v1/chat/completions \
# -H "Content-Type: application/json" \
# -d '{
# "model": "databricks/llama3-instruct",
# "messages": [
# {
# "role": "system",
# "content": "You are a helpful assistant."
# },
# {
# "role": "user",
# "content": "Who are you?"
# }
# ]
# }'
envs:
MODEL_NAME: meta-llama/Meta-Llama-3-70B-Instruct
# MODEL_NAME: meta-llama/Meta-Llama-3-8B-Instruct
HF_TOKEN: <your-huggingface-token> # Change to your own huggingface token, or use --env to pass.
service:
replicas: 2
# An actual request for readiness probe.
readiness_probe:
path: /v1/chat/completions
post_data:
model: $MODEL_NAME
messages:
- role: user
content: Hello! What is your name?
max_tokens: 1
resources:
accelerators: {L4:8, A10g:8, A10:8, A100:4, A100:8, A100-80GB:2, A100-80GB:4, A100-80GB:8}
# accelerators: {L4, A10g, A10, L40, A40, A100, A100-80GB} # We can use cheaper accelerators for 8B model.
cpus: 32+
use_spot: True
disk_size: 512 # Ensure model checkpoints can fit.
disk_tier: best
ports: 8081 # Expose to internet traffic.
setup: |
conda activate vllm
if [ $? -ne 0 ]; then
conda create -n vllm python=3.10 -y
conda activate vllm
fi
pip install vllm==0.4.0.post1
# Install Gradio for web UI.
pip install gradio openai
pip install flash-attn==2.5.7
run: |
conda activate vllm
echo 'Starting vllm api server...'
# https://github.com/vllm-project/vllm/issues/3098
export PATH=$PATH:/sbin
# NOTE: --gpu-memory-utilization 0.95 needed for 4-GPU nodes.
python -u -m vllm.entrypoints.openai.api_server \
--port 8081 \
--model $MODEL_NAME \
--trust-remote-code --tensor-parallel-size $SKYPILOT_NUM_GPUS_PER_NODE \
--gpu-memory-utilization 0.95 \
--max-num-seqs 64 \
2>&1 | tee api_server.log &
while ! grep -q 'Uvicorn running on' api_server.log; do
echo 'Waiting for vllm api server to start...'
sleep 5
done
echo 'Starting gradio server...'
git clone https://github.com/vllm-project/vllm.git || true
python vllm/examples/gradio_openai_chatbot_webserver.py \
-m $MODEL_NAME \
--port 8811 \
--model-url http://localhost:8081/v1 \
--stop-token-ids 128009,128001
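Once a replica is up, the readiness probe defined in the `service:` section above can be reproduced by hand to check whether vLLM is actually answering. A minimal stdlib-only sketch (the `localhost:8081` endpoint is a placeholder; in practice you'd substitute the output of `sky serve status --endpoint`):

```python
import json
import urllib.request

def build_probe_request(endpoint: str,
                        model_name: str) -> urllib.request.Request:
    """Build the same POST the readiness_probe section sends."""
    payload = {
        'model': model_name,
        'messages': [{'role': 'user', 'content': 'Hello! What is your name?'}],
        'max_tokens': 1,
    }
    return urllib.request.Request(
        f'http://{endpoint}/v1/chat/completions',
        data=json.dumps(payload).encode(),
        headers={'Content-Type': 'application/json'},
    )

req = build_probe_request('localhost:8081',
                          'meta-llama/Meta-Llama-3-70B-Instruct')
print(req.full_url)
# urllib.request.urlopen(req)  # uncomment to probe a live endpoint
```

A 200 response here means the replica would pass the probe; a connection refused or timeout means vLLM is still loading the checkpoint (which is why the spec sets a 1200 s initial delay for the 70B model).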
I'd suggest using sky launch <yaml>
first. It's for troubleshooting whether launching a single instance works ;)
sky.exceptions.ResourcesUnavailableError: Catalog does not contain any instances satisfying the request:
Task<name=sky-service-c37d>(run='# Start sky serve se...')
resources: default instances.
This suggests a default CPU-only serve controller cannot be launched. Could you run sky launch --down
to see if it works? This is also just for getting past any initial quota/permission errors.
@concretevitamin it builds now with sky launch,
but we'll see if it passes provisioning. I'm able to launch a cluster, but then it just says PROVISIONING 24/7.
It's most commonly due to quota issues. You could use sky serve logs <service_name> 1
to check replica 1's provisioning logs. Btw, feel free to join https://slack.skypilot.co/ too for quick debugging.
SkyPilot keeps freezing when I try to serve something, and when it does work, it says PROVISIONING forever and never finishes across multiple clouds. I need help ASAP.
SKY YAML
Version & Commit info:
sky -v: skypilot, version 0.5.0
sky -c: skypilot, commit 1e4e871398e121708d3e9809c0a98b905bf9f212