skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0
6.75k stars 501 forks

[Job] Schedule jobs according to the `--cpus` #1944

Open Michaelvll opened 1 year ago

Michaelvll commented 1 year ago

Currently, `--cpus` (or `cpus:` in the task YAML) is not respected when scheduling jobs for a SkyPilot task, so users with CPU-only tasks have to schedule them manually.
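To make the request concrete, here is a minimal sketch of what CPU-aware admission could look like. It is not SkyPilot's scheduler: `Job`, `admit_jobs`, and the strict FIFO policy are hypothetical names and choices made up for illustration, and it assumes each job declares a single numeric CPU request.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpus: float  # CPUs the job asks for (hypothetical stand-in for --cpus)


def admit_jobs(pending, running, total_cpus):
    """Start pending jobs in FIFO order while their CPU requests still fit.

    `pending` is a deque of Job, `running` is a list of Job; both are
    mutated in place. Returns the list of jobs started by this call.
    """
    free = total_cpus - sum(job.cpus for job in running)
    started = []
    # Strict FIFO: stop at the first job that does not fit, so a large
    # job at the head of the queue is never starved by smaller ones.
    while pending and pending[0].cpus <= free:
        job = pending.popleft()
        free -= job.cpus
        running.append(job)
        started.append(job)
    return started
```

With an 8-CPU node and three queued jobs asking for 4, 4, and 2 CPUs, this sketch would start the first two and keep the third pending until CPUs are released.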

romilbhardwaj commented 1 year ago

+1 - ran into this today when I was trying to limit the number of tasks running on a cluster for my CPU-only job.

This also gets very confusing because we silently allow scheduling of infeasible tasks (according to resource spec). E.g., I was able to do something like:

$ sky launch -c test -- echo hi # Launches an 8-CPU cluster
$ sky launch -c test --cpus 24 -- echo hi2 # I was expecting this to fail, but it went through

Until we fix this, we should perhaps warn the user that CPUs are not respected during scheduling.
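A rough sketch of the kind of check I have in mind, assuming we know the cluster's CPU count at submission time. `check_cpu_request` is a hypothetical helper, not an existing SkyPilot function, and the message wording is only a placeholder.

```python
import warnings


def check_cpu_request(requested_cpus: float, cluster_cpus: float) -> bool:
    """Return True if the request fits on the cluster; warn and return False otherwise."""
    if requested_cpus > cluster_cpus:
        # Hypothetical warning text; the real message would be up to the maintainers.
        warnings.warn(
            f"Task requests {requested_cpus:g} CPUs but the cluster has only "
            f"{cluster_cpus:g}; CPU requests are not enforced during scheduling."
        )
        return False
    return True
```

For the example above, `check_cpu_request(24, 8)` would emit the warning instead of letting the job through silently.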

stolendog commented 6 months ago

The same applies to the `--memory` parameter. I specified a requirement of 16 GB of memory for an exec task:

sky exec -d my-cluster --memory=16 tasks/test.yaml

However, the task ran on nodes that have only 8 GB of memory.

concretevitamin commented 6 months ago

@stolendog Was this on a k8s cluster, or on a cloud VM cluster?

stolendog commented 6 months ago

On a cloud VM cluster.