Closed: binarycrayon closed this issue 7 months ago
Thank you for reporting this issue @binarycrayon! We just pushed a fix for this in #3313. Could you help test whether it works with A10 GPUs on Azure? We don't have A10 quota on Azure ourselves. :)
If you would like to test it out, you can install the fix from that PR with:
pip uninstall skypilot skypilot-nightly; pip install git+https://github.com/skypilot-org/skypilot.git@bcac2d764ae5e5fcac8fd64549888573a0b1d39a
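Before re-running the launch, it may be worth confirming that the environment really picked up the commit from the command above. The following is only a rough sketch: `commit_from_freeze_line` is a hypothetical helper (not part of SkyPilot), and it assumes `pip freeze` reports a git-based install in the `name @ git+<url>@<commit>` form.

```shell
#!/usr/bin/env bash
# Commit hash copied from the install command above.
EXPECTED_COMMIT="bcac2d764ae5e5fcac8fd64549888573a0b1d39a"

# Hypothetical helper (not part of SkyPilot): print the 40-character commit
# hash at the end of a `pip freeze` line such as
#   skypilot @ git+https://github.com/skypilot-org/skypilot.git@<commit>
# Prints nothing if the line does not end in such a hash.
commit_from_freeze_line() {
  printf '%s\n' "$1" | sed -n 's/.*@\([0-9a-f]\{40\}\)$/\1/p'
}

# Look up the installed skypilot entry; stays empty if pip or the package
# is missing, or if the package was not installed from a git URL.
installed_line="$(pip freeze 2>/dev/null | grep -i '^skypilot @' || true)"
installed_commit="$(commit_from_freeze_line "$installed_line")"

if [ "$installed_commit" = "$EXPECTED_COMMIT" ]; then
  echo "skypilot is installed from the fix commit"
else
  echo "skypilot is not installed from the fix commit (found: ${installed_commit:-none})"
fi
```

If the second message appears, re-running the uninstall/install line above in the same environment (for example, the same virtualenv or conda env that `sky` runs from) is the first thing to try.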
Yes, confirmed the fix worked. Thanks so much for the quick fix!
I 03-14 20:22:51 cloud_vm_ray_backend.py:4237] Creating a new cluster: 'dialogue-choice-gemma-2b' [1x Azure(Standard_NV6ads_A10_v5, {'A10': 1}, ports=['8080'])].
I 03-14 20:22:51 cloud_vm_ray_backend.py:4237] Tip: to reuse an existing cluster, specify --cluster (-c). Run `sky status` to see existing clusters.
I 03-14 20:22:57 cloud_vm_ray_backend.py:1364] To view detailed progress: tail -n100 -f /home/../sky_logs/sky-2024-03-14-20-22-48-834635/provision.log
I 03-14 20:22:58 cloud_vm_ray_backend.py:1754] Launching on Azure westus2
I 03-14 20:25:28 log_utils.py:45] Head node is up.
I 03-14 20:28:16 cloud_vm_ray_backend.py:1602] Successfully provisioned or found existing VM.
I 03-14 20:28:20 cloud_vm_ray_backend.py:3076] Running setup on 1 node.
resources requested
Able to provision the instance, but the launch then blocks at:
INFO: Waiting for task resources on 1 node. This will block if the cluster is full.