Open Michaelvll opened 6 days ago
Testing the impact on provisioning time now
I suppose we should avoid this argument in our host image creation, i.e. the packer file. In that case, it should not affect the provisioning time?
The latest custom images don't use --system-site-packages. I also tested the example yaml and it works fine.
The following task.yaml can cause job submission to fail when launched with sky launch -c test task.yaml
Reproduction
Reason
After logging into the cluster, it seems the issue is that installing img2dataset changed the numpy/pyarrow versions in the base Python environment, which somehow breaks skypilot-runtime even though it lives in a different Python venv.
Potential fixes
We may need to be careful with the --system-site-packages option in our skypilot-runtime setup when creating the venv, as packages changed in the base env may affect the skypilot runtime as well.
https://github.com/skypilot-org/skypilot/blob/53380e26f01452559012d57b333b17f40dd8a4d1/sky/skylet/constants.py#L158
I tested removing this argument from the skypilot-runtime setup, and the problem seems to go away. We should avoid this argument in our hosted image (cc'ing @yika-luo) and see whether we should drop it for custom images as well (this may make provisioning much longer, since more packages would have to be installed instead of reusing the ones already on the system).
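For anyone who wants to check whether a given venv is exposed to the base environment, here is a minimal sketch using only the standard library. It is not SkyPilot code: the helper names (make_venv, includes_system_site_packages, demo) are made up for illustration. It creates one venv with system site packages enabled and one without, then reads the include-system-site-packages key that the venv module writes to pyvenv.cfg.

```python
import tempfile
import venv
from pathlib import Path


def includes_system_site_packages(venv_dir: Path) -> bool:
    """Return True if pyvenv.cfg says the venv can see base-env packages."""
    for line in (venv_dir / "pyvenv.cfg").read_text().splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "include-system-site-packages":
            return value.strip().lower() == "true"
    return False


def make_venv(parent: Path, name: str, system_site_packages: bool) -> Path:
    """Create a venv without pip (fast) and return its directory."""
    venv_dir = parent / name
    venv.create(venv_dir, system_site_packages=system_site_packages, with_pip=False)
    return venv_dir


def demo() -> dict:
    """Build one shared and one isolated venv and report the flag for each."""
    with tempfile.TemporaryDirectory() as tmp:
        shared = make_venv(Path(tmp), "shared", system_site_packages=True)
        isolated = make_venv(Path(tmp), "isolated", system_site_packages=False)
        return {
            "shared": includes_system_site_packages(shared),
            "isolated": includes_system_site_packages(isolated),
        }


if __name__ == "__main__":
    print(demo())
```

Pointing includes_system_site_packages at the runtime venv directory on the cluster should show whether a pip install in the base env (such as img2dataset pulling different numpy/pyarrow versions) can leak into it.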