spotty-cloud / spotty

Training deep learning models on AWS and GCP instances
https://spotty.cloud
MIT License

Caught non-retryable exception while listing file:///mnt/heareval-heareval-leader-joseph-workspace/project: [Errno 28] No space left on device #110

Status: Closed (turian closed this issue 2 years ago)

turian commented 2 years ago

I ran out of space on the root disk, and then my Docker container terminated. spotty sh complained that there was not enough space in /tmp and would not run.

I removed /opt/conda/ by SSH'ing directly into the machine. I also used the cloud console to add 50GB more to the root volume.

Now, I think I have enough disk space on the machine:

Filesystem      Size  Used Avail Use% Mounted on
udev            103G     0  103G   0% /dev
tmpfs            21G  8.5M   21G   1% /run
/dev/sda1        99G   45G   50G  48% /
tmpfs           103G     0  103G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           103G     0  103G   0% /sys/fs/cgroup
/dev/sda15      124M  5.7M  119M   5% /boot/efi
/dev/sdb        2.0T  833G  1.2T  43% /home/jupyter
tmpfs            21G     0   21G   0% /run/user/1004
tmpfs            21G     0   21G   0% /run/user/1003

However, spotty sh is broken, and I have the following error in my /var/log/startup-script.log:

Caught non-retryable exception while listing file:///mnt/heareval-heareval-leader-joseph-workspace/project: [Errno 28] No space left on device

Why don't I have enough space on the device? How can I get spotty sh to connect to the machine again?
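
Errno 28 is ENOSPC, which the kernel returns when a filesystem runs out of either free blocks or free inodes, while the df listing above reports only blocks. A minimal Python sketch for checking both on the filesystem that holds the path from the error might look like the following. The helper names space_report and looks_full are hypothetical and not part of Spotty, and the check only narrows down the cause; it does not establish it.

```python
import errno
import os


def space_report(path):
    """Report free bytes and free inodes for the filesystem that holds `path`."""
    # Walk up to the nearest existing ancestor, so that a directory that
    # could not be created still resolves to the filesystem it would live on.
    probe = os.path.abspath(path)
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:
            break
        probe = parent

    st = os.statvfs(probe)  # POSIX only
    return {
        "probed_path": probe,
        # Space available to unprivileged users, in bytes.
        "free_bytes": st.f_bavail * st.f_frsize,
        # Some filesystems report 0 total inodes; treat that as "not tracked".
        "total_inodes": st.f_files,
        "free_inodes": st.f_favail,
    }


def looks_full(report, min_free_bytes=1 << 20):
    """Return the list of reasons the filesystem could raise ENOSPC, if any."""
    reasons = []
    if report["free_bytes"] < min_free_bytes:
        reasons.append("blocks")
    if report["total_inodes"] > 0 and report["free_inodes"] == 0:
        reasons.append("inodes")
    return reasons


if __name__ == "__main__":
    # The path is copied from the error message in this issue.
    target = "/mnt/heareval-heareval-leader-joseph-workspace/project"
    report = space_report(target)
    print("errno", errno.ENOSPC, "=", os.strerror(errno.ENOSPC))
    print(report)
    print("possible causes:", looks_full(report) or "none found")
```

If the probed path resolves to / rather than to a separate mount, the directory lives on the root volume, and the root filesystem's figures are the ones that matter.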

apls777 commented 2 years ago

Is it still an issue for you? Can you share your spotty.yaml?

turian commented 2 years ago

This seems to have resolved itself, thank you.