spotty-cloud / spotty

Training deep learning models on AWS and GCP instances
https://spotty.cloud
MIT License

No left space on device when building docker-image. #100

Closed by ThomasFournier 3 years ago

ThomasFournier commented 3 years ago

Hi, I'm trying to build a StyleGAN model from this image 'nvcr.io/nvidia/pytorch:20.12-py3' (see this link).

Unfortunately, I get this error:

[INFO] a5ff18c9283b: Download complete
[INFO] 125a5da70c41: Download complete
[INFO] write /var/lib/docker/tmp/GetImageBlob403717976: no space left on device
[INFO] ------------------------------------------------------------
[ERROR] Exited with error code 1

Any idea on how to cope with this issue?

I tried instance types p2.xlarge, p2.4xlarge, and p3.2xlarge, each with a volume size of 100 GB, and all of them failed the same way. I suspect the error comes from the PyTorch install, so it may need some extra configuration to make it work.

Thanks

msscully commented 3 years ago

I ran into the same issue and fixed it by setting "rootVolumeSize" in the parameters section of the instance:

instances:
  - name: aws-1
    provider: aws
    parameters:
      region: us-east-1
      instanceType: p2.xlarge
      ports: [6006, 6007, 8888]
      rootVolumeSize: 200  # root volume size in GB
      volumes:
        ...
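
To confirm that the root volume is what fills up, a quick check on the instance can help. This is only a sketch: it assumes Docker uses its default data directory, `/var/lib/docker` (the path in the error above), and `free_gib` is a hypothetical helper name, not part of Spotty.

```python
import shutil
from pathlib import Path


def free_gib(path: str) -> float:
    """Return the free space, in GiB, on the filesystem that contains `path`."""
    p = Path(path)
    # Walk up to the nearest existing ancestor so the check also works
    # when the directory itself has not been created yet.
    while not p.exists():
        p = p.parent
    return shutil.disk_usage(p).free / 2**30


if __name__ == "__main__":
    # Assumption: Docker stores image layers under its default data directory.
    print(f"{free_gib('/var/lib/docker'):.1f} GiB free")
```

If the number printed here is smaller than the size of the image being pulled, increasing `rootVolumeSize` as above is the likely fix.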