spotty-cloud / spotty

Training deep learning models on AWS and GCP instances
https://spotty.cloud
MIT License

RuntimeError: can't start new thread #128

Closed: Mudryi closed this issue 1 year ago

Mudryi commented 1 year ago

I have a problem when running spotty start: the logs show that the pip install step fails with the following error:

RuntimeError: can't start new thread

I tried removing the pip install step from the Dockerfile. The container then built successfully, but I got the same error when I connected via spotty sh and ran any pip command.

As I understand it, this problem may be caused by a bad Docker version. Any suggestions on how I can easily change it?

Here is my Dockerfile:

FROM python:3.9

COPY requirements-spotty.txt requirements-spotty.txt

RUN pip3 install -r requirements-spotty.txt

And here is my Spotty configuration file:

project:
  name: asr
  syncFilters:
    - exclude:
        - .idea/*
        - .git/*

containers:
  - projectDir: /workspace/project
    file: docker/Dockerfile.spotty
    runtimeParameters: [ '--shm-size', '8G' ]
    volumeMounts:
      - name: workspace
        mountPath: /workspace

instances:
  - name: asr-finetune
    provider: aws
    parameters:
      region: eu-central-1
      instanceType: g4dn.xlarge
      dockerDataRoot: /docker
      volumes:
        - name: workspace
          parameters:
            size: 50
            deletionPolicy: retain
        - name: docker
          parameters:
            size: 50
            mountDir: /docker
            deletionPolicy: retain

scripts:
  train: |
    python main.py

apls777 commented 1 year ago

Have you tried building this Docker image locally on your machine first? Does it work?

If it does, I'd try to build it manually on a Spotty instance. To do that:

  1. Start an instance as usual with spotty start
  2. Connect to the host OS using the -H flag: spotty sh -H
  3. Try to build your image manually and figure out what goes wrong

> As I understand it, this problem may be caused by a bad Docker version. Any suggestions on how I can easily change it?

If it is indeed the Docker version, there is a chance that Spotty no longer picks up the latest Deep Learning AMI. You can find the right AMI ID and force Spotty to use it with the instances[].parameters.amiId parameter (see Instance Parameters).
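As a rough sketch of that suggestion, the instance section of the configuration above might look like this with an explicit AMI. The amiId key comes from the parameter path mentioned above. The AMI ID value is a made-up placeholder, not a real image, so it needs to be replaced with an ID that exists in your region.

```yaml
instances:
  - name: asr-finetune
    provider: aws
    parameters:
      region: eu-central-1
      instanceType: g4dn.xlarge
      # Hypothetical placeholder: replace with a real AMI ID available in eu-central-1.
      amiId: ami-xxxxxxxxxxxxxxxxx
      dockerDataRoot: /docker
```

The rest of the file (project, containers, volumes, scripts) would stay unchanged.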

And here is the Spotty code that searches for an AMI.

Mudryi commented 1 year ago

Hi @apls777, thanks a lot for your advice about building the image manually. I couldn't fix the issue with this image, but I changed the base image to python:3.9.17-bullseye and everything works fine.
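For reference, a minimal sketch of the Dockerfile after that change, assuming only the FROM line differs from the original file posted above:

```dockerfile
# Base image pinned to the tag that reportedly worked in this thread.
FROM python:3.9.17-bullseye

COPY requirements-spotty.txt requirements-spotty.txt

RUN pip3 install -r requirements-spotty.txt
```

The thread does not establish why the pinned tag helped. One possible explanation, not confirmed here, is that the unpinned python:3.9 tag had moved to a newer Debian release whose system libraries do not work well with the older Docker version on the instance.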