spotty-cloud / spotty

Training deep learning models on AWS and GCP instances
https://spotty.cloud
MIT License

Issue mounting a volume #91

Closed murphycj closed 3 years ago

murphycj commented 3 years ago

Thanks so much for creating spotty. So far it seems like a great tool! However, I had an issue mounting a volume (named myvolume) to a new instance. Below is the error I get:

2021-03-01 03:55:28,888 P1955 [INFO]    + for i in '${!MOUNT_DIRS[*]}'
2021-03-01 03:55:28,888 P1955 [INFO]    + MOUNT_DIR=/mnt/myvolume
2021-03-01 03:55:28,889 P1955 [INFO]    + DEVICE=/dev/xvdg
2021-03-01 03:55:28,889 P1955 [INFO]    + '[' '!' -b /dev/xvdg ']'
2021-03-01 03:55:28,889 P1955 [INFO]    ++ cfn-get-metadata --stack spotty-instance-sincnet-aws1 --region us-east-1 --resource VolumeAttachmentG -k VolumeId
2021-03-01 03:55:28,889 P1955 [INFO]    + VOLUME_ID=vol-03e0908f2f88a9b23
2021-03-01 03:55:28,889 P1955 [INFO]    ++ lsblk -o NAME,SERIAL -dpJ
2021-03-01 03:55:28,889 P1955 [INFO]    ++ jq -rc '.blockdevices[] | select(.serial == "vol03e0908f2f88a9b23") | .name'
2021-03-01 03:55:28,889 P1955 [INFO]    + DEVICE=/dev/nvme2n1
2021-03-01 03:55:28,889 P1955 [INFO]    + '[' -z /dev/nvme2n1 ']'
2021-03-01 03:55:28,889 P1955 [INFO]    + blkid -o value -s TYPE /dev/nvme2n1
2021-03-01 03:55:28,889 P1955 [INFO]    xfs
2021-03-01 03:55:28,889 P1955 [INFO]    + mkdir -p /mnt/myvolume
2021-03-01 03:55:28,889 P1955 [INFO]    + mount /dev/nvme2n1 /mnt/myvolume
2021-03-01 03:55:28,889 P1955 [INFO]    + chmod 777 /mnt/myvolume
2021-03-01 03:55:28,889 P1955 [INFO]    + resize2fs /dev/nvme2n1
2021-03-01 03:55:28,889 P1955 [INFO]    resize2fs 1.42.13 (17-May-2015)
2021-03-01 03:55:28,889 P1955 [INFO]    resize2fs: Bad magic number in super-block while trying to open /dev/nvme2n1
2021-03-01 03:55:28,889 P1955 [INFO]    Couldn't find valid filesystem superblock.
2021-03-01 03:55:28,889 P1955 [INFO] ------------------------------------------------------------
2021-03-01 03:55:28,889 P1955 [ERROR] Exited with error code 1

However, it had no issue mounting another volume that it created on the fly. Here is the relevant portion of my spotty.yaml so you know what I mean:

containers:
  - projectDir: /workspace/project
    image: tensorflow/tensorflow:latest-gpu-py3-jupyter
    env:
      PYTHONPATH: /workspace/project
    ports:
      # Jupyter
      - containerPort: 8888
        hostPort: 8888
    volumeMounts:
      - name: workspace
        mountPath: /workspace
      - name: workspace2
        mountPath: /workspace2

instances:
  - name: aws1
    provider: aws
    parameters:
      region: us-east-1
      availabilityZone: us-east-1a
      subnetId: subnet-[xxxxxxxxxxxxxxxxxxx]
      instanceType: g4dn.xlarge
      spotInstance: true
      ports: [8888]
      volumes:
        - name: workspace
          parameters:
            size: 50
        - name: workspace2
          parameters:
            volumeName: myvolume

I am able to use the spotty sh -H command and see that both volumes are mounted, though.

Also, please note that myvolume is of type gp3, and I saw in the documentation that gp3 is not one of the supported EBS types. So maybe that is the issue? If so, are there any plans to support the gp3 volume type?

Thanks!

murphycj commented 3 years ago

I should also add that when I initially set up myvolume, I manually spun up a t2.micro instance, attached the volume to it, and ran the following commands:

drive='xvdf'
sudo mkdir /data
sudo mkfs -t xfs /dev/$drive
sudo mount /dev/$drive /data
sudo chown ec2-user:ec2-user /data

I then detached the volume and proceeded to try to use it with spotty. I mention all this in case any of the above commands somehow conflict with spotty. I am not very familiar with disk formatting and stuff.

murphycj commented 3 years ago

OK, so I tried formatting a new volume with a different filesystem type (e.g. sudo mkfs -t ext4 /dev/$drive). That new volume works with spotty. So the issue is that spotty assumes the volume has a certain filesystem type.
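
For readers hitting the same error: the log above shows blkid reporting xfs, followed by a call to resize2fs, which only handles ext2/ext3/ext4 filesystems, hence the "Bad magic number" failure. Below is a minimal sketch of how a startup script could choose the grow command based on the detected filesystem type. It is not spotty's actual code or fix; the function name resize_cmd_for_fs is hypothetical and introduced only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT part of spotty): print the grow command that
# matches a filesystem type, instead of always calling resize2fs.
resize_cmd_for_fs() {
  local fs_type="$1" device="$2" mount_dir="$3"
  case "$fs_type" in
    ext2|ext3|ext4)
      # resize2fs takes the block device and only understands ext2/3/4.
      echo "resize2fs $device"
      ;;
    xfs)
      # xfs_growfs takes the mount point of an already mounted XFS filesystem.
      echo "xfs_growfs $mount_dir"
      ;;
    *)
      # Anything else is reported rather than guessed at.
      echo "unsupported filesystem type: $fs_type" >&2
      return 1
      ;;
  esac
}

# Example use on an instance (assumes DEVICE and MOUNT_DIR are already set,
# as in the log above):
#   FS_TYPE=$(blkid -o value -s TYPE "$DEVICE")
#   $(resize_cmd_for_fs "$FS_TYPE" "$DEVICE" "$MOUNT_DIR")
```

Printing the command instead of running it keeps the sketch safe to try without touching any disk; a real script would execute the chosen command after the mount step.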