Michaelvll opened 2 years ago
One idea for the API: use a chain DAG and specify the data transfer between chain nodes via EBS, for example:
chain_task:
  __data_transfer: ebs
  download_task:
    run: wget …
  train_task:
    resources: V100:8
    setup: …
    run: …
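To make the proposal concrete, here is a minimal Python sketch of what a chain DAG with `__data_transfer: ebs` would have to do: run the nodes in order and hand a single volume from each node to the next. `ChainNode`, `ChainDag`, and `plan()` are hypothetical names invented for illustration, not part of SkyPilot, and the step strings stand in for real cloud calls.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical model of the proposed chain DAG. ChainNode, ChainDag and
# plan() are illustrative names only, not part of SkyPilot's API.
@dataclass
class ChainNode:
    name: str
    run: str
    setup: Optional[str] = None
    resources: Optional[str] = None  # e.g. "V100:8"; None means a CPU-only node


@dataclass
class ChainDag:
    nodes: List[ChainNode]
    data_transfer: str = "ebs"  # storage handed from each node to the next

    def plan(self) -> List[str]:
        """List the steps a scheduler might take, in order, to run the chain."""
        steps = [f"create {self.data_transfer} volume"]
        for node in self.nodes:
            steps.append(f"launch {node.name} on {node.resources or 'cpu'}")
            steps.append(f"attach volume to {node.name}")
            if node.setup:
                steps.append(f"setup {node.name}")
            steps.append(f"run {node.name}")
            # The volume must be released before the next node can attach it.
            steps.append(f"detach volume from {node.name}")
            steps.append(f"terminate {node.name}")
        return steps
```

The key design constraint is that consecutive nodes never hold the volume at the same time: a standard EBS volume attaches to one instance at a time (Multi-Attach aside), so the download node must detach it before the training node can attach it.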
Can we not use a mounted file system for this? For example, spin up a CPU node, start the download there with the file system mounted, then launch the big GPU machine with the same file system attached.
This is something I also do for projects and homework: I switch to a larger instance type (keeping the same EBS volume) when I need more GPUs, and downsize when I want to save costs but still run smaller experiments (e.g., a T4 instance instead of a V100). I usually do this with "Change instance type" in the AWS console.
I would appreciate this feature quite a bit!
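The console workflow described in the comment above can also be scripted. A minimal sketch, assuming the AWS Python SDK (boto3) and an EBS-backed instance: stop the instance, change its type, then start it again. The function name `change_instance_type` is hypothetical; the calls it makes (`stop_instances`, `modify_instance_attribute`, `start_instances`, `get_waiter`) are standard boto3 EC2 client methods. The client is passed in rather than created inside, so the function can be exercised without AWS credentials.

```python
def change_instance_type(ec2, instance_id: str, new_type: str) -> None:
    """Stop an EC2 instance, change its instance type, and start it again.

    Assumes `ec2` is a boto3 EC2 client, e.g. boto3.client("ec2").
    EBS volumes stay attached across the stop/start cycle.
    """
    # The instance type can only be changed while the instance is stopped.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Swap the type, e.g. from a V100 type to a cheaper T4 type.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": new_type},
    )

    # Boot the instance again on the new hardware.
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```

Usage would look like `change_instance_type(boto3.client("ec2"), "i-0123456789abcdef0", "g4dn.xlarge")`, where the instance ID is a placeholder and `g4dn.xlarge` is a single-T4 type. Data on instance-store volumes does not survive the stop, and the new type must be compatible with the AMI (CPU architecture, GPU drivers).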
> Can we not use the mounted file system for this? e.g. spin up a cpunode where you begin the download with the file system mounted then launch your big GPU machine with the mounted FS also attached.
The initial problem is already "S3 -> EBS". Currently, mounted storage supports S3 only; we don't support mounting EBS/EFS yet. Agreed that this is something we should investigate.
This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.
This issue is worth keeping and could be resolved with our Python interface.
This issue was closed because it has been stalled for 10 days with no activity.
@Michaelvll Is this on the roadmap? It would be very helpful for development and for saving costs.
The user would like to download a large dataset to disk using a cheaper CPU instance, then switch to a GPU instance with the same disk so she can train on the downloaded dataset.
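Until this is supported natively, the scenario can be approximated by hand on AWS by moving a separate EBS data volume from the CPU instance to the GPU instance. A minimal sketch, assuming boto3, a non-root data volume, and both instances in the volume's Availability Zone (an EBS volume can only attach to instances in its own AZ). `move_volume` is a hypothetical helper, not a SkyPilot API; the filesystem should be unmounted on the CPU instance before detaching.

```python
def move_volume(ec2, volume_id: str, src_instance_id: str,
                dst_instance_id: str, device: str = "/dev/sdf") -> None:
    """Detach an EBS data volume from one instance and attach it to another.

    Assumes `ec2` is a boto3 EC2 client, the volume is not a root volume,
    and both instances live in the volume's Availability Zone.
    """
    # Release the volume from the CPU (download) instance.
    ec2.detach_volume(VolumeId=volume_id, InstanceId=src_instance_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Hand the same volume, with the downloaded dataset, to the GPU instance.
    ec2.attach_volume(Device=device, InstanceId=dst_instance_id, VolumeId=volume_id)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])
```

After attaching, the volume still has to be mounted on the GPU instance; on Nitro-based instance types it appears as an NVMe device (`/dev/nvme*n1`) rather than under the requested `/dev/sdf` name.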