Closed pschafhalter closed 8 months ago
Thanks for the report and the workaround @pschafhalter! We use FUSE tools to mount a bucket and it's been observed that writes sometimes have consistency problems like this. We should look deeper.
Cc @romilbhardwaj.
@pschafhalter I removed the workdir: . field and ran this example with a new bucket name:
sky launch -c dbg --down test.yaml --use-spot
A few minutes afterwards, aws s3 ls <bucket> did show the file.
A few things we can check
Thanks for looking into this @concretevitamin.
With the provided config, the task successfully completes and the file does not show up. This is also happening for me in another task. Do you have an idea why workdir: . might cause the issue?
This is also happening on GCP with Cloud Storage. When the machine launches for the first time, everything works as expected and files are uploaded to the bucket. However, when a stopped VM is started again and reused, new files written to the mounted folder are present in the directory after the job completes but are not uploaded to the bucket.
@nakkaya Do you mean this could happen outside of SkyPilot?
@concretevitamin No, I meant machines stopped (--autostop) and started by SkyPilot.
Thanks for the report @nakkaya!
This is a known issue tracked in #1203. As a temporary workaround, can you try using sky launch -c <your_cluster> --no-setup mytask.yaml? This should re-mount any buckets.
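Before applying the workaround, it may help to confirm that the mount really is missing on the restarted VM. The following is a minimal sketch, not part of SkyPilot: is_fuse_mount is a hypothetical helper, and it assumes a Linux VM where the FUSE tool registers the bucket in /proc/mounts with a filesystem type beginning with "fuse".

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a SkyPilot command): succeeds if <path> appears
# as a FUSE mount point in a mounts table.
# Usage: is_fuse_mount <path> [mounts_file]
# mounts_file defaults to /proc/mounts, which is Linux-specific.
is_fuse_mount() {
  local path="${1%/}"                  # drop one trailing slash, e.g. /data/ -> /data
  local mounts_file="${2:-/proc/mounts}"
  # Columns in /proc/mounts: device, mount point, fs type, options, dump, pass.
  # Assumption: the FUSE tool reports an fs type that starts with "fuse".
  awk -v p="$path" \
    '$2 == p && $3 ~ /^fuse/ { found = 1 } END { exit !found }' \
    "$mounts_file"
}
```

On the VM, running is_fuse_mount /data && echo mounted || echo "plain directory" would indicate whether writes to /data can reach the bucket at all. If it reports a plain directory, the files are only being written to the VM's local disk, which would match the behavior described above.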
To help us find a good solution to this, can you tell us a little more about your usage of SkyPilot - are you using it to run batch jobs through the job queue interface or are you ssh-ing into the machine for interactive development?
@romilbhardwaj Thanks for the reply.
sky launch -c <your_cluster> --no-setup mytask.yaml
I am running a long-running job; I'll try this when it completes.
I primarily use it to run batch jobs through the job queue interface, but once in a while I do ssh into the instance to debug why something fails.
This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.
This issue was closed because it has been stalled for 10 days with no activity.
When using GCP for compute and S3 for storage, files generated by the task aren't automatically uploaded to S3. My SkyPilot version is 0.2.2.
Minimal example to reproduce the issue:
With the bucket skypilot-s3-bug, run sky launch -c test-s3-bug s3_bug.yaml. The contents of s3_bug.yaml are the following:

    workdir: .
    file_mounts:
      /data:
        name: skypilot-s3-bug
        mode: MOUNT
    run: |
      echo "hello world" > /data/test.txt
    $ ssh test-s3-bug
    $ cd /data/
    $ ls
    test.txt
    $ cat test.txt
    hello world