Closed by askfongjojo 6 months ago
In this case, it's a functional issue. Taking a snapshot of an attached disk is allowed.
Here's the bug in the snapshot create saga:
The disk state here is expected to be Detached, but if the disk is attached to a stopped instance, this match will return a 503. The part of Nexus that checks whether or not the Pantry should be used to take a snapshot says to use the Pantry if the instance is stopped:
There needs to be more work here: at a minimum, the disk's state changes to Maintenance as part of this saga, and that has to interact with (read: block) starting the instance. This may not be a candidate for FCS, though, since a workaround exists (starting the instance)?
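To make the failure mode concrete, here is a rough sketch of the state check described above. It is not the actual saga code: every type and function name below is hypothetical and heavily simplified, and the "proposed" variant is only one possible shape for a fix, assuming the instance start path can be made to refuse disks in Maintenance.

```rust
// Hypothetical, simplified model of the states discussed in this thread.
// These are NOT the real Nexus types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DiskState {
    Detached,
    Attached,
    Maintenance,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InstanceState {
    Running,
    Stopped,
}

#[derive(Debug, PartialEq, Eq)]
enum SnapshotError {
    // Surfaces to the user as a 503.
    ServiceUnavailable,
}

/// Behaviour as described in the thread: the Pantry path only accepts a
/// Detached disk, so a disk attached to a stopped instance falls through
/// to the error arm.
fn begin_pantry_snapshot_current(disk: DiskState) -> Result<DiskState, SnapshotError> {
    match disk {
        DiskState::Detached => Ok(DiskState::Maintenance),
        _ => Err(SnapshotError::ServiceUnavailable),
    }
}

/// One possible fix (an assumption, not the agreed design): also accept a
/// disk attached to a stopped instance and move it to Maintenance.
fn begin_pantry_snapshot_proposed(
    disk: DiskState,
    instance: Option<InstanceState>,
) -> Result<DiskState, SnapshotError> {
    match (disk, instance) {
        (DiskState::Detached, None)
        | (DiskState::Attached, Some(InstanceState::Stopped)) => Ok(DiskState::Maintenance),
        _ => Err(SnapshotError::ServiceUnavailable),
    }
}

/// The other half of the fix: starting the instance has to be refused
/// while any of its disks is in Maintenance.
fn can_start_instance(disks: &[DiskState]) -> bool {
    disks.iter().all(|d| *d != DiskState::Maintenance)
}
```

With this model, the current check rejects an Attached disk regardless of instance state, while the proposed check accepts it only when the instance is stopped, and `can_start_instance` returns false until the disk leaves Maintenance.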
Thanks for root-causing/sizing this. Let's re-target this to MVP given the effort and impact involved. I'll document this known issue in the release notes because users will likely want to create clean snapshots on stopped instances (so that things such as temporary files and locks created by running applications won't be part of the snapshot).
Want to note that a customer indicated that they would like to see a fix for this issue sooner for the same reason I mentioned above (i.e. their best practice is to create snapshots on stopped instances).
I think I have seen the 503 before but mistook it for a control plane issue:
(Once I started up the VM, I was able to create snapshots for both of the disks attached to this VM on rack2.)
If snapshots are prohibited on disks attached to stopped instances, we should probably prevent the snapshot action with a more explicit error. A 503 response implies that the service is only temporarily unavailable and that the user may retry the action.
If snapshots should be allowed on disks, then it is a functional issue.
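For the first alternative above (snapshots prohibited on stopped instances), a minimal sketch of what "a more explicit error" could look like. The names are hypothetical, and the choice of 409 Conflict is an assumption made for illustration; the thread only asks that the response not suggest a transient outage.

```rust
// Hypothetical rejection reasons; not taken from the actual API.
#[derive(Debug, PartialEq, Eq)]
enum SnapshotRejection {
    // A genuinely transient condition: retrying later may succeed.
    BackendUnavailable,
    // A policy or state rejection: retrying without changing anything
    // will fail in the same way.
    DiskAttachedToStoppedInstance,
}

/// Map each rejection to an HTTP status code. Using 409 for the state
/// conflict is an assumption; any explicit 4xx would make the same point.
fn http_status(rejection: &SnapshotRejection) -> u16 {
    match rejection {
        SnapshotRejection::BackendUnavailable => 503,
        SnapshotRejection::DiskAttachedToStoppedInstance => 409,
    }
}

/// Message shown to the user, stating what has to change before the
/// request can succeed.
fn user_message(rejection: &SnapshotRejection) -> &'static str {
    match rejection {
        SnapshotRejection::BackendUnavailable => "service temporarily unavailable, retry later",
        SnapshotRejection::DiskAttachedToStoppedInstance => {
            "cannot snapshot a disk attached to a stopped instance"
        }
    }
}
```

The point of the split is that a client seeing a 503 can reasonably retry unchanged, whereas a 4xx with a specific message tells the user that something about the disk or instance has to change first.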