stratis-storage / stratisd

Easy to use local storage management for Linux.
https://stratis-storage.github.io
Mozilla Public License 2.0

Support merging a snapshot based on stored instruction in filesystem level metadata #3622

Open mulkieran opened 3 months ago

mulkieran commented 3 months ago

Related https://github.com/stratis-storage/project/issues/597

Depends on https://github.com/stratis-storage/stratisd/issues/3621 .

We assume just one thing, that filesystem metadata is augmented with a new field, merge, with type Option<bool>.
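For illustration only, the augmented metadata might look something like the sketch below. The struct name, the use of plain strings for UUIDs, and the `merge_scheduled` helper are all assumptions made for the example, not the actual stratisd definitions; only the `merge: Option<bool>` field and an optional origin come from the discussion here.

```rust
/// Hypothetical, simplified stand-in for per-filesystem metadata.
/// The real stratisd type has more fields and uses proper UUID types.
#[derive(Debug, Clone, PartialEq)]
pub struct FilesystemMetadata {
    pub name: String,
    pub uuid: String,
    /// UUID of the filesystem this one was snapshotted from, if any.
    pub origin: Option<String>,
    /// Proposed new field: Some(true) requests a merge into the origin.
    pub merge: Option<bool>,
}

impl FilesystemMetadata {
    /// A merge can only be scheduled if the flag is set and an origin is recorded.
    pub fn merge_scheduled(&self) -> bool {
        self.merge == Some(true) && self.origin.is_some()
    }
}
```

Treating `None` and `Some(false)` the same way keeps metadata written before the field existed readable without any migration step.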

Thus, the merge must be performed on pool setup, after the filesystem metadata has been written. Note also that this changes how we need to read the filesystem metadata. At present, we can read it serially, setting up each filesystem as we process its metadata. Once merge is implemented, however, we must retain some information about the filesystems as a group. Specifically, we must scan all the filesystems to find out which are in a scheduled merge relationship, and we will not know whether there are any until we have scanned them all, because the metadata is stored only on the snapshot, not on the origin, and the two must be brought together for the merge.
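A minimal sketch of that scan, assuming a hypothetical `FsMeta` record with string UUIDs (not the real stratisd types): all metadata is read first, the scheduled (snapshot, origin) pairs are collected, and only filesystems that take part in no pair are reported as ready for immediate setup.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified metadata record; names are illustrative only.
#[derive(Debug, Clone)]
pub struct FsMeta {
    pub uuid: String,
    pub origin: Option<String>,
    pub merge: Option<bool>,
}

/// Returns (UUIDs that can be set up right away, scheduled (snapshot, origin) pairs).
pub fn partition_for_setup(metas: &[FsMeta]) -> (Vec<String>, Vec<(String, String)>) {
    // Only the snapshot carries the merge flag, so every record must be
    // examined before we know which origins are involved.
    let pairs: Vec<(String, String)> = metas
        .iter()
        .filter(|m| m.merge == Some(true))
        .filter_map(|m| m.origin.clone().map(|o| (m.uuid.clone(), o)))
        .collect();

    let ready = {
        // Every UUID that appears in a pair, as snapshot or as origin.
        let involved: HashSet<&str> = pairs
            .iter()
            .flat_map(|(s, o)| [s.as_str(), o.as_str()])
            .collect();
        metas
            .iter()
            .filter(|m| !involved.contains(m.uuid.as_str()))
            .map(|m| m.uuid.clone())
            .collect()
    };

    (ready, pairs)
}
```

The sketch does not check that the recorded origin still exists in the pool; a real implementation would need to decide what to do in that case.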

If we think that such a situation could be a problem, it could be allowable to store supplemental information for the origin. If that were done, then, generally, the filesystems not so marked could all be set up immediately, and only those scheduled for merge, either as origin or snapshot, would have their processing delayed until the end.

If a merge fails, the best thing is to keep both filesystems exactly as they were in their prior state and bring the pool up. However, this condition of the pool must be identified on the D-Bus as a problem with the pool, "scheduled merge not completed" or something like that.

In the event of a merge, the merged snapshot replaces its origin. Conceptually, the snapshot stops existing, and its meaning as the origin of some other snapshot disappears. This means that the origin values of other filesystems on the pool must be updated to None if they currently point to the merged snapshot, and the updated metadata must be written for each of these filesystems. What about other snapshots pointing to the origin in this merge? I would think they should stay, in case it turns out to be necessary to roll back to a previous snapshot. But what if it turns out that they post-date the snapshot rolled back to? Maybe in that case they should lose their origin field.
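The part of this that is settled, clearing origin values that point at the merged snapshot, could be sketched as below. `FsMeta` and `clear_origin_references` are hypothetical names for the example; the open question about other snapshots of the origin is deliberately left untouched.

```rust
/// Hypothetical, simplified metadata record; names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct FsMeta {
    pub uuid: String,
    pub origin: Option<String>,
}

/// Sets origin to None on every filesystem that points at the merged snapshot.
/// Returns the UUIDs whose metadata changed and therefore must be rewritten.
pub fn clear_origin_references(metas: &mut [FsMeta], merged_snapshot: &str) -> Vec<String> {
    let mut changed = Vec::new();
    for m in metas.iter_mut() {
        if m.origin.as_deref() == Some(merged_snapshot) {
            m.origin = None;
            changed.push(m.uuid.clone());
        }
    }
    changed
}
```

Returning the list of changed UUIDs lets the caller write updated metadata only for the filesystems that were actually affected.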

If, somehow, two snapshots are scheduled to be merged into the same origin, it still seems like the pool should be brought up, neither merge should be started, and a warning message about the pool should be placed on the D-Bus.
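One way to detect this conflict, sketched under the assumption that the scheduled merges are already available as (snapshot UUID, origin UUID) string pairs; `MergePlan` and `plan_merges` are hypothetical names, and how the conflict is surfaced on the D-Bus is out of scope here.

```rust
use std::collections::BTreeMap;

/// Hypothetical result of planning: merges to perform and conflicts to report.
#[derive(Debug, PartialEq)]
pub struct MergePlan {
    /// (snapshot, origin) pairs that are safe to merge.
    pub merges: Vec<(String, String)>,
    /// (origin, competing snapshots); none of these merges is started.
    pub conflicts: Vec<(String, Vec<String>)>,
}

pub fn plan_merges(requests: &[(String, String)]) -> MergePlan {
    // Group the requesting snapshots by origin; BTreeMap keeps output order stable.
    let mut by_origin: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (snapshot, origin) in requests {
        by_origin.entry(origin.as_str()).or_default().push(snapshot.as_str());
    }

    let mut plan = MergePlan { merges: Vec::new(), conflicts: Vec::new() };
    for (origin, snapshots) in by_origin {
        if snapshots.len() == 1 {
            plan.merges.push((snapshots[0].to_string(), origin.to_string()));
        } else {
            // More than one snapshot targets this origin: skip all of them.
            let competing = snapshots.iter().map(|s| s.to_string()).collect();
            plan.conflicts.push((origin.to_string(), competing));
        }
    }
    plan
}
```

Each entry in `conflicts` would correspond to one pool-level warning, while the pool itself is still set up with both filesystems left as they were.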

An important implementation question is how to handle reading filesystem metadata. The metadata does not take up much memory, so reading all of it and retaining it seems fine. This is already how the metadata is read. Performing the merge operations, updating the filesystem metadata accordingly, and then using the updated metadata to guide setup of each individual filesystem ought to work.

bmr-cymru commented 3 months ago

One additional thought I had in regards to this: how should the filesystem UUID be handled during a merge?

Since Stratis changes the UUID on snapshot creation (to allow the filesystem to be mounted without the use of -o nouuid), the merge of the snapshot thin volume will not reflect the exact state of the volume at the time of the snapshot, and a further action would be required to reset the UUID to that of the origin volume.

Would that be reasonable to include in the proposed merge operation, or do you think that it would be best to retain the changed UUID (and leave the administrator to make that change if they depend on the value of the UUID, e.g. for mounting)?

mulkieran commented 3 months ago

My assumption is that the merge ought to fully restore the filesystem to its state when the snapshot was taken, so that properly the merge ought to restore the origin's former UUID.

bmr-cymru commented 3 months ago

Great. I think that's the least surprising behaviour for users.