rockstor / rockstor-core

Linux/BTRFS-based Network Attached Storage (NAS)
http://rockstor.com/docs/contribute_section.html
GNU General Public License v3.0

replication received_uuid blocker re snap to share promotion #2902

Closed Hooverdan96 closed 1 month ago

Hooverdan96 commented 1 month ago

As observed in the scenario described on the Rockstor community forum (users stevek, Hooverdan, phillxnet), when quotas are NOT enabled on the receiving system, a received snapshot can fail to be promoted to a share because the system fails to set its read-write (rw) property. In this scenario the receiving system was running Rockstor on openSUSE Tumbleweed.

https://forum.rockstor.com/t/disk-structure-under-mnt2-and-replication-question/9720/21

The resulting error message indicates that using the -f (force) flag would allow the property to be set.

ERROR [storageadmin.util:44] Exception: Error running a command. cmd = /usr/sbin/btrfs property set /mnt2/fresse_storage/.snapshots/6f32cb58-f849-4c93-bc65-6ebda422c66d_Replication/Replication_6_replication_1 ro false. rc = 1. stdout = ['']. stderr = ['ERROR: cannot flip ro->rw with received_uuid set, use force option -f if you really want unset the read-only status. The value of received_uuid is used for incremental send, consider making a snapshot instead. Read more at btrfs-subvolume(8) and Subvolume flags.', '']
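For reference, the forced variant that the error message points to can be exercised manually. A minimal sketch using Python's subprocess (the snapshot path is taken from the log above; run on the receiving system):

import subprocess

# Apply the force option the error message suggests: flips ro -> rw
# even though received_uuid is set on the snapshot.
snap = (
    "/mnt2/fresse_storage/.snapshots/"
    "6f32cb58-f849-4c93-bc65-6ebda422c66d_Replication/"
    "Replication_6_replication_1"
)
subprocess.run(
    ["/usr/sbin/btrfs", "property", "set", "-f", snap, "ro", "false"],
    check=True,
)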

[EDIT by phillxnet] A dependency regarding reproducer systems: believed to pertain to Leap 15.6 / TW receiver-side systems, where a jump in kernel and btrfs versions was observed; these contain newer safeguards that have led to this -f requirement. See the now-associated and merged PR referenced in the comments below.

Hooverdan96 commented 1 month ago

In the same forum thread, I documented a PoC with the suggested change:

N.B. quotas are disabled, otherwise this error can be masked by #2901.

changing: https://github.com/rockstor/rockstor-core/blob/1ddcf4b6f6ad6a451fdaef492fe974417d4dbfe3/src/rockstor/fs/btrfs.py#L2311-L2314

to:

def set_property(mnt_pt, name, val, mount=True):
    if mount is not True or is_mounted(mnt_pt):
        # "-f" forces the ro -> rw flip even when received_uuid is set.
        cmd = [BTRFS, "property", "set", "-f", mnt_pt, name, val]
        return run_command(cmd)

which resulted in successful replications beyond the usual failure point.
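For context, the failing call site (per the traceback quoted later in this thread) is clone_helpers.create_repclone, which flips the received read-only snapshot to rw via:

set_property(snap_path, "ro", "false", mount=False)

With the "-f" addition above, this same call succeeds despite received_uuid being set on the snapshot.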

phillxnet commented 1 month ago

N.B. I have now observed this failure with quotas enabled (on the receiving system):

Reproduced with Rockstor 5.0.14-0 Leap 15.6 send & receive instances:

[01/Oct/2024 11:25:04] INFO [storageadmin.views.snapshot:61] Supplanting share (67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01) with snapshot (.snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_1).
[01/Oct/2024 11:25:04] ERROR [storageadmin.util:44] Exception: Error running a command. cmd = /usr/sbin/btrfs property set /mnt2/rock-pool/.snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_1 ro false. rc = 1. stdout = ['']. stderr = ['ERROR: cannot flip ro->rw with received_uuid set, use force if you really want that', '']
Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/storageadmin/views/clone_helpers.py", line 94, in create_repclone
    set_property(snap_path, "ro", "false", mount=False)
  File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 2314, in set_property
    return run_command(cmd)
           ^^^^^^^^^^^^^^^^
  File "/opt/rockstor/src/rockstor/system/osi.py", line 289, in run_command
    raise CommandException(cmd, out, err, rc)
system.exceptions.CommandException: Error running a command. cmd = /usr/sbin/btrfs property set /mnt2/rock-pool/.snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_1 ro false. rc = 1. stdout = ['']. stderr = ['ERROR: cannot flip ro->rw with received_uuid set, use force if you really want that', '']
[01/Oct/2024 11:25:04] ERROR [smart_manager.replication.receiver:100] b'Failed to promote the oldest Snapshot to Share.'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/16/snapshots/test_share01_1_replication_1/repclone

With the following qgroup details (receiving system):

rleap15-6:~ # btrfs qgroup show /mnt2/rock-pool/ | grep snapshot
...
0/694         16.00KiB     16.00KiB   .snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_1
0/695         16.00KiB     16.00KiB   .snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_2
0/696         16.00KiB     16.00KiB   .snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_3

N.B. in this reproducer instance there is no 2015/* (Rockstor) parent qgroup assignment, only the default 0/* level qgroups.
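For anyone verifying this locally, parent qgroup assignments can be listed with the -p flag of btrfs qgroup show. A minimal sketch (assuming a Rockstor-assigned parent would appear as 2015/* in the parent column):

import subprocess

# List qgroups with their parent assignments (-p); on this reproducer
# the snapshot rows show no 2015/* (Rockstor) parent, only 0/* groups.
out = subprocess.run(
    ["btrfs", "qgroup", "show", "-p", "/mnt2/rock-pool/"],
    capture_output=True, text=True, check=True,
).stdout
for line in out.splitlines():
    if "snapshot" in line:
        print(line)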

phillxnet commented 1 month ago

@Hooverdan96 My previous comment's reproducer details were observed with a trivial data set, which may well explain seeing this error while quotas are enabled. I'll continue with this issue while I have a reproducer, and then look to the quotas-related blocker that likely precedes this issue when there is an actual real-life data payload.

phillxnet commented 1 month ago

Likely pertinent historical reference from btrfs mailing list: https://www.spinics.net/lists/linux-btrfs/msg69951.html

phillxnet commented 1 month ago

Notes on the first 3 replication received subvol properties: installer (sending system) -> rleap15-6 (receiving system)

1st

Send end

No longer available in reproducer systems as the oldest snapshot in replication is deleted.

Receive end

N.B. As this is the first replication event: this subvol has no parent.

rleap15-6:~ # btrfs subvol show /mnt2/rock-pool/.snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_1
.snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_1
        Name:                   test_share01_1_replication_1
        UUID:                   1e35ce1a-de98-6749-9dd7-6acb3dc85ee5
        Parent UUID:            -
        Received UUID:          2f87835e-d4d1-774b-a606-0f4e8763b41a
        Creation time:          2024-10-01 11:10:03 +0100
        Subvolume ID:           694
        Generation:             4921
        Gen at creation:        4917
        Parent ID:              5
        Top level ID:           5
        Flags:                  readonly
        Send transid:           23
        Send time:              2024-10-01 11:10:03 +0100
        Receive transid:        4918
        Receive time:           2024-10-01 11:10:03 +0100
        Snapshot(s):
                                .snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_2
        Quota group:            0/694
          Limit referenced:     -
          Limit exclusive:      -
          Usage referenced:     16.00KiB
          Usage exclusive:      16.00KiB

2nd

Send end

installer:~ # btrfs subvolume show /mnt2/raid-test/.snapshots/test_share01/test_share01_1_replication_2
.snapshots/test_share01/test_share01_1_replication_2
        Name:                   test_share01_1_replication_2
        UUID:                   7acc89a7-e758-3e4c-b6f0-4e9ae3c7358b
        Parent UUID:            6033150e-c572-3e49-aeb9-94ae1c915163
        Received UUID:          -
        Creation time:          2024-10-01 11:15:04 +0100
        Subvolume ID:           258
        Generation:             53
        Gen at creation:        53
        Parent ID:              5
        Top level ID:           5
        Flags:                  readonly
        Send transid:           0
        Send time:              2024-10-01 11:15:04 +0100
        Receive transid:        0
        Receive time:           -
        Snapshot(s):
        Quota group:            0/258
          Limit referenced:     -
          Limit exclusive:      -
          Usage referenced:     16.00KiB
          Usage exclusive:      16.00KiB

Receive end

N.B. This subvol has the first (1st above) as its Parent UUID subvol: send/receive works by sending the differences between two subvols.

rleap15-6:~ # btrfs subvol show /mnt2/rock-pool/.snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_2
.snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_2
        Name:                   test_share01_1_replication_2
        UUID:                   26e85e61-5ba2-4240-b6a7-d75a76ec77bc
        Parent UUID:            1e35ce1a-de98-6749-9dd7-6acb3dc85ee5
        Received UUID:          7acc89a7-e758-3e4c-b6f0-4e9ae3c7358b
        Creation time:          2024-10-01 11:15:04 +0100
        Subvolume ID:           695
        Generation:             4924
        Gen at creation:        4921
        Parent ID:              5
        Top level ID:           5
        Flags:                  readonly
        Send transid:           23
        Send time:              2024-10-01 11:15:04 +0100
        Receive transid:        4922
        Receive time:           2024-10-01 11:15:04 +0100
        Snapshot(s):
                                .snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_3
        Quota group:            0/695
          Limit referenced:     -
          Limit exclusive:      -
          Usage referenced:     16.00KiB
          Usage exclusive:      16.00KiB

3rd

Send end

installer:~ # btrfs subvolume show /mnt2/raid-test/.snapshots/test_share01/test_share01_1_replication_3
.snapshots/test_share01/test_share01_1_replication_3
        Name:                   test_share01_1_replication_3
        UUID:                   6cbbc48f-2d23-3a42-b09d-2f5504ebb4cf
        Parent UUID:            6033150e-c572-3e49-aeb9-94ae1c915163
        Received UUID:          -
        Creation time:          2024-10-01 11:20:03 +0100
        Subvolume ID:           259
        Generation:             55
        Gen at creation:        55
        Parent ID:              5
        Top level ID:           5
        Flags:                  readonly
        Send transid:           0
        Send time:              2024-10-01 11:20:03 +0100
        Receive transid:        0
        Receive time:           -
        Snapshot(s):
        Quota group:            0/259
          Limit referenced:     -
          Limit exclusive:      -
          Usage referenced:     16.00KiB
          Usage exclusive:      16.00KiB

Receive end

N.B. In turn, this 3rd subvol has the 2nd subvol above as its Parent UUID, and the 3rd (sender-end) snapshot above as its Received UUID.

rleap15-6:~ # btrfs subvol show /mnt2/rock-pool/.snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_3
.snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_3
        Name:                   test_share01_1_replication_3
        UUID:                   2870b9a3-1a85-694a-99cf-b045160b5a43
        Parent UUID:            26e85e61-5ba2-4240-b6a7-d75a76ec77bc
        Received UUID:          6cbbc48f-2d23-3a42-b09d-2f5504ebb4cf
        Creation time:          2024-10-01 11:20:04 +0100
        Subvolume ID:           696
        Generation:             4924
        Gen at creation:        4924
        Parent ID:              5
        Top level ID:           5
        Flags:                  readonly
        Send transid:           23
        Send time:              2024-10-01 11:20:04 +0100
        Receive transid:        4925
        Receive time:           2024-10-01 11:20:04 +0100
        Snapshot(s):
        Quota group:            0/696
          Limit referenced:     -
          Limit exclusive:      -
          Usage referenced:     16.00KiB
          Usage exclusive:      16.00KiB
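To make the chaining above easier to check, the UUID fields can be pulled out programmatically. A minimal sketch (field labels as printed by btrfs subvolume show in the outputs above):

import re
import subprocess

def subvol_uuids(path):
    # Extract UUID, Parent UUID, and Received UUID from
    # `btrfs subvolume show` output, as seen in this thread.
    out = subprocess.run(
        ["btrfs", "subvolume", "show", path],
        capture_output=True, text=True, check=True,
    ).stdout
    fields = {}
    for label in ("UUID", "Parent UUID", "Received UUID"):
        m = re.search(rf"^\s*{label}:\s+(\S+)", out, re.MULTILINE)
        fields[label] = m.group(1) if m else None
    return fields

# On the receiver, each snapshot's Parent UUID should equal the
# previous snapshot's UUID, and its Received UUID should match the
# corresponding sender-side snapshot's UUID.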

Original (Sending) source share info

Having stopped the sending replication, to catch the final state of this replication failure reproducer, the original source (sending-side) share we were replicating shows up as follows:

installer:~ # btrfs subvolume show /mnt2/raid-test/test_share01/
test_share01
        Name:                   test_share01
        UUID:                   6033150e-c572-3e49-aeb9-94ae1c915163
        Parent UUID:            -
        Received UUID:          -
        Creation time:          2024-05-28 17:44:48 +0100
        Subvolume ID:           256
        Generation:             57
        Gen at creation:        11
        Parent ID:              5
        Top level ID:           5
        Flags:                  -
        Send transid:           0
        Send time:              2024-05-28 17:44:48 +0100
        Receive transid:        0
        Receive time:           -
        Snapshot(s):
                                .snapshots/test_share01/test_share01_1_replication_2
                                .snapshots/test_share01/test_share01_1_replication_3
                                .snapshots/test_share01/test_share01_1_replication_4
        Quota group:            0/256
          Limit referenced:     -
          Limit exclusive:      -
          Usage referenced:     16.00KiB
          Usage exclusive:      16.00KiB
phillxnet commented 1 month ago

@Hooverdan96 I'm just working through our options here, but remember that we already make allowances for our approach, i.e. the cascade of snapshots: we purposefully do not touch a 'live' receiving snapshot. And such a change is way too large for this late in the testing phase; our code is such that we could look to improvements later, but not just yet I think. Still working on this one. But we do already account for this sensitivity: we were just not actually warned beforehand against what we do. And that warning pertains to the case where the subvol we are modifying is still involved in a send/receive. My understanding is that it is not, due to our precautions re the cascade sends.

phillxnet commented 1 month ago

@Hooverdan96 Also note that a clone in btrfs speak is a little different to our clones. Here, as far as my understanding goes, we already follow upstream advice via our snapshot cascade: sending the differences between ro snapshots only. The cascade then allows us to do our 'repclone' (snap-to-share supplant), which is to supplant a share with a snapshot, thereby updating the user-visible replication share. A snapshot is actually a clone (mostly instantaneous), and we already do this as part of our send/receive wrapper. It's where all the complexity comes from, and the purpose of our cascade in the first place. Incidentally, we used to use 5 snapshots! But I changed it to 3 a few years ago: 5 really tended to confuse folks and could take a very long time to end up with the result folks expected, i.e. an actual Share at the receiving end :) .
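As an illustration only (not Rockstor's actual code, and with hypothetical names), the cascade described here amounts to keeping a small fixed window of received ro snapshots and promoting only the oldest:

from collections import deque

CASCADE_DEPTH = 3  # historically 5, reduced to 3 a few years ago

def on_replication_cycle(cascade, new_snapshot):
    # Each cycle appends the newly received ro snapshot; once the
    # window is full, the oldest is promoted (the 'repclone') to
    # supplant the user-visible share. The oldest is, by design, no
    # longer a live send/receive target, so the forced ro -> rw flip
    # is safe.
    cascade.append(new_snapshot)
    if len(cascade) == CASCADE_DEPTH:
        return cascade.popleft()  # snapshot to promote
    return None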

We will have to write some good technical docs for this whole process, as I have to re-learn it each time I look at it. But I think we have a good design of our own: it's just poorly documented for both us and general users! Pretty sure we are good to go with your suggested force here: and I didn't see a reference for removing a sending uuid.

Hooverdan96 commented 1 month ago

Yes, that explanation makes sense in the cloning context. And the point is that the third of these cascading snapshots will not be changed between setting the read-write flag and its promotion to a share.

phillxnet commented 1 month ago

Closing as: Fixed by #2911