oetiker / znapzend

zfs backup with remote capabilities and mbuffer integration.
www.znapzend.org
GNU General Public License v3.0

There seems to be a problem replicating datasets that are zfs-promoted branches of a bigger tree (rpool/ROOT, zones...) #503

Closed jimklimov closed 3 years ago

jimklimov commented 4 years ago

As I was checking how my backups went, I found that the replicas of my rootfs-related datasets are not branches of each other, but appear to be independent data histories (with no "origin" set on any of them on the backup side; I suppose --autoCreation was involved, since my configs have it enabled by default). This causes confusion at least when the backup pool is not regularly accessible and older automatic snapshots get deleted, so there is no common point to resync from (not even among the snapshots made by beadm, etc.).

This may also waste space on the backup pool by writing many copies of the rootfs instead of building on top of a shared history, but I am not fully sure about that bit (my backup pool claims a high dedup ratio while dedup is off in the dataset settings).

In practice this primarily impacts rpool/ROOT/something(/subsets) and pool/zones/zonerootnames/... which are automatically shuffled as a particular rootfs gets activated (whatever is related to the current rootfs is promoted to be the main tree of "live" ZFS data, and the older roots get their origins pointed at its snapshots), but I suppose other zfs promotions are similarly susceptible.

My guess is that autocreation should check whether the newly found dataset has an origin snapshot attribute, and issue a replication starting from that point rather than from scratch. The keyword "origin" does not appear in the znapzend codebase ;)
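A rough sketch of the idea, in shell purely for illustration (the real logic would live in znapzend's Perl code; the dataset name, destination prefix and snapshot name below are hypothetical placeholders):

# Sketch only: if a newly discovered source dataset is a clone and its origin
# snapshot is already replicated, start from there instead of from scratch.
SRC=nvpool/ROOT/newbe                        # hypothetical dataset found by autoCreation
DSTPREFIX=backup-adata/snapshots             # hypothetical source-to-destination mapping

ORIGIN=$(zfs get -H -o value origin "$SRC")  # e.g. nvpool/ROOT/oldbe@somesnap, or "-"
if [ "$ORIGIN" != "-" ] && zfs list -H -t snapshot "$DSTPREFIX/$ORIGIN" >/dev/null 2>&1; then
    # Origin is present on the backup: send incrementally from it, so the new
    # dataset arrives as a clone sharing history rather than as a full copy.
    # (Per later comments, the exact send form that works may depend on the platform.)
    zfs send -i "$ORIGIN" "$SRC@first-znapzend-snap" | zfs recv -ue "$DSTPREFIX/nvpool/ROOT"
else
    # Fall back to what autoCreation does today: a full send from scratch.
    zfs send "$SRC@first-znapzend-snap" | zfs recv -ue "$DSTPREFIX/nvpool/ROOT"
fi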

jimklimov commented 3 years ago

Examples from a live inspection session where a replication skipped cleanup (of src and dst) because of this:

[Sat Aug  8 03:30:00 2020] [warn] ERROR: suspending cleanup source dataset because 21 send task(s) failed:
[Sat Aug  8 03:30:00 2020] [warn]  +-->   ERROR: snapshot(s) exist on destination,
    but no common found on source and destination: clean up destination
    backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z
    (i.e. destroy existing snapshots)
...

The snapshots on the current destination range from 2019-12-08 to 2019-12-10, including a few manually named snapshots along the way:

root@jimoi:/root# zfs list -d1 -tall -r backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z
NAME                                                                                                     USED  AVAIL  REFER  MOUNTPOINT
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z                                     8.03G   507G   501M  legacy
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-08T10:43:41Z   270K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@20191208-01                          270K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@20191208-02                          287K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-09T11:11:12Z   738K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-10T09:30:00Z   306K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-10T10:30:00Z   272K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-10T11:30:00Z   272K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-10T12:30:00Z   263K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-10T13:30:00Z   266K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-10T14:30:00Z   266K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-10T15:30:00Z   266K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-10T17:00:00Z      0      -   501M  -

On the source, the history starts from 2019-12-19, so indeed there is nothing in common to sync from, right?..

root@jimoi:/root# zfs list -d1 -tall -r nvpool/ROOT/hipster_2019.10-20191115T133333Z
NAME                                                                              USED  AVAIL  REFER  MOUNTPOINT
nvpool/ROOT/hipster_2019.10-20191115T133333Z                                     71.4M   110G   501M  /
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-19T09:30:00Z   117K      -   501M  -
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-19T10:00:00Z   286K      -   501M  -
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-19T10:30:00Z   276K      -   501M  -
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-19T11:00:00Z   278K      -   501M  -
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-19T11:30:00Z      0      -   501M  -
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-19T12:00:00Z      0      -   501M  -
... skip a few thousand half-hourlies ...
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2020-07-23T10:00:00Z      0      -   501M  -
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2020-07-23T10:30:00Z      0      -   501M  -
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2020-07-30T15:59:08Z      0      -   501M  -
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2020-07-30T18:25:39Z      0      -   501M  -
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2020-08-08T00:38:26Z      0      -   501M  -

What about the origins?

root@jimoi:/root# zfs get origin {backup-adata/snapshots/,}nvpool/ROOT/hipster_2019.10-20191115T133333Z
NAME                                                                 PROPERTY  VALUE                                                             SOURCE
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z  origin    -                                                                 -
nvpool/ROOT/hipster_2019.10-20191115T133333Z                         origin    nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-12-19-09:10:26  -

So the source rootfs dataset was historically cloned off a filesystem to update into a newer release at "2019-12-19-09:10:26", upgrading from "hipster_2019.10" to "hipster_2020.04", and the zfs tree was re-balanced to promote the currently activated rootfs as the owner of all history, inverting the relation of who is a clone of whom (data-wise this is equivalent).
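For reference, a minimal illustration of that inversion with hypothetical names (not from this system):

:; zfs snapshot rpool/ROOT/be-old@upgrade
:; zfs clone rpool/ROOT/be-old@upgrade rpool/ROOT/be-new
:; zfs get -H -o value origin rpool/ROOT/be-new
rpool/ROOT/be-old@upgrade
:; zfs promote rpool/ROOT/be-new
# be-new now owns the shared history (including @upgrade) and be-old became the clone:
:; zfs get -H -o value origin rpool/ROOT/be-old
rpool/ROOT/be-new@upgrade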

The destination dataset is a poor orphan without an origin, and in fact most of them are (I may have initialized the backup pool by replicating my system without znapzend, which would explain why the oldest rootfs datasets on the backup do have proper origins). I suppose that whenever znapzend found a new rootfs with autoCreation enabled, it just made an automatic snapshot and sent it from scratch as the starting point, and has rotated it since, independently of the other rootfs'es on the source pool.

Looking at the manually named snapshots on the source datasets seems to confirm this guess: the ones expected to be common between the "hipster_2019.10" rootfs on source and backup are now part of the "hipster_2020.04" history, in a relation that znapzend currently does not handle:

root@jimoi:/root# zfs list -tall -r nvpool/ROOT/hipster_2020.04-20200622T165833Z | grep @2019

nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-01-29-09:14:04                                    9.39M      -   521M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-02-13-01:22:00                                        0      -   521M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-03-22-08:56:38                                     173K      -   539M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@20190826-01                                            98.7M      -   536M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@20190830-01                                             382K      -   536M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@20190910-01                                            3.64M      -   539M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@20190910-02                                            6.41M      -   539M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@20191003-01                                            85.1M      -   542M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@20191003-02                                                0      -   542M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-10-03-15:33:08                                        0      -   542M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-10-03-23:19:11                                    88.0M      -   542M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-10-04-12:28:06                                    85.1M      -   543M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-11-15-14:33:11                                     205K      -   544M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@20191208-01                                             282K      -   501M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@20191208-02                                             297K      -   501M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-12-19-09:10:26                                        0      -   501M  -
jimklimov commented 3 years ago

Sending an incremental replication from "the owner of history" to its differently-named clone seems to be a valid operation:

root@jimoi:/root# zfs send -R -I \
    nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-12-19-09:10:26 \
    nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-19T12:00:00Z \
    | mbuffer -m 1g | zfs recv -vue backup-adata/snapshots/nvpool/ROOT

in @  0.0 KiB/s, out @  0.0 KiB/s, 28.0 KiB total, buffer   7% full
...

It showed some data read into the buffer, but for the past several minutes it has been blinking the destination disk while showing no traffic, so I'm not sure whether ZFS is actually doing anything... maybe the kernel is thinking about how to handle this...

UPDATE: Alas, after 7 minutes it decided there is no good way to send from the original origin:

cannot receive: local origin for clone backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@20191219-01 does not exist
mbuffer: error: outputThread: error writing to <stdout> at offset 0x7000: Broken pipe

summary: 30.0 KiByte in  7min 42.7sec - average of  0.1 KiB/s

indeed:

root@jimoi:/root# zfs list backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-12-19-09:10:26
cannot open 'backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-12-19-09:10:26': dataset does not exist

Makes sense in hindsight: the new rootfs name started its life on the backup as a poor orphan... so it has no history there either:

root@jimoi:/root# zfs list -tall -r backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z
NAME                                                                                                                       USED  AVAIL  REFER  MOUNTPOINT
backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z                                                       9.57G   507G   530M  legacy
backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z@znapzend-auto-2020-07-22T17:00:00Z                     324K      -   507M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z@znapzend-auto-2020-07-22T17:30:00Z                     326K      -   507M  -
...
backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z@znapzend-auto-2020-07-23T10:30:00Z                     324K      -   507M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z@znapzend-auto-2020-07-30T15:59:08Z                     333K      -   530M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z@znapzend-auto-2020-07-30T18:25:39Z                     333K      -   530M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z@znapzend-auto-2020-08-08T00:38:26Z                        0      -   530M  -
jimklimov commented 3 years ago

Probably a general non-disruptive solution is not possible: if we already have a history written, and suddenly a clone is made from an old manually named snapshot that is not present on the destination, we might not be able to replicate without rolling back data on the trunk. If an even older common point exists, the destination could be cloned from that, replicated up to the "origin" of the newly found source clone, and then upwards from that point as the history of the new source clone; but this is likely to be fragile and work only in some cases at best (which is still better than nothing), and it relies on admins having taken care, over the history of their backup increments, to keep those old common snapshots around.

At the very least, when such a situation newly arises and there are no snapshots on the destination newer than the divergence point (at least none named manually rather than via the znapzend-configured pattern), e.g. after a beadm update and/or a package installation which makes a rootfs backup clone, znapzend can employ the logic behind --since=X (or --sinceForced=X to roll back automated snapshots as needed) to ensure that the snapshot which is the origin of the newly found clone appears on the destination, so that a cleanly branched zfs tree can grow, and maybe rebalance, from that point (we can probably detect the discrepancy of origins to understand that a zfs promote happened on the source since we last looked). For the more complex cases we can stop and print recommendations as we do now.
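A sketch of just the detection part (shell for illustration, using the dataset from the earlier comments; the source-to-destination mapping is simplified and the real handling would be znapzend's job):

SRC=nvpool/ROOT/hipster_2019.10-20191115T133333Z    # example dataset from above
DST=backup-adata/snapshots/$SRC

SRCORIGIN=$(zfs get -H -o value origin "$SRC")
DSTORIGIN=$(zfs get -H -o value origin "$DST")
if [ "$SRCORIGIN" != "-" ] && [ "$DSTORIGIN" = "-" ]; then
    OWNER=${SRCORIGIN%@*}      # dataset currently owning the shared history on the source
    SNAP=${SRCORIGIN#*@}       # the divergence snapshot
    echo "origin mismatch: $SRC branches off $SRCORIGIN, but $DST has no origin;"
    echo "replicate $OWNER at least up to @$SNAP first (--since=... / --sinceForced=... style run)"
fi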

jimklimov commented 3 years ago

A big data point for designing such logic: here's how the BE layout on that system looks, on the destination and on the source:

root@jimoi:/root# zfs list -d1 -tfilesystem -o name,origin -r {backup-adata/snapshots/,}nvpool/ROOT
NAME                                                                          ORIGIN
backup-adata/snapshots/nvpool/ROOT                                            -
backup-adata/snapshots/nvpool/ROOT/firefly_0215                               -
backup-adata/snapshots/nvpool/ROOT/firefly_0215a                              backup-adata/snapshots/nvpool/ROOT/firefly_0215@20180203-01
backup-adata/snapshots/nvpool/ROOT/hipster_2016.10_mate                       backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2016-11-09-23:38:21
backup-adata/snapshots/nvpool/ROOT/hipster_2016.10_mate_drm-20170430T155411Z  backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2017-05-03-04:49:41
backup-adata/snapshots/nvpool/ROOT/hipster_2017.04                            backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2017-05-31-06:51:36
backup-adata/snapshots/nvpool/ROOT/hipster_2017.04-20170903T231101Z           backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2017-11-12-18:40:47
backup-adata/snapshots/nvpool/ROOT/hipster_2017.10                            backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2017-12-08-11:46:57
backup-adata/snapshots/nvpool/ROOT/hipster_2017.10-20171227T103659Z           backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2017-12-30-10:46:38
backup-adata/snapshots/nvpool/ROOT/hipster_2017.10-20171227T103659Z-nv        backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2018-01-31-23:41:11
backup-adata/snapshots/nvpool/ROOT/hipster_2017.10-20180203T155526Z           backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2018-05-03-14:19:26
backup-adata/snapshots/nvpool/ROOT/hipster_2018.04-20180503T141758Z           backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2018-07-24-11:26:56
backup-adata/snapshots/nvpool/ROOT/hipster_2018.04-20180724T112647Z           backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2018-11-13-09:32:46
backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20181113T103249Z           backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2019-02-13-01:22:00
backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190129T091404Z           backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2019-01-29-09:14:04
backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190213T012200Z           backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2019-03-22-08:56:38
backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1      -
backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20191003T110320Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20191003T110320Z-nvm       -
backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20191004T122806Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20191004T122806Z-backup-1  -
backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20191004T122806Z-bak1      -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z.x         -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191219T091024Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200113T161936Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200114T153553Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200114T153553Z-backup-1  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200127T101549Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200127T101549Z-backup-1  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200414T095506Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200416T054436Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200416T054436Z-backup-1  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200416T054436Z-backup-2  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200416T054436Z-backup-3  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200425T140249Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200425T140249Z-backup-1  -
backup-adata/snapshots/nvpool/ROOT/hipster_2020.04                            -
backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200608T084252Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z           -

nvpool/ROOT                                                                   -
nvpool/ROOT/firefly_0215                                                      -
nvpool/ROOT/firefly_0215a                                                     nvpool/ROOT/firefly_0215@20180203-01
nvpool/ROOT/hipster_2016.10_mate                                              nvpool/ROOT/hipster_2020.04-20200622T165833Z@2016-11-09-23:38:21
nvpool/ROOT/hipster_2016.10_mate_drm-20170430T155411Z                         nvpool/ROOT/hipster_2020.04-20200622T165833Z@2017-05-03-04:49:41
nvpool/ROOT/hipster_2017.04                                                   nvpool/ROOT/hipster_2020.04-20200622T165833Z@2017-05-31-06:51:36
nvpool/ROOT/hipster_2017.04-20170903T231101Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2017-11-12-18:40:47
nvpool/ROOT/hipster_2017.10                                                   nvpool/ROOT/hipster_2020.04-20200622T165833Z@2017-12-08-11:46:57
nvpool/ROOT/hipster_2017.10-20171227T103659Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2017-12-30-10:46:38
nvpool/ROOT/hipster_2017.10-20171227T103659Z-nv                               nvpool/ROOT/hipster_2020.04-20200622T165833Z@2018-01-31-23:41:11
nvpool/ROOT/hipster_2017.10-20180203T155526Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2018-05-03-14:19:26
nvpool/ROOT/hipster_2018.04-20180503T141758Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2018-07-24-11:26:56
nvpool/ROOT/hipster_2018.04-20180724T112647Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2018-11-13-09:32:46
nvpool/ROOT/hipster_2018.10-20181113T103249Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-02-13-01:22:00
nvpool/ROOT/hipster_2018.10-20190129T091404Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-01-29-09:14:04
nvpool/ROOT/hipster_2018.10-20190213T012200Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-03-22-08:56:38
nvpool/ROOT/hipster_2018.10-20190322T085637Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@20190910-02
nvpool/ROOT/hipster_2018.10-20191003T110320Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-10-03-15:33:08
nvpool/ROOT/hipster_2018.10-20191003T110320Z-nvm                              nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-10-04-12:28:06
nvpool/ROOT/hipster_2018.10-20191004T122806Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-11-15-14:33:11
nvpool/ROOT/hipster_2019.10-20191115T133333Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-12-19-09:10:26
nvpool/ROOT/hipster_2019.10-20191219T091024Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2020-01-14-15:35:53
nvpool/ROOT/hipster_2019.10-20200113T161936Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2020-01-13-16:19:36
nvpool/ROOT/hipster_2019.10-20200114T153553Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2020-01-27-10:15:58
nvpool/ROOT/hipster_2019.10-20200127T101549Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2020-04-14-09:55:32
nvpool/ROOT/hipster_2019.10-20200414T095506Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2020-04-16-05:44:41
nvpool/ROOT/hipster_2019.10-20200416T054436Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2020-04-25-14:02:51
nvpool/ROOT/hipster_2019.10-20200425T140249Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2020-04-27-06:29:42
nvpool/ROOT/hipster_2020.04                                                   nvpool/ROOT/hipster_2020.04-20200622T165833Z@2020-06-08-08:42:53
nvpool/ROOT/hipster_2020.04-20200608T084252Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2020-06-22-16:58:33
nvpool/ROOT/hipster_2020.04-20200622T165833Z                                  -
jimklimov commented 3 years ago

More data points: I created and activated a new boot environment with beadm to check what happens... and also to update the destination pool with an inheritable history :)

The new one became the owner of the rootfs history from the beginning of time, and the origin of the snapshots that the other rootfs clones, including the recently active one, now branch from:

root@jimoi:/root# zfs list -d1 -tfilesystem -o name,origin -r nvpool/ROOT
NAME                                            ORIGIN
...
nvpool/ROOT/hipster_2020.04                     nvpool/ROOT/hipster_2020.04-20200809T192157Z@2020-06-08-08:42:53
nvpool/ROOT/hipster_2020.04-20200608T084252Z    nvpool/ROOT/hipster_2020.04-20200809T192157Z@2020-06-22-16:58:33
nvpool/ROOT/hipster_2020.04-20200622T165833Z    nvpool/ROOT/hipster_2020.04-20200809T192157Z@2020-08-09-19:21:58
nvpool/ROOT/hipster_2020.04-20200809T192157Z    -

root@jimoi:/root# df -k /
Filesystem                                   1K-blocks   Used Available Use% Mounted on
nvpool/ROOT/hipster_2020.04-20200622T165833Z 114922085 543340 114378745   1% /
root@jimoi:/root# zfs list -d1 -tall -r nvpool/ROOT/hipster_2020.04-20200809T192157Z
NAME                                                                               USED  AVAIL  REFER  MOUNTPOINT
nvpool/ROOT/hipster_2020.04-20200809T192157Z                                      85.4G   109G   525M  /
nvpool/ROOT/hipster_2020.04-20200809T192157Z@postsplit-01                           62K      -   266M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@postsplit-02                           51K      -   268M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@postsplit-03                           54K      -   268M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@2014-04-16-15:50:45                    55K      -   268M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@postupgrade-01                        196M      -   270M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@1                                     663K      -   284M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@20140418-01                           664K      -   284M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@20140425-01                           207M      -   287M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@postupgrade-20140803Z134336          93.3M      -   327M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@20150106-01                          93.7M      -   327M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@postupgrade-20150115Z201009           231M      -   328M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@postupgrade_pkgips-20151212T193923Z   219M      -   426M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@2015-12-17-13:37:28                   125M      -   426M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@postupgrade_pkgips-20160118T170356Z  86.6M      -   426M  -
...
nvpool/ROOT/hipster_2020.04-20200809T192157Z@znapzend-auto-2020-07-30T18:25:39Z    344K      -   531M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@znapzend-auto-2020-08-08T00:38:26Z    324K      -   531M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@2020-08-09-19:21:58                   237K      -   531M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@postupgrade_pkgips-20200809T194524Z   101K      -   526M  -

Similarly for zone roots, with the complication that only the zbe dataset (and probably its children, if any) gets cloned, while the ROOT holding it remains unchanged, and that the snapname/timestamp is unique to each zone root, taken at the moment it was cloned for the update:

root@jimoi:/root# zfs list -d1 -tsnapshot -r nvpool/zones/testdhcp/ROOT/zbe-36 | head
NAME                                                                   USED  AVAIL  REFER  MOUNTPOINT
nvpool/zones/testdhcp/ROOT/zbe-36@znapzend-auto-2019-02-02T09:30:00Z  24.5K      -    27K  -
nvpool/zones/testdhcp/ROOT/zbe-36@2019-02-13-01:22:21                  126K      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@znapzend-auto-2019-02-14T09:00:00Z      0      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@znapzend-auto-2019-03-21T00:00:00Z      0      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@2019-03-22-08:58:00                     0      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@20190322-01                             0      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@znapzend-auto-2019-04-18T05:30:00Z      0      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@znapzend-auto-2019-05-16T07:30:00Z      0      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@znapzend-auto-2019-06-13T08:00:00Z      0      -   835M  -
...
nvpool/zones/testdhcp/ROOT/zbe-36@znapzend-auto-2020-07-23T09:30:00Z      0      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@znapzend-auto-2020-07-23T10:00:00Z      0      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@znapzend-auto-2020-07-30T18:31:27Z      0      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@znapzend-auto-2020-08-08T00:38:11Z      0      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@2020-08-09-19:24:15                     0      -   835M  -

The latest (currently activated) sequentially numbered ZBE is the origin for the other clones of this zone root, except for the numbers removed earlier (on the source) with beadm destroy:

root@jimoi:/root# zfs get origin nvpool/zones/testdhcp/ROOT/zbe-3{0,1,2,3,4,5,6}
cannot open 'nvpool/zones/testdhcp/ROOT/zbe-30': dataset does not exist
cannot open 'nvpool/zones/testdhcp/ROOT/zbe-32': dataset does not exist
NAME                               PROPERTY  VALUE                                                  SOURCE
nvpool/zones/testdhcp/ROOT/zbe-31  origin    nvpool/zones/testdhcp/ROOT/zbe-36@2020-04-27-06:30:10  -
nvpool/zones/testdhcp/ROOT/zbe-33  origin    nvpool/zones/testdhcp/ROOT/zbe-36@2020-06-08-08:44:04  -
nvpool/zones/testdhcp/ROOT/zbe-34  origin    nvpool/zones/testdhcp/ROOT/zbe-36@2020-06-22-16:59:26  -
nvpool/zones/testdhcp/ROOT/zbe-35  origin    nvpool/zones/testdhcp/ROOT/zbe-36@2020-08-09-19:24:15  -
nvpool/zones/testdhcp/ROOT/zbe-36  origin    -                                                      -
jimklimov commented 3 years ago

For the setup of datasets and snapshot names elaborated above, a direct attempt to send the new dataset name as an increment from the old one seems to work: ZFS discovers where it needs to clone and append automatically, at least for a replication stream (which is what I want, but not necessarily what znapzend users might want if they wish to e.g. exclude some child datasets from replication... not sure whether that is currently possible, but it is a constraint against the zfs send -R mode):

root@jimoi:/root# zfs send -R -I nvpool/ROOT/hipster_2020.04-20200809T192157Z@{20190830-01,postupgrade_pkgips-20200809T194524Z} | mbuffer -m 1G | zfs recv -vue backup-adata/snapshots/nvpool/ROOT

in @  0.0 KiB/s, out @  0.0 KiB/s, 2220 KiB total, buffer   0% full
cannot open 'backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200809T192157Z': dataset does not exist
in @ 24.9 MiB/s, out @  0.0 KiB/s, 2240 KiB total, buffer  35% full
in @  0.0 KiB/s, out @  0.0 KiB/s, 2240 KiB total, buffer 100% full
receiving incremental stream of nvpool/ROOT/hipster_2020.04-20200809T192157Z@znapzend-auto-2019-09-10T11:30:00Z
 into backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@znapzend-auto-2019-09-10T11:30:00Z
in @  0.0 KiB/s, out @  0.0 KiB/s,  122 MiB total, buffer 100% full
received 120MB stream in 115 seconds (1.04MB/sec)
in @  0.0 KiB/s, out @  0.0 KiB/s,  122 MiB total, buffer 100% full
receiving incremental stream of nvpool/ROOT/hipster_2020.04-20200809T192157Z@20190910-01 into backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@20190910-01
in @  0.0 KiB/s, out @  0.0 KiB/s,  241 MiB total, buffer 100% full
received 119MB stream in 74 seconds (1.61MB/sec)
in @  0.0 KiB/s, out @  0.0 KiB/s,  241 MiB total, buffer 100% full
receiving incremental stream of nvpool/ROOT/hipster_2020.04-20200809T192157Z@20190910-02 into backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@20190910-02
in @ 1807 KiB/s, out @  0.0 KiB/s,  257 MiB total, buffer 100% full
received 15.4MB stream in 63 seconds (251KB/sec)
...

I guess the laptop now has a long interesting night ahead...

UPDATE: Cool, it can even recognize increments already present in the destination pool, even though they are probably currently part of some other clone's history:

in @  0.0 KiB/s, out @  0.0 KiB/s, 1556 MiB total, buffer 100% full
receiving incremental stream of nvpool/ROOT/hipster_2020.04-20200809T192157Z@znapzend-auto-2019-10-15T03:56:25Z
    into backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20191004T122806Z-bak1@znapzend-auto-2019-10-15T03:56:25Z
snap backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20191004T122806Z-bak1@znapzend-auto-2019-10-15T03:56:25Z 
    already exists; ignoring
received 0B stream in 45 seconds (0B/sec)

in @  0.0 KiB/s, out @  0.0 KiB/s, 1571 MiB total, buffer 100% full
receiving incremental stream of nvpool/ROOT/hipster_2020.04-20200809T192157Z@znapzend-auto-2019-10-19T10:00:00Z
    into backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20191004T122806Z-bak1@znapzend-auto-2019-10-19T10:00:00Z
snap backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20191004T122806Z-bak1@znapzend-auto-2019-10-19T10:00:00Z
    already exists; ignoring
received 0B stream in 43 seconds (0B/sec)

And that is the bit I hate about delays in zfs recv, seen even in this single-command mode (it may be worse when everything runs as separate commands, e.g. when many single-use mbuffer processes mean the buffer cannot fill up on one side while the other side "thinks"):

received 312B stream in 62 seconds (5B/sec)
vs.
received 15.4MB stream in 63 seconds (251KB/sec)
vs.
received 394MB stream in 82 seconds (4.81MB/sec)

I think every receive spends a minute procrastinating and then a few seconds doing I/O. I haven't seen any snapshot increment today clock in under 60s. Grrr... Looking at the consoles, I see zero active I/O on the source and destination pools (zpool iostat 1); then the zfs send|zfs recv pipe says "receiving incremental stream of X into Y", there is a burst of I/O for a second, and the pipe logs that it received a few kilobytes in a lot of seconds...

UPDATE: It "sped up"; now the base time for an increment to be received is about 42 seconds. I remain puzzled.

jimklimov commented 3 years ago

So in the end, the dried-up routine I've done in shell for the rootfs and zoneroots was the send/recv pattern above. Sometimes a stream would abort like this:

mbuffer: error: outputThread: error writing to <stdout> at offset 0x5e000: Broken pipe

summary:  378 KiByte in 2min 15.7sec - average of  2.8 KiB/s
mbuffer: warning: error during output to <stdout>: Broken pipe

but requesting a re-send starting with that snapshot name seems to work, e.g.:

:; zfs send -R -I nvpool/zones/omni151018/ROOT/zbe-60@{20190322-01,2020-08-09-19:22:53} \
    | mbuffer -m 128M | zfs recv -vue backup-adata/snapshots/nvpool/zones/omni151018/ROOT

in @ 0.0 KiB/s, out @ 0.0 KiB/s, 300 KiB total, buffer 0% full
cannot open 'backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-60': dataset does not exist
in @ 0.0 KiB/s, out @ 0.0 KiB/s, 324 KiB total, buffer 2% full


This probably impacts snapshot histories where same-named (but different) snapshots have been made and replicated over time. Note that the clone for "zbe-86" was still not created in the example above. Possibly I'd have to branch it off under that name from an earlier snapshot on the destination (one common with the source); we'll see soon... so far it has been quiet for a long time...

TODO: When it is all done, check that the histories of the existing destination datasets remain in place and that the new zoneroot did get branched off an older snapshot. Otherwise, maybe making the clone from an ancient point and receiving increments into it more explicitly is the proper way (recursively with children, somehow)?..
jimklimov commented 3 years ago

At least, running a dozen operations like this in parallel keeps the destination pool fairly busy. Although each zfs send|zfs recv pipe still lags for quite a while between actual writes (though... a different "while": for zoneroots the minimum overhead seems to be about 10 sec), someone has something to say almost every second, so zpool iostat 1 looks less spiky.

jimklimov commented 3 years ago

Clone and send did not go too well...

root@jimoi:/root# zfs clone backup-adata/snapshots/nvpool/zones/omnibld/ROOT/zbe-56@2019-01-29-09:14:11 backup-adata/snapshots/nvpool/zones/omnibld/ROOT/zbe-86
root@jimoi:/root# zfs send -R -I nvpool/zones/omnibld/ROOT/zbe-86@{2019-01-29-09:14:11,2020-08-09-19:22:41} | mbuffer -m 128M | zfs recv -vue backup-adata/snapshots/nvpool/zones/omnibld/ROOT

in @  0.0 KiB/s, out @  0.0 KiB/s,  304 KiB total, buffer   0% full
receiving incremental stream of nvpool/zones/omnibld/ROOT/zbe-86@2019-02-13-01:22:06 into backup-adata/snapshots/nvpool/zones/omnibld/ROOT/zbe-86@2019-02-13-01:22:06
cannot receive incremental stream: most recent snapshot of backup-adata/snapshots/nvpool/zones/omnibld/ROOT/zbe-86 does not
match incremental source
mbuffer: error: outputThread: error writing to <stdout> at offset 0x51000: Broken pipe

summary:  324 KiByte in 30.3sec - average of 10.7 KiB/s
mbuffer: warning: error during output to <stdout>: Broken pipe

Maybe the receive should have been left to carry on as it did, without a pre-made named destination, letting the zfs send -R stream create it while skipping over the "offending" increments...

UPDATE: zfs promote backup-adata/snapshots/nvpool/zones/omnibld/ROOT/zbe-86 helped, it seems. At least the stream is chugging along again, and it included the correct version of that somehow-offending snapshot:

receiving incremental stream of nvpool/zones/omnibld/ROOT/zbe-86@2019-03-22-08:57:02 into backup-adata/snapshots/nvpool/zones/omnibld/ROOT/zbe-86@2019-03-22-08:57:02
in @  248 KiB/s, out @  0.0 KiB/s,  448 KiB total, buffer   0% full
snap backup-adata/snapshots/nvpool/zones/omnibld/ROOT/zbe-86@2019-03-22-08:57:02 already exists; ignoring
received 0B stream in 1 seconds (0B/sec)
in @  0.0 KiB/s, out @  0.0 KiB/s,  448 KiB total, buffer   0% full

### this did not pass earlier
receiving incremental stream of nvpool/zones/omnibld/ROOT/zbe-86@20190322-01 into backup-adata/snapshots/nvpool/zones/omnibld/ROOT/zbe-86@20190322-01
in @ 15.4 KiB/s, out @  0.0 KiB/s,  572 KiB total, buffer   1% full
in @  504 KiB/s, out @  0.0 KiB/s,  572 KiB total, buffer   1% full
received 124KB stream in 15 seconds (8.27KB/sec)

receiving incremental stream of nvpool/zones/omnibld/ROOT/zbe-86@znapzend-auto-2019-04-18T05:30:00Z into backup-adata/snapshots/nvpool/zones/omnibld/ROOT/zbe-86@znapzend-auto-2019-04-18T05:30:00Z
in @  0.0 KiB/s, out @  0.0 KiB/s,  572 KiB total, buffer   2% full
received 312B stream in 9 seconds (34B/sec)
...
jimklimov commented 3 years ago

Update: I raised the long-lags issue on illumos IRC; a viable theory is that since my source and destination pools have quite a few datasets (me being me) and hordes of snapshots (largely thanks to znapzend, and to my desire to be able to roll back into the considerable past if needed), overall on the order of 1k datasets and 100k snapshots, zfs recv spends a lot of time iterating (recursively!) to find the guid of a dataset that may or may not match something in the received stream; see the pstack at https://pastebin.com/ctU1kLse. It takes arguably too much time, given that all the ZFS metadata is cached in the kernel and no disk I/O has to happen...
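For anyone wanting to reproduce such a trace, this is roughly how one can grab it on an illumos/Solaris box while a receive is lagging (pgrep/pstack are the stock proc tools; the match pattern is an assumption about the running command line):

:; pgrep -fl 'zfs recv'                      # find the PID of the lagging receive
:; pstack "$(pgrep -n -f 'zfs recv')"        # dump its userland stack, like the pastebin above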

jimklimov commented 3 years ago

Regarding zfs promote: it seems to be required on the destination.

The benefit, for situations like the one my backup pool found itself in, is that the zfs send -R -I ... | zfs recv -e pipe guesses which dataset an incremental snapshot belongs to; increments are apparently tracked by the guid of their previous snapshot. So, because znapzend (unaware of how to handle promoted snapshots) had created those datasets from scratch, I ended up with many owners of short histories, and that is where the snapshots received now landed: in one ZBE or another, instead of becoming the history of the new dataset name to be created. A destination dataset made as a clone from the newest common snapshot of an existing history-owner dataset and then promoted does seem to solve this, and the new history owner receives all incoming incremental snapshots.
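Dried up into a sketch with placeholder names in CAPS (the snapshot picked must be the newest one that the accidental destination history owner and the source still have in common; see also the caveat a few comments below about not pre-cloning a branch that the incoming stream would clone by itself):

# 1. Branch the new dataset name off the newest common snapshot on the destination:
:; zfs clone backup-adata/snapshots/nvpool/zones/ZONE/ROOT/zbe-OLD@COMMONSNAP \
    backup-adata/snapshots/nvpool/zones/ZONE/ROOT/zbe-NEW
# 2. Promote it so that it becomes the history owner on the destination too:
:; zfs promote backup-adata/snapshots/nvpool/zones/ZONE/ROOT/zbe-NEW
# 3. Re-send the source increments; zfs recv -e now attaches them to the right dataset:
:; zfs send -R -I nvpool/zones/ZONE/ROOT/zbe-NEW@{COMMONSNAP,LATESTSNAP} \
    | mbuffer -m 128M | zfs recv -vue backup-adata/snapshots/nvpool/zones/ZONE/ROOT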

jimklimov commented 3 years ago

One more bit of wisdom from the trenches: a top-level rootfs snapshot that is common between the source and backup pools does not always exist under the same name on both sides for the child datasets (e.g. on my systems I have split off /usr, /var and some others). Such snapshots usually are there for the ones transferred earlier with a replication stream, but may be spotty after znapzend took over.

So care has to be taken to choose a consistent set of snapshots across the subtree that is to pose as the new rootfs name, and then to clone and promote each dataset individually (the zfs tool does not offer recursion for those operations); a sketch of that loop follows below. After that, zfs send -R ... newrootfsname@lastsnap | zfs recv -vue ... can be performed to receive the newrootfsname and its children onto the backup.
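A sketch of that per-dataset loop (placeholder names in CAPS; it assumes the chosen snapshot really exists on every child dataset, which, as a later comment notes, is worth verifying first):

SRCROOT=nvpool/ROOT/NEWROOTFSNAME                        # new rootfs name on the source
DSTOWNER=backup-adata/snapshots/nvpool/ROOT/OLDOWNER     # destination dataset currently owning the history
DSTROOT=backup-adata/snapshots/nvpool/ROOT/NEWROOTFSNAME
SNAP=COMMONSNAP                                          # snapshot common to the whole subtree

# Clone and promote the parent and then each child individually (no -r for these operations):
for DS in $(zfs list -H -o name -r "$SRCROOT"); do
    SUFFIX=${DS#$SRCROOT}                                # "" for the parent, "/usr" etc. for children
    zfs clone   "$DSTOWNER$SUFFIX@$SNAP" "$DSTROOT$SUFFIX"
    zfs promote "$DSTROOT$SUFFIX"
done

# Then one recursive incremental stream catches the whole subtree up:
zfs send -R -I "@$SNAP" "$SRCROOT@LATESTSNAP" \
    | mbuffer -m 128M | zfs recv -vue backup-adata/snapshots/nvpool/ROOT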

Oh, and did I mention that it can take days with sufficiently many snapshots involved (you might want to script hopping only between the milestones relevant for other rootfs'es to branch off, so they can be remade as inheriting parts of the same history, while ignoring the older zfs-auto-snaps)? Also, to avoid confusion, nothing new should appear in these datasets, so for the time being the znapzend service had better stay off, or at least not touch this tree (which should be easy to do with the "enabled" toggle in the policy definition).

jimklimov commented 3 years ago

The IRC discussion of the slow zfs recv in this situation provided a few insights:

jimklimov commented 3 years ago

At least, after cloning and promoting target datasets first, and sending a replication stream afterwards, I confirmed this cut the lag between received snapshots from about a minute to about 5 sec - much more bearable!

oetiker commented 3 years ago

something to add to the documentation maybe?

jimklimov commented 3 years ago

At least, I suppose...

I'm locally PoC'ing a (shell) script that would help me automate this chore; it is currently chewing through my zone roots' older branches (I cloned and promoted the new "history owners" manually, as in the posts above).

I plan to post that as a contrib-type thing that solves a narrow purpose under the right conditions, which can still be a pretty common use case (rolling backups of rootfs and zoneroots); as we learn more edge cases (e.g. that we should not pre-clone a destination branch that zfs recv is going to clone from an existing snapshot itself), the logic can be rolled into znapzend. So far most of the script consists of comments about the logic and data samples, with maybe two dozen actual lines of code ;)

Jim

jimklimov commented 3 years ago

Noted in the documentation with #512, and also posted my shell script that addresses part of the problem as contrib/ directory content there.

jimklimov commented 3 years ago

A bit of good news: there seems to be no special handling required for catching up a cloned rootfs or ZBE with child datasets when it is NOT the tip to be zfs promote'd. In any case, for boot environments and zones, the original tool responsible for the original snapshot+clone (e.g. beadm) seems to have made a recursive snapshot, so the snapshots in the parent BE root and in its child datasets are same-named. It wouldn't hurt to test for this in the script pedantically (or for other use-cases) so that we branch the backup off a snapshot common to the whole bunch, but it seems redundant in this practical case. So now I'm letting the same contributed script pick through my rootfs backups...
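Such a pedantic test can be as small as this sketch (names taken from the examples above; it merely reports children that lack the chosen snapshot):

BE=nvpool/ROOT/hipster_2020.04-20200809T192157Z    # parent BE from an earlier comment
SNAP=2020-08-09-19:21:58                           # candidate common snapshot
for DS in $(zfs list -H -o name -r "$BE"); do
    zfs list -H -t snapshot "$DS@$SNAP" >/dev/null 2>&1 || echo "MISSING: $DS@$SNAP"
done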

jimklimov commented 3 years ago

Yet another data point: on Solaris 10u10, while zfs send -R clone@snap | zfs recv does detect the "clone origin" on its own and attaches the received increments as a branch off the existing tree, I cannot explicitly send an incremental snapshot that spans the two datasets (using the two points, the origin and the oldest individual snapshot, that zfs saw in the earlier command): in that case it just aborts with an "incremental source must be in same filesystem" message.
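Spelled out with placeholder names in CAPS (this merely restates the observed behaviour above, it is not new testing):

# Works on Solaris 10u10: zfs recv figures out the clone origin from the stream itself
:; zfs send -R CLONE@LASTSNAP | mbuffer -m 128M | zfs recv -vue BACKUPPARENT

# Refused there with "incremental source must be in same filesystem":
:; zfs send -R -I OWNER@ORIGINSNAP CLONE@LASTSNAP | mbuffer -m 128M | zfs recv -vue BACKUPPARENT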

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.