wesbarnett / snap-pac

Pacman hooks that use snapper to create pre/post btrfs snapshots like openSUSE's YaST
GNU General Public License v2.0
Move snap-pac pre hook back some numbers #45

Closed Faerbit closed 3 years ago

Faerbit commented 3 years ago

Describe the bug: It is impossible to run any hook before the snap-pac pre hook, since its file name is prefixed with 00.

To Reproduce: Steps to reproduce the behavior:

  1. Install snap-pac
  2. Be confused

Expected behavior: It is possible to run a hook before the snap-pac pre hook, since the pre hook is prefixed with, e.g., 05.

Additional context: I'm trying to back up my /boot partition as described here. I do it like this since it is a non-btrfs filesystem.

As it currently stands, restoring to a snapshot is quite confusing, since the snapshot and the /boot backup do not fit together.

wesbarnett commented 3 years ago

Why would the order of when the backup and the pre snapshot matter? Can you describe exactly what the confusion is?

The logic of putting the pre snapshot as the very first one is that other hooks will make modifications to the filesystem as part of the upgrade process. Thus, if the snapshot is after those other hooks then they cannot be reverted using snapper rollback.

Faerbit commented 3 years ago

Can you describe exactly what the confusion is?

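Gladly :) Consider the following sequence:

* Kernel has version 5.9

* system upgrade incl. Kernel

  * snap-pac pre snapshot 1
  * `/boot` backup
  * system upgrade itself
  * snap-pac post snapshot 2

* Kernel has version 5.10

* system upgrade incl. Kernel

  * snap-pac pre snapshot 3
  * `/boot` backup
  * system upgrade itself
  * snap-pac post snapshot 4

* Kernel has version 5.11

Which results in the following situation:

| Snapshot | Contained kernel version of /boot |
|---|---|
| 1 (pre) | unknown |
| 2 (post) | 5.9 |
| 3 (pre) | 5.9 |
| 4 (post) | 5.10 |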

As you can see the /boot backup always is one step behind, which is the result of the ordering of the snap-pac pre snapshot and the /boot backup. If I want to restore my system to the state before the second upgrade, since it was faulty, I have to restore my / partition to snapshot 3 but use the /boot backup from snapshot 4. I'm not sure that I have explained well, what is happening, but I can't find better words to describe it right now. Let me know, if it is still unclear.

The state of things isn't that bad once you know how it works, but since I didn't expect this, it totally threw me off on Friday while trying to restore my system to a previous state. What makes this especially problematic is that the kernel modules are stored on the / partition, and therefore the kernel version from /boot and the kernel version from the modules do not match up and the system totally refuses to boot (in contrast to some partially upgraded system state, which is bad, but at least boots).

I could move the /boot backup after the post snapshot, but this wouldn't pick up changes made in between upgrades, e.g. to the initramfs. Another approach is to do the /boot backup based on timers/cron. But this is a race condition waiting to happen.

The logic of putting the pre snapshot as the very first one is that other hooks will make modifications to the filesystem as part of the upgrade process.

I suspected as much. But I hope that I could outline why the current situation is not ideal for a setup like mine (which seems not that uncommon to me). Moving the pre hook to 01 would be enough for me, but since other people might want to do additional other steps like this one, I suggested 05 in my OP.

wesbarnett commented 3 years ago

As you can see the /boot backup always is one step behind, which is the result of the ordering of the snap-pac pre snapshot and the /boot backup. If I want to restore my system to the state before the second upgrade, since it was faulty, I have to restore my / partition to snapshot 3 but use the /boot backup from snapshot 4. I'm not sure that I have explained well, what is happening, but I can't find better words to describe it right now. Let me know, if it is still unclear.

Both the backup of /boot and the pre snapshot are (for practical purposes) occurring at the same time: they are capturing the same state of the system, which is before any upgrades occur. I think you are associating your backup with snapshot 4, but it is really associated with snapshot 3. It doesn't have to be before or after the snapshot, since the snapshot doesn't actually snapshot /boot and both occur before the upgrade. They're completely separate processes that don't affect each other.

wesbarnett commented 3 years ago

Can you describe exactly what the confusion is?

Gladly :) Consider the following sequence:

* Kernel has version 5.9

* system upgrade incl. Kernel

  * snap-pac pre snapshot 1
  * `/boot` backup
  * system upgrade itself
  * snap-pac post snapshot 2

* Kernel has version 5.10

* system upgrade incl. Kernel

  * snap-pac pre snapshot 3
  * `/boot` backup
  * system upgrade itself
  * snap-pac post snapshot 4

* Kernel has version 5.11

Which results in the following situation:

| Snapshot | Contained kernel version of /boot |
|---|---|
| 1 (pre) | unknown |
| 2 (post) | 5.9 |
| 3 (pre) | 5.9 |
| 4 (post) | 5.10 |

This last table is not right. Here is what the snapshots contain as far as the kernel version:

1 (pre)    5.9 
2 (post)   5.10
3 (pre)    5.10
4 (post)   5.11

So in the first upgrade, your /boot backup would be associated with kernel version 5.9. Then in the second transaction the /boot backup would be associated with version 5.10. The current running system would be 5.11.

Faerbit commented 3 years ago

Both the backup of /boot and the pre snapshot are (for practical purposes) occurring at the same time

The ordering of these two is the only thing I am talking about. Stating that they happen at the same time incorrectly simplifies the problem. If viewed like that, the problem does not exist, but in reality it does (assuming my understanding of the situation is correct).

I think you are associating your backup with snapshot 4, but it is really associated with snapshot 3. [...] This last table is not right. Here is what the snapshots contain as far as the kernel version:

This is what I would expect/wish would happen, but it is not what is happening, and with the current state of things I am unable to achieve it. Hence why I opened this issue.

They're completely separate processes that don't affect each other.

I'm not sure we are talking about the same thing. Again, the interaction between these two is the entire point.

This is how it works, as far as I understand it: The snapshots of / contain the backup of /boot (which is stored under /.bootbackup; the concrete name is of course not important) but do not contain /boot itself, since that is stored on another partition with a non-btrfs filesystem (vfat in my case to facilitate UEFI, but that detail is also not relevant to the problem).

In case this wasn't clear, when I said "restoring to a snapshot" this (essentially) contained two steps: 1) snapper rollback 2) cp /.bootbackup /boot

At the point in time / is snapshotted, /.bootbackup is snapshotted as well. But since the backup from /boot to /.bootbackup happens after the snapshot, the state in /.bootbackup is not the current system state, but rather some older state (from the last pre snapshot).
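The effect of the ordering can be sketched with a small Python model of the two upgrades from the sequence above. This is only an illustration, not anything snap-pac or pacman runs: the function name `simulate` and its parameters are made up here, and it assumes the backup hook copies /boot to /.bootbackup exactly once per transaction.

```python
def simulate(backup_before_snapshot, upgrades=("5.10", "5.11"), start="5.9"):
    """Return (snapshot, kernel on /boot, kernel in /.bootbackup) per snapshot.

    Hypothetical model: /.bootbackup lives on / and is therefore captured
    by every snapshot, while /boot itself is never snapshotted.
    """
    boot = start        # kernel currently installed on the /boot partition
    backup = "unknown"  # contents of /.bootbackup before the first backup
    rows = []
    number = 0
    for new_kernel in upgrades:
        if backup_before_snapshot:
            backup = boot  # backup hook runs before the pre snapshot
        number += 1
        rows.append((f"{number} (pre)", boot, backup))
        if not backup_before_snapshot:
            backup = boot  # backup hook runs after the pre snapshot
        boot = new_kernel  # the system upgrade itself
        number += 1
        rows.append((f"{number} (post)", boot, backup))
    return rows


for label, order in (("backup after pre snapshot", False),
                     ("backup before pre snapshot", True)):
    print(label)
    for snapshot, boot_kernel, backup_kernel in simulate(order):
        print(f"  {snapshot:9} /boot={boot_kernel:5} /.bootbackup={backup_kernel}")
```

In this model, running the backup after the pre snapshot leaves /.bootbackup inside each pre snapshot one kernel behind, while running it first makes every pre snapshot carry the matching kernel. The post snapshots lag in both cases, which matters less if the goal is to return to the state before an upgrade.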

Expanded table with more details:

| Snapshot | Kernel version on /boot at the time of snapshot | Kernel version on /.bootbackup | Kernel version on / |
|---|---|---|---|
| 1 (pre) | 5.9 | unknown | 5.9 |
| 2 (post) | 5.10 | 5.9 | 5.10 |
| 3 (pre) | 5.10 | 5.9 | 5.10 |
| 4 (post) | 5.11 | 5.10 | 5.11 |

wesbarnett commented 3 years ago

I think I now understand that I was thinking of this differently than you were. When I was talking about the boot partition being backed up, I was talking about how it is backed up to the currently running system. In other words, after you do a kernel upgrade, the currently running system has a folder named /.bootbackup containing the previous kernel.

It seems like you are talking about the contents of /.bootbackup inside the actual snapshot that is created using snap-pac.

So, the way to restore a backup with the backup hook running first would be simply to roll back the changes and then use rsync to restore the /boot partition from the restored snapshot. The way I had been thinking of it was simply restoring the /boot partition from the currently running system and then rolling back the changes, but obviously that may not work if the currently running system has an issue (and hence the desire to roll back).

Totally makes sense now. It just took me a little longer to get there.

That said, you can currently prefix your hook with 000 to make it run before a hook prefixed with 00_.
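As a rough check of that ordering, the Python sketch below sorts a few hook file names the way pacman is documented to order hooks: alphabetically by file name, ignoring the suffix. The file names and the helper `run_order` are assumptions chosen for illustration, and plain code-point comparison is assumed to match pacman's ordering for ASCII names like these.

```python
import os


def run_order(hook_files):
    """Sort hook file names by name without the suffix (hypothetical helper)."""
    return sorted(hook_files, key=lambda name: os.path.splitext(name)[0])


# Assumed names: a snap-pac style pre hook plus two candidate backup hooks.
hooks = [
    "00_snapper-pre.hook",
    "05-bootbackup.hook",
    "000-bootbackup.hook",
]

for name in run_order(hooks):
    print(name)
```

The `000` name sorts ahead of `00_` because the character `0` compares lower than `_`, whereas a `05` prefix would still run after the `00_` pre hook.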

Faerbit commented 3 years ago

It seems we talked about entirely different things :)

Thanks for your help!