stevenshiau / clonezilla

Clonezilla is a partition or disk clone tool similar to Norton Ghost®. It saves and restores only used blocks in hard drive. Two types of Clonezilla are available, Clonezilla live and Clonezilla SE (Server Edition).
GNU General Public License v2.0

LUKS-encryption with LVM causes the disk-cloning to abort with error/warning: "The volume group setting is not found" - it should be ignored #113

Closed marfeljoergsen closed 1 week ago

marfeljoergsen commented 1 month ago

Hi,

I can see that some work has gone into supporting LUKS encryption. Since I normally make my backups to unencrypted disks, I'm not going to use that feature. I downloaded the stable Clonezilla ISO today (stable - 3.1.3-16). Forgive me if I missed an existing issue for this; I searched, but will write up my findings here instead.

My setup (probably a bit unusual):

I have an encrypted boot-loader and one or two LUKS partitions, at least one of which is LUKS2-encrypted with plausible deniability. It works such that my encrypted boot-loader automatically unlocks this special partition. But it also means this partition looks like it contains completely random data, as if it were unused/unformatted.

The problem:

I've tried several times to make a full-disk clone image, but Clonezilla won't back up or clone the last two partitions I have. It won't allow me to override this behaviour. Instead it comes up with warnings and errors: "/NOTFOUND /dev/nvme0n1p9 (.....)", "///WARNING/// The volume group setting is not found" and "Unable to save LVM image", seemingly because the "lvm" flag is set for the partition AND it cannot find the corresponding partition (because it hasn't been unlocked). So Clonezilla thinks my partitions are corrupted (which they aren't if I unlock them properly). The output of "lvs" and "vgs" shows nothing. The "pvs" command shows that nvme0n1p9 has "Fmt=lvm2" and a size of 909g.

My understanding:

I firmly believe this problem happens because at least one encrypted drive is LUKS2-encrypted and I didn't unlock it. I'm guessing unlocking is also not possible at the moment, because in addition to the passphrase it's also necessary to provide the 4096-byte detached LUKS header file. I'm guessing I can only supply a passphrase in the current Clonezilla version (although I didn't try it).

What I tried (and did):

I was looking for a way to "--force" the cloning to complete despite the LVM errors, but even with the "advanced" options I couldn't make it work (and I couldn't find the --force option; I suggest putting it back). Eventually I guessed (and still believe) that it would have worked with an older Clonezilla version, which doesn't understand LUKS encryption and so would default to using "dd". Because I'm in a bit of a hurry, I took the easy solution, which is:

# partclone.dd -s /dev/nvme0n1p9 -o nvme0n1p9.dd -L n1p9.log

And I'm running this now. I think the current Clonezilla version is missing a "--force" option that would make it ignore what it thinks are LVM-corrupted partitions... I've been using Clonezilla for 10+ years, so I want it to stay as useful and good as always, and I hope this report helps improve the software. Thanks for the great work!
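
A raw copy of a locked partition like the one above can be sanity-checked by comparing the image byte for byte against the source. The following is only a minimal sketch using plain dd and cmp rather than partclone.dd; the function name raw_copy_verify is made up for illustration, and it reads the source twice, so it will be slow on a 909g partition.

```shell
# Hypothetical helper, not part of Clonezilla: copy a block device (or file)
# to a raw image and confirm the image matches the source byte for byte.
# $1 = source (e.g. /dev/nvme0n1p9), $2 = destination image file
raw_copy_verify() {
    src="$1"
    img="$2"
    # Raw copy in 1 MiB blocks; dd's transfer statistics on stderr are discarded.
    dd if="$src" of="$img" bs=1048576 2>/dev/null || return 1
    # cmp -s is silent and returns 0 only if both inputs are identical.
    cmp -s "$src" "$img"
}

# Example call (needs root for a real device):
#   raw_copy_verify /dev/nvme0n1p9 nvme0n1p9.dd && echo "image verified"
```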

stevenshiau commented 1 month ago

Thanks for your feedback. If you can reproduce this issue in a VM which does not contain any classified data, please share the VM with us so that it's easier for us to reproduce this issue.

marfeljoergsen commented 1 month ago

Yes, I think I know how to build a similar setup that I can share, which will aid implementation and testing. I should have been more precise and written that I believe I'm running LVM on LUKS, so that I can unlock several volumes using a single passphrase and keyfile. To quote:

Hence, the LVM is not visible until the block device is unlocked and the underlying volume structure is scanned and mounted.

So I think this explains the error and why I had to resort to running partclone.dd directly. As mentioned, I also used LUKS2 with a detached header: sudo cryptsetup luksFormat /dev/sdb --header luksheader.img (and also with a keyfile), so I'm guessing I can reproduce the behaviour in a VM. I just need to get back to this later this month, hopefully within around 2 weeks, as I'm a bit busy. I'll try to create a minimal example in a container and write an update then, thanks.

marfeljoergsen commented 3 weeks ago

Hi Steven. It was relatively easy for me to reproduce (apologies for the delay, I've been busy with a new job). Although I don't get 100% the same error message, I believe it's the same problem. The error message I now get is:

"///WARNING/// The LVM physical volume setting was not found" + "Unable to save LVM image"

I'll share this zip-file (40 MB), which needs Virtualbox: LINK-TO-ZIP-FILE (the VM isn't running Alpine or any other Linux. With a more advanced setup, using a LUKS1-enabled GRUB bootloader and the LUKS2 header file on that partition, it could be made to decrypt the second LUKS partition automatically. I have exactly that setup on some machines, but it isn't needed for this simple demonstration of the problem). NB: Don't choose "Import" - instead choose Machine -> "+Add" (import won't work).

Instructions:

Inside Virtualbox, I created two disks (Alpine Linux.vdi and Alpine Linux_CLONEZILLA...something.vdi). Alpine Linux.vdi has 5 partitions that, partition-wise, look more or less like many of my machines: an EFI partition, NTFS/Windows partitions and then something encrypted with Linux, where I very often use LVM on top of LUKS. The other .vdi file (with just a single Linux partition) is meant to be the destination for the full-disk Clonezilla image. I'm using "clonezilla-live-3.1.3-16-amd64.iso" if you want a 100% similar setup. Just boot up the VM. Once it has booted, you should be able to see the two virtual disks as /dev/sd{a,b}. For your reference, I included the log file in the simpler of the two virtual disks. There you can see the exact command I ran and the error message above. You should be able to reproduce the problem by running the same command... I also included a modified "history" file so you can see which commands I ran to create this LUKS2-encrypted LVM partition. The easiest way to see the problem is by running:

ocs-sr saveparts test.img sda5

As mentioned, although I don't get the exact same error message this time, I believe the problem is the same: Clonezilla tries to check whether the LVM-marked partition is "valid", which it cannot see unless the LUKS2-encrypted LVM partition is unlocked. Unlocking requires both the LUKS2 header file and the LUKS2 passphrase (which is just "123" in case you want to decrypt the partition and look inside). I don't think Clonezilla currently supports providing an external LUKS2 header file in any case, but that doesn't matter to me, because here I don't want to unlock the LUKS2-encrypted LVM partition before making the image. I need some kind of "--force" or "--ignore-errors" option so that Clonezilla skips the validation and just performs a simple "dd" copy; at the moment it doesn't even offer me the chance to do that. I hope I didn't misunderstand anything or overlook an existing option that already solves this... Please let me know if there are any questions and I'll happily (and should be able to) reply relatively quickly. Thanks!

UPDATE 1:

I've been playing with it tonight and debugged a bit (it's not too complicated) - WARNING, I've also replaced the zip-file, with another version (because I made a mount-bind mistake, it's fixed now). I think I can see what's going on and I think I understand the problem so I can better and more accurately describe it now. This is what I think happens:

  1. When you don't "unlock" (=cryptsetup open) the LVM-partition, the commands "pvs", "vgs" and "lvs" (and pvscan I think) will fail and show nothing as if there's no LVM present. This is incorrect because LVM is present, but hidden behind LUKS2-encryption... (I encourage you to try it yourself)...
  2. If you want the LVM-commands "pvs", "vgs" and "lvs" + "pvscan" to work, you need to run: cryptsetup open --header cryptLVM_header.img /dev/sda5 encryptedLVM (remember the passphrase "123"). This will "unlock" the encrypted data and make it visible to LVM.
  3. If you now repeat "pvs", "vgs" and "lvs" + "pvscan", they'll work correctly.
  4. If you want to play with it and test what happens with clonezilla, when the LVM partition is encrypted or decrypted you can de-activate the LVM again by using: vgchange -an and then close the encrypted partition: cryptsetup close /dev/mapper/encryptedLVM. Again, if you repeat "pvs", "vgs" and "lvs", they'll work correctly but this time NOT show anything.
  5. Here's the culprit: Line 7018 (pvscan --config) in the file scripts/sbin/ocs-functions - this line assumes that pvscan has access to all partitions, but it does not have that unless we've decrypted /dev/sda5.
  6. In line 7039 the line says echo ... | tee -a but that output is blank in this case. And then the errors in line 7046 and/or perhaps 7055 will be printed.
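
The unlock/inspect/lock cycle from steps 1 to 4 can be sketched as a small script. The device, header file and mapping name are the ones from the VM above; the DRY_RUN switch and the run/lvm_cycle helpers are assumptions added for illustration, so that by default the script only prints the commands instead of touching any disk.

```shell
#!/bin/sh
# Sketch of steps 1-4. With DRY_RUN=0 this needs root and the VM's disk layout.
LUKS_DEV="/dev/sda5"
LUKS_HEADER="cryptLVM_header.img"
LUKS_NAME="encryptedLVM"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical safety switch: 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

lvm_cycle() {
    run pvs                                  # step 1: still locked, LVM shows nothing
    run cryptsetup open --header "$LUKS_HEADER" "$LUKS_DEV" "$LUKS_NAME"  # step 2: asks for the passphrase
    run pvs                                  # step 3: the PV, VG and LVs are now visible
    run vgs
    run lvs
    run vgchange -an                         # step 4: deactivate LVM again ...
    run cryptsetup close "/dev/mapper/$LUKS_NAME"   # ... and lock the partition
    run pvs                                  # shows nothing again
}

lvm_cycle
```

Running it with DRY_RUN=0 inside the VM performs the real cycle, which makes it easy to start ocs-sr once in the locked and once in the unlocked state.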

Perhaps this is my own mistake because /dev/sda5 is marked as an LVM-partition. In some way it's correct, but it's an ENCRYPTED LVM-partition. The real partition belonging to the LVM is in fact the decrypted /dev/mapper/encryptedLVM. So is this my own mistake, because the /dev/sda5 LVM-partition is marked as it is? I prefer to mark it this way... In any case, I expect something is wrong with the "return 1" lines, because the script should fall back to using "partclone.dd" and it doesn't (which is why I ended up running it manually, as described initially). I guess everything boils down to how to change the behaviour so that partclone.dd is run when "pvscan" returns nothing.
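
The kind of fallback meant here could look roughly like the sketch below. It is not the code from ocs-functions or from the pull request; the function name pick_save_method is hypothetical, and it takes the pvscan output as plain text so it can be tried without a real disk.

```shell
# Hypothetical helper: decide how to image a partition that is flagged as LVM.
# $1 = partition (e.g. /dev/sda5), $2 = captured output of "pvscan"
# Prints "lvm" if pvscan lists the partition as a physical volume, else "dd".
pick_save_method() {
    part="$1"
    pvscan_out="$2"
    # -F: match the device path literally, -w: whole word only,
    # so /dev/sda5 does not match a line about /dev/sda50.
    if printf '%s\n' "$pvscan_out" | grep -qwF -- "$part"; then
        echo "lvm"
    else
        echo "dd"
    fi
}

# Example call on a live system:
#   pick_save_method /dev/sda5 "$(pvscan 2>/dev/null)"
```

With the partition still locked, pvscan lists no physical volume for /dev/sda5, so the helper answers "dd" instead of giving up with an error.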

UPDATE 2:

I've been looking more at it today after work, and it seems the newer LUKS functionality is not really written for LUKS2 as I use it. For instance, line 9332 in ocs-functions (if cryptsetup isLuks -- /dev/...) only works for a simple LUKS-encrypted partition and gives the wrong answer for my LUKS2-encrypted setup (which requires a detached-header argument)... A fix for this needs extra arguments, so it's more complicated and probably requires some historical background information that I don't have, so I won't attempt to look at that now.
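
As far as the cryptsetup man page goes, isLuks accepts a --header option, so a check that also covers a detached header might look like the sketch below. This is an assumption about how such a check could be written, not what ocs-functions does; the function name is_luks_dev and its optional second argument are hypothetical.

```shell
# Hypothetical wrapper: return 0 if the device is LUKS, optionally using a
# detached header file.
# $1 = device (e.g. /dev/sda5), $2 = detached header file (optional)
is_luks_dev() {
    dev="$1"
    header="$2"
    if [ -n "$header" ]; then
        # Read the LUKS metadata from the detached header instead of the device.
        cryptsetup isLuks --header "$header" -- "$dev"
    else
        cryptsetup isLuks -- "$dev"
    fi
}

# Example call:
#   is_luks_dev /dev/sda5 cryptLVM_header.img && echo "LUKS (detached header)"
```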

Conclusion

I've created a pull request and I believe this fixes my issue without any negative side-effects to consider (at least I hope, fingers crossed :laughing:). I hope you agree, otherwise please let me know what you think. If you need help on other LUKS2-functionality I might be able to spend a bit of time on it because these scripts aren't actually too difficult for me to understand and I really appreciate Clonezilla as a valuable tool and have used it for many years :-)

marfeljoergsen commented 1 week ago

I believe the issue has been fixed with the pull request.