Some questions - Githubissues

veeam / blksnap

Nonpersistent block device snapshot with block-level change-tracking capabilities.

GNU General Public License v2.0

84 stars 22 forks source link

Some questions #57

Open Fantu opened 1 year ago

Fantu commented 1 year ago

Hi, before I did only very fast "superficial" tests, today I started to do some more deep testing, I'll post here some questions based on tests I did.

The diff storage contain any difference without check if the "chunk" of same position of device was already present on same snapshot and override it right? (I mean if after snapshot_take same device part is "overwritten" more times is stored any time) I not checked good in kernel patch but from what seems from a look look seems.

Can seems strange but I tested snapshot also on btrfs, I think blksnap can be useful also with that in some cases, trying to mount the blksnap snapshot fails (seems for check that do for multiple device support):

sudo mount /dev/blksnap-image_8\:10 /mnt/tmp
mount: /mnt/tmp: mount(2) system call failed: File già esistente.
       dmesg(1) may have more information after failed mount system call.

2023-04-25T20:05:54.286507+02:00 sid-fantu kernel: [24366.185673] BTRFS warning: duplicate device /dev/blksnap-image_8:10 devid 1 generation 524 scanned by mount (8639)

Is there a safe way to mount blksnap snapshot done on btrfs block device?

Thanks for any reply and sorry for my bad english.

SergeiShtepa commented 1 year ago

Hi.

I think it's not worth trying to mount btrfs from a snapshot. It may make sense to make a copy from the snapshot, then detach the original btrfs. Perhaps then it will be possible to mount a copy. But I haven't tried.
The difference storage should also not be placed on btrfs. For btrfs, the FIEMAP ioctl returns virtual offsets, not physical ones. They still need to be converted. This has not been implemented in this project yet.

Fantu commented 1 year ago

@SergeiShtepa thanks for reply

about "make a copy from the snapshot" what you mean? copy the entire block device instead data mount the snapshot? the cases which I have in mind where it would be useful would instead be mainly copying files from the mounted snapshot. Can seems strange but probably backup scripts with only blksnap instead an additional code for support also btrfs snapshot and/or avoid in long time "waste" of space and performance (mainly on hdd) caused by snapshot that can require a long balance to "solve" and that can fail if low free space is present
I read that and in all test I did I had put difference storage in ext4 partition (also the snapshot test done on btrfs). What I mean in first question above was relating to how copy to difference storage, from a fast look to code probably already have the answer but I would prefer a confirmation. Sorry for my bad english and explain, I retry: actually any write to block device after the snapshot create is copied to difference storage (excluding the difference storage if in the same block device) without check if same part was already overrided multiple time, right?

SergeiShtepa commented 1 year ago

I apologize for the late response.

about "make a copy from the snapshot" what you mean? copy the entire block device instead data mount the snapshot?

I mean duplicating the entire block device with btrfs. In its simplest form, it can be a "dd" call. I don't see any other ways to use btrfs together with btrfs. And if we remember that btrfs can be located on several devices, then the desire to invent something disappears. I think, if someone wants to use btrfs, then let them use their own snapshots.

actually any write to block device after the snapshot create is copied to difference storage (excluding the difference storage if in the same block device) without check if same part was already overrided multiple time

No, copying is done only once. A chunk is copied to the difference storage once for the entire chunk. So, the users of the Round Robin Database should be happy.

Fantu commented 1 year ago

@SergeiShtepa thanks for reply

No, copying is done only once. A chunk is copied to the difference storage once for the entire chunk. So, the users of the Round Robin Database should be happy.

sorry if I still have a doubt, probably I not explained good or I should test better or ext4 write to different position more like cow filesystem like btrfs (and I don't remember good), because overriding the same files I had the difference storage that seems continue to increase at the "full" size that write even if probably with the same position (or at least partially). If a snapshot is taken and after the same part of the block device is overriden more times, the difference storage contain chunk of the same position for any time was overrided or only the latest?

EDIT: did another test related to this:

ID=$(sudo blksnap snapshot_create --device /dev/sda3) && echo "New snapshot ${ID} was created"
sudo blksnap stretch_snapshot --id=${ID} --path=/media/fabio/TEST-BLKSNAP/blksnap-tests/diff-storage/ --limit=1024 &
sudo blksnap snapshot_take --id=${ID}
sudo mount /dev/sda3 /mnt/tmp

df -h
File system     Dim. Usati Dispon. Uso% Montato su
...
/dev/sda3       975M   24K    908M   1% /mnt/tmp

sudo dd if=/dev/urandom of=/mnt/tmp/test.img bs=1M count=900
sudo dd if=/dev/urandom of=/mnt/tmp/test.img bs=1M count=900
sudo dd if=/dev/urandom of=/mnt/tmp/test.img bs=1M count=900
sudo dd if=/dev/urandom of=/mnt/tmp/test.img bs=1M count=900
sudo dd if=/dev/urandom of=/mnt/tmp/test.img bs=1M count=900

sudo ls -alh /media/fabio/TEST-BLKSNAP/blksnap-tests/diff-storage/
...
-rw------- 1 root root 1,0G 30 apr 21.26 diff_storage#0
-rw------- 1 root root 1,0G 30 apr 21.28 diff_storage#1

didn't created 5 diff storage of 1gb so my doubt was wrong

Fantu commented 1 year ago

@SergeiShtepa I did other tests and when I tried to delete diff storage files before snapshot destroy and I not found error or other logs related to this (in kern.log) and also stretch service seems didn't manage this unexpected cases

EDIT: saw that Donald Buczek already spot the same risk also with "snapshot_appendstorage" in https://github.com/veeam/blksnap/issues/53 document it don't seems enough for me, from what I saw the range of space of the diff storage files is passed to blksnap and I don't know a possibility to know in blksnap part in kernel if happen thing like this that diff storage is compromise, or I'm wrong? on the tool side improvement like lock as wrote by Donald seems possible to decrease the cases of this issue even if can't fully avoid it. monitor the diff storage files in the stretch service and when they are missing at least mark the corrupted snapshot perhaps it could be a further addition that mitigates the problem? Cases of snapshot compromised without know of it seems very relevant to me. if you do operations on the compromised filesystem (of the snapshot) maybe you see the consequences in error while in the case of a simple copy of the block device (like dd) as far as I know it is less likely

SergeiShtepa commented 1 year ago

Hi!

Of course, the problem of unexpected deletion of difference storage files exists. The module cannot use these files using the file system. Files are just one way to reserve an area of disk space. The module works with disk areas directly .

monitor the diff storage files in the stretch service and when they are missing at least mark the corrupted snapshot

Alas, this is not the solution to the problem. If the file was deleted while holding the snapshot, then the file system may already be damaged. This should not be allowed. And the file can be overwritten by someone, or simply transferred during defragmentation. We can't control that.

The only way to prevent the file from being deleted or moved is to set the immutable flag on it (FS_IMMUTABLE_FL). I plan to add this code. If there are any other ideas about this, I will be glad to know them. Hmm... Probably we need to see how such a problem is solved for loop devices.

But on the other hand, nothing prevents from reserving a disk area otherwise, for example, a partition for difference storage may be allocated. The blksnap tool supports this. This is a safer, but less user-friendly way.

P.S.: And it is also interesting what happens if the difference storage is located on a loop device. And his file will be located on a block device under the snapshot. Here it seems such a case will be inoperable. There will be an overflow of the snapshot, I suppose.

SergeiShtepa commented 1 year ago

If a snapshot is taken and after the same part of the block device is overriden more times, the difference storage contain chunk of the same position for any time was overrided or only the latest?

The chunk will store in the difference storage only the data that was on the disk before the snapshot was created. Only this data is needed for the snapshot image. If the data is repeatedly overwritten while the snapshot is being held, the module simply skips these requests. See diff_area_copy().

Fantu commented 1 year ago

Of course, the problem of unexpected deletion of difference storage files exists. The module cannot use these files using the file system. Files are just one way to reserve an area of disk space. The module works with disk areas directly .

I saw this and related to this as wrote I don't have idea how can be possible protected from this on kernel side, probably is not possible. Probably can be good write on patch serie mail asking if someone have an idea.

monitor the diff storage files in the stretch service and when they are missing at least mark the corrupted snapshot

Alas, this is not the solution to the problem. If the file was deleted while holding the snapshot, then the file system may already be damaged. This should not be allowed. And the file can be overwritten by someone, or simply transferred during defragmentation. We can't control that.

The only way to prevent the file from being deleted or moved is to set the immutable flag on it (FS_IMMUTABLE_FL). I plan to add this code. If there are any other ideas about this, I will be glad to know them. Hmm... Probably we need to see how such a problem is solved for loop devices.

But on the other hand, nothing prevents from reserving a disk area otherwise, for example, a partition for difference storage may be allocated. The blksnap tool supports this. This is a safer, but less user-friendly way.

May be useful asking on mail of patch serie also if someone have better idea to solve/improve it on tool side.

P.S.: And it is also interesting what happens if the difference storage is located on a loop device. And his file will be located on a block device under the snapshot. Here it seems such a case will be inoperable. There will be an overflow of the snapshot, I suppose.

yes should be tested (is not on tests I did for now) for know if improvement is needed or if not possible have it working document it (for example in Documentation/block/blksnap.rst#difference-storage)

buczek commented 1 year ago

Sorry, late to the party.

Probably we need to see how such a problem is solved for loop devices.

Userspace tool opens the file and passes the file descriptor to the kernel 1.
The kernel uses fget() on the file descriptor to keep the underlying file alive 2.
When the loop devices is deconfigured (or the file changed), the driver uses fput() to release it 3.

The only way to prevent the file from being deleted or moved is to set the immutable flag on it (FS_IMMUTABLE_FL). I plan to add this code. If there are any other ideas about this, I will be glad to know them.

Drop the --file feature (user can use losetup if the difference store should be in a file) ? Or, as you said, do as the loop device does.

I don't like the FS_IMMUTABLE_FL too much.

It's a privileged operation
It's a attribute stored on stable storage
It can be manipulated (chattr)
It doesn't work on all file systems
The file can't be unlinked (to automatically go away when last user is gone).

I think, if the kernel module wants to use the file, it has to keep it open.

SergeiShtepa commented 1 year ago

Hi @buczek . Thanks for the interesting thoughts.

I agree that the use of "FS_IMMUTABLE_FL" does not protect against deliberate destructive actions of the user. I have doubts that this is a problem.

Support flag "FS_IMMUTABLE_FL" by not all file systems does not bother me for the reason that fallocate, FSMAP and FIEMAP requests are supported essentially only by EXT4 and XFS (and BTRFS with some features).

The variant to use a loop device will not allow to determine which request to the disk is being made from applications, and which from the blksnap module itself when accessing the difference storage. Now the 'BIO_FILTERED' flag for 'bio' is used for this.

Now a one request is being used to add area to the difference storage, regardless of whether it is a file or just a part of the disk.

It seems to me that a special request 'IOCTL_BLKSNAP_SNAPSHOT_APPEND_FILE', similar to the request 'IOCTL_BLKSNAP_SNAPSHOT_APPEND_STORAGE', can be made. The input parameter for it will be a file descriptor (as for the loop devices). The module will keep it in exclusive access. At the same time, the module will have to request its location on the block device for the file itself by calling FSMAP, since the module will not be able to write data through the file system anyway.

What do you think? Will such a new request solve the problem in full? Don't you think it's too difficult to execute an FSMAP request from the kernel module? I would like to keep the kernel module code simple.

buczek commented 1 year ago

deliberate destructive actions of the user

Or the accidental destructive action by the user or one of the tools the user is using, which also uses the i attribute for its purpose. I just feel it's wrong to use a permanent on-disk flag as a lock for transient system state. For example, if the system goes down because of a power failure, the flag would stick in the file system, although the filter is gone.

The variant to use a loop device will not allow to determine which request to the disk is being made from applications, and which from the blksnap module itself when accessing the difference storage. Now the 'BIO_FILTERED' flag for 'bio' is used for this.

Oh, that's correct and unfortunately, I don't have any idea for another solution.

It seems to me that a special request 'IOCTL_BLKSNAP_SNAPSHOT_APPEND_FILE', similar to the request 'IOCTL_BLKSNAP_SNAPSHOT_APPEND_STORAGE', can be made. The input parameter for it will be a file descriptor (as for the loop devices). The module will keep it in exclusive access. At the same time, the module will have to request its location on the block device for the file itself by calling FSMAP, since the module will not be able to write data through the file system anyway.

What do you think? Will such a new request solve the problem in full? Don't you think it's too difficult to execute an FSMAP request from the kernel module? I would like to keep the kernel module code simple.

I'm not familiar enough with the blksnap code or block I/O in general to give recommendations. All my thoughts might be totally of track, just for you to consider, but feel free to ignore :-)

Generally, I agree, that the code should be kept as simple as possible. So, no, I wouldn't vote for alternatives, unless another approach turn out to be superior (and works in the first place) but the old approach has to be kept as deprecated because of existing users.

SergeiShtepa commented 1 year ago

Thank you so much for your opinion.

I have prepared a v4 patch . I plan to send it soon. it will be possible to discuss this issue in a wider circle.

I will start an issue so as not to forget to try to implement a new ioctl.