markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0

problems with active gocryptfs (fuse) mounts #250

Open brainchild0 opened 3 years ago

brainchild0 commented 3 years ago

I recently ran the application with fairly standard options (-dhr) on a system that had active gocryptfs mounts. After the operation, though it succeeded without error, the files appeared corrupted when viewed through the mount. Fortunately, the corruption seemed not to affect the physical, encrypted files, and removing and then recreating the mount appeared to resolve the problem.

Nevertheless, my understanding is that, in principle, the application should not affect any active system operation or state above the underlying Btrfs file system.

Note that gocryptfs is built on go-fuse, a FUSE library.

Does this experience reflect any known problems or similar observations about conflicts between de-duplication operations and FS-over-FS mounts?

lorddoskias commented 3 years ago

Can you elaborate on the storage stack? You have a btrfs filesystem on physical hardware, and then gocryptfs is somehow overlaid on top of it? I.e., when you write to gocryptfs, it first does some transformation (encrypting data) and then sends the files to the underlying filesystem (btrfs)?

brainchild0 commented 3 years ago

I.e., when you write to gocryptfs, it first does some transformation (encrypting data) and then sends the files to the underlying filesystem (btrfs)?

Yes, exactly. A gocryptfs store is an ordinary directory tree in any file system. The files in this tree are encrypted. To use the files, the tree location is mounted to some mount point, with the password given to the mount operation. The mount point exposes the plaintext view of the files. The physical encrypted files and plaintext virtual files correspond one-to-one (though the encrypted store does hold additional metadata files).

lorddoskias commented 3 years ago

It's not clear how you ran the deduplication, i.e., on the mounted folder or on the underlying filesystem. In my testing, it seems it's not possible to run the extent dedupe (the default in current HEAD) on the mounted folder, since gocryptfs doesn't support fiemap, and the output you'd get is:

Skipping file due to error 95 from function csum_by_extent (Operation not supported), /root/mnt/test1
Skipping file due to error 95 from function csum_by_extent (Operation not supported), /root/mnt/test2
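Error 95 in the output above is the "Operation not supported" errno on Linux. As a rough illustration of how one could check this for a given file independently of duperemove, here is a minimal sketch that issues the FIEMAP ioctl directly. The function name `supports_fiemap` is made up for this example, the ioctl number is built with the `_IOWR` encoding used on x86 and ARM (an assumption; a few architectures encode it differently), and this is not how duperemove itself performs the check.

```python
import errno
import fcntl
import os
import struct

# Header of struct fiemap from <linux/fiemap.h>:
# u64 fm_start, u64 fm_length, u32 fm_flags,
# u32 fm_mapped_extents, u32 fm_extent_count, u32 fm_reserved
FIEMAP_HEADER = struct.Struct("=QQIIII")

# FS_IOC_FIEMAP is _IOWR('f', 11, struct fiemap).
# Assumption: the common x86/ARM ioctl bit layout (dir<<30 | size<<16 | type<<8 | nr).
FS_IOC_FIEMAP = (3 << 30) | (FIEMAP_HEADER.size << 16) | (ord("f") << 8) | 11


def supports_fiemap(path):
    """Return True if FIEMAP succeeds on path, False if it is rejected as unsupported."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # fm_extent_count = 0 asks only for the number of extents, so the
        # header alone is a valid request buffer.
        request = bytearray(FIEMAP_HEADER.pack(0, 2**64 - 1, 0, 0, 0, 0))
        try:
            fcntl.ioctl(fd, FS_IOC_FIEMAP, request)
        except OSError as exc:
            if exc.errno in (errno.EOPNOTSUPP, errno.ENOTTY):
                return False
            raise  # any other error is unexpected; let the caller see it
        return True
    finally:
        os.close(fd)
```

Calling `supports_fiemap("/root/mnt/test1")` on a file in the mounted folder would be expected to return False if the filesystem rejects the ioctl as in the log above; the result for any other path depends on the filesystem that holds it.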

OTOH, if I run duperemove on the underlying filesystem containing the encrypted files, then it likely won't find any duplicates: even if two files have exactly the same content, the data stored on disk differs due to the design of gocryptfs, namely storing AES-GCM IVs and the GHASH.
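To make the point about IVs concrete, here is a toy sketch. It is NOT AES-GCM and not gocryptfs code: it is a stand-in stream cipher built from SHA-256 with a random nonce, and the names `toy_encrypt`, `shared_blocks` and the 4096-byte comparison block size are assumptions chosen only for illustration. It shows that encrypting the same plaintext twice with fresh random nonces yields ciphertexts with no identical blocks, so a block-level deduplicator has nothing to match.

```python
import hashlib
import os

BLOCK = 4096  # hypothetical block size used only for the comparison below


def toy_encrypt(key, plaintext):
    """Toy nonce-based stream cipher (illustration only, not secure, not AES-GCM)."""
    nonce = os.urandom(16)  # fresh random nonce per encryption
    stream = bytearray()
    counter = 0
    # Build a keystream from SHA-256(key || nonce || counter).
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    return nonce + body  # the nonce is stored alongside the ciphertext


def shared_blocks(a, b):
    """Count distinct fixed-size blocks that occur in both byte strings."""
    blocks_a = {a[i:i + BLOCK] for i in range(0, len(a), BLOCK)}
    blocks_b = {b[i:i + BLOCK] for i in range(0, len(b), BLOCK)}
    return len(blocks_a & blocks_b)


key = b"k" * 32
plaintext = b"same content " * 1000

first = toy_encrypt(key, plaintext)
second = toy_encrypt(key, plaintext)
```

Here `shared_blocks(plaintext, plaintext)` is non-zero, while `shared_blocks(first, second)` is zero except with negligible probability, even though both ciphertexts were produced from identical input with the same key.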

brainchild0 commented 3 years ago

OTOH, if I run duperemove on the underlying filesystem containing the encrypted files, then it likely won't find any duplicates: even if two files have exactly the same content, the data stored on disk differs due to the design of gocryptfs, namely storing AES-GCM IVs and the GHASH.

It may not be helpful, but it should not be harmful. If the volume has an encrypted store that happens to be mounted, then the mount should keep working during and after any operation.

Can you imagine any way in which any of the system calls from the application might conflict with FUSE (perhaps due to a bug or design flaw in that system)? I would assume that the FUSE system, and not the encryption, is responsible for any problems of this kind.

lorddoskias commented 3 years ago

Nope. Also, you didn't provide vital information, i.e., whether the files exhibiting corruption were actually deduplicated. Also, the corruption not affecting the physical files sounds really strange. The files are either corrupted or they aren't, and a remount "fixing" it also sounds extremely strange. It's not obvious what might have gone wrong. If you can reproduce it, then I can take a look, but for now there isn't anything to be done.

brainchild0 commented 3 years ago

I understand. I will try to reproduce.
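One possible starting point for a reproduction is to checksum every file as seen through the mount before and after the dedupe run and compare the results. The following is a minimal sketch under that assumption; the helper names `snapshot` and `changed_files` are invented for this example and are not part of duperemove or gocryptfs.

```python
import hashlib
import os


def snapshot(root):
    """Map each regular file under root (by relative path) to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, broken symlinks, etc.
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so large files do not need to fit in memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            digests[os.path.relpath(path, root)] = digest.hexdigest()
    return digests


def changed_files(before, after):
    """Return sorted relative paths whose digest differs or that exist in only one snapshot."""
    return sorted(
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )
```

The idea would be to call `snapshot` on the plaintext mount point before running duperemove on the underlying Btrfs volume, again afterwards, and once more after recreating the mount. Any paths reported by `changed_files` could then be checked against the list of files that were actually deduplicated.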