openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Ability to run/trigger compression/deduplication of pool/volume manually #3013

Open pavel-odintsov opened 9 years ago

pavel-odintsov commented 9 years ago

Hello!

I have a large amount of uncompressed data in multiple pools and volumes. I want to enable compression because my data compresses very well in synthetic tests.

I enabled compression for the pool:

zfs set compression=lz4 data

But I can't find any way to compress the data already on the pool without copying it again.

I do the following:

for i in `/bin/ls /data`; do
    echo "Processing volume ${i}"
    zfs snapshot data/${i}@snap
    zfs send data/${i}@snap | zfs receive -F data/${i}_compressed
done

It works well and the data ends up compressed as expected.

But how can I compress the data in place, without service interruption and without creating temporary volumes?

I reviewed the zio.c code and found that the compression path is not hard to understand. What are the obstacles to in-place data compression or decompression?

This ticket may be related to https://github.com/zfsonlinux/zfs/issues/1071, but the deduplication logic is very different compared with compression.

behlendorf commented 9 years ago

But I can't find any way to compress the data already on the pool without copying it again.

Right, at the moment doing this transparently isn't supported. You're either going to need to do what you're doing, send/recv to a temporary volume which then gets renamed, or write a script to do this on a per-file basis for a dataset. If compression is enabled for the dataset, new files will be compressed, so you would just need to do something like cp file file.tmp; unlink file; mv file.tmp file. Keep in mind that if a dataset has snapshots, the uncompressed blocks will remain part of the snapshot until it is also removed.

Doing this transparently in the background is technically possible, but the same caveat regarding snapshots applies: they are immutable, period. Obviously someone would still need to write the code for this.

pavel-odintsov commented 9 years ago

Thank you very much!

I wrote a simple Perl script for this task, https://gist.github.com/pavel-odintsov/aa497b6d9b351e7b3e2b, and it works well.

pavel-odintsov commented 9 years ago

Unfortunately, file-by-file iteration over my data is extremely slow. I started file_rewrite.pl more than 36 hours ago and only about 6% of the data has been processed so far.

Processing files is also not a reliable approach, because files with broken names (due to encoding issues, not related to ZFS) were not processed correctly.

Can I do the same at the block level, in place? I want to walk all used blocks of my volume and compress those blocks directly instead of relying on files.

behlendorf commented 9 years ago

Can I do the same at the block level, in place?

No. But you could send/recv the pool with incremental snapshots. That would allow you to keep the downtime to a minimum.
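A minimal sketch of that workflow (dataset and snapshot names here are illustrative, not from this thread; the final increment is sent only after writers are stopped so nothing is missed):

zfs snapshot data/vol1@base
zfs send data/vol1@base | zfs receive data/vol1_new                # bulk copy, runs while data/vol1 stays online
zfs snapshot data/vol1@delta1
zfs send -i @base data/vol1@delta1 | zfs receive -F data/vol1_new  # catch up on changes since the bulk copy
# stop the service, then send the last small increment and swap the names
zfs snapshot data/vol1@final
zfs send -i @delta1 data/vol1@final | zfs receive -F data/vol1_new
zfs rename data/vol1 data/vol1_old
zfs rename data/vol1_new data/vol1

The repeated incremental sends keep the final offline window proportional to the recent change rate rather than to the total dataset size.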

pavel-odintsov commented 9 years ago

This issue is even more important in the case of a ZVOL, where we can't touch every file in the filesystem (NTFS, ReFS, and other non-Linux filesystems).

paboldin commented 8 years ago

@behlendorf is it required to recreate the file or is it enough just to re-write the blocks? Can this rewriting be done at the VFS level?

As far as I can see from the source code, it should be enough. In that case one could implement a 'toucher' using e.g. dsl_sync_task and dmu_traverse (?). Is that correct?

behlendorf commented 8 years ago

@paboldin simply re-dirtying the block is enough, given two caveats:

1) The new bp and the original bp must have different characteristics, in this case the checksum algorithm or dedup setting. Otherwise the write will be optimized out by zio_nop_write() (an example follows below).

2) This could easily result in a doubling of the space used if the filesystem/zvol has snapshots. Those blocks can never be rewritten. It would probably be wise to include a sanity check on the required free space before allowing such an operation.
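As an illustration of the first caveat (a hedged sketch, not from this thread; data/myfs is a placeholder): changing a property recorded in the block pointer, such as the checksum algorithm, before re-dirtying the data makes the new and original bps differ, so zio_nop_write() will not elide the writes.

zfs get checksum data/myfs            # note the current algorithm
zfs set checksum=sha256 data/myfs     # newly written bps now differ from the existing ones
# ...then re-dirty the data, e.g. with the per-file rewrite discussed earlier in this thread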

rlaager commented 8 years ago

See also #2554.

dioni21 commented 6 years ago

The very old problem of BP rewrite. AFAIR, everyone who has tried it has given up, saying it is too difficult. :-(

ghost commented 5 years ago

I wrote a small shell script to replicate, verify and overwrite all files in the current working directory and all its descendant directories in order to trigger ZFS compression. Use with significant caution and make sure to have a backup beforehand.
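The script itself is not included above; a minimal sketch of the same idea (not the commenter's actual script; the .zfsrewrite suffix is a placeholder, file names containing newlines are not handled, and hard links and xattrs are ignored):

find . -type f | while IFS= read -r f; do
    cp -p "$f" "$f.zfsrewrite" || continue                               # replicate: the copy is written with compression enabled
    cmp -s "$f" "$f.zfsrewrite" || { rm -f "$f.zfsrewrite"; continue; }  # verify the copy matches the original
    mv "$f.zfsrewrite" "$f"                                              # overwrite the original in place
done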

owlshrimp commented 3 years ago

@paboldin simply re-dirtying the block is enough, given two caveats:

  1. The new bp and the original bp must have different characteristics, in this case the checksum algorithm or dedup setting. Otherwise the write will be optimized out by zio_nop_write().
  2. This could easily result in a doubling of the space used if the filesystem/zvol has snapshots. Those blocks can never be rewritten. It would probably be wise to include a sanity check on the required free space before allowing such an operation.

So, if for example we enabled deduplication and compression at the same time, or enabled compression and changed the checksum algorithm, and then dirtied all the blocks, would that result in them all being rewritten? (I presume a combination of deduplication and a changed checksum would also work?)

What would be the best way to re-dirty a block, given a hypothetical outer loop that cycles over every block of every file? Can it be done without changing the block's contents? (is this what the above conditions ensure?) Is this something that really should be done from within ZFS itself? From the accompanying library?

Baseless speculation:

Part of me wonders if it's possible to introduce a sequence number* in the block pointers just to make data appear "different" to zio_nop_write() without altering the settings. Then it's a matter of going through the directory tree and progressively dirtying every block of every file, so long as there's space** (and maybe I/O capacity) available to accommodate it.

*a "please rewrite" flag would have to be set on everything, though perhaps that traversal wouldn't be so bad. Also maybe not, if you consider a flag to be a 2-value sequence number. Hmm.

**might be enough to say to ZFS "please leave at least 200 GB" though one would expect the space to be reclaimed if there are no snapshots pinning it

owlshrimp commented 3 years ago

This is starting to remind me a little of the issue thread for raidz expansion (#12225). There were similar requests for a way to trigger the reformatting of old data to the new stripe width, though it may or may not be trickier there.