Currently, the ARC contains only the uncompressed data, so the data is compressed on the fly as it is written out. In the case of writing to the pool, the data is compressed in the ZIO layer, and l2arc_compress_buf() handles the compression when writing out to L2ARC (via l2arc_write_buffers()).
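To make that flow concrete, here is a minimal, self-contained sketch of "compress on the way out": the cache holds uncompressed bytes, and compression happens only when the writer pushes a buffer to the L2ARC device. The types and helpers are simplified stand-ins for the real arc_buf_t / l2arc_compress_buf() / zio_compress_data() machinery, not the actual ZFS code:

```c
/*
 * Simplified sketch of the compress-at-write-out path described
 * above. Not ZFS code: the struct and the identity "compressor"
 * below stand in for arc_buf_t and zio_compress_data().
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct cache_buf {
	void	*data;	/* uncompressed data, as held in ARC */
	size_t	size;	/* logical (uncompressed) size */
} cache_buf_t;

/* Stub compressor: real ZFS would run LZJB/LZ4 here. */
static size_t
compress_stub(const void *src, void *dst, size_t size)
{
	memcpy(dst, src, size);	/* pretend nothing compressed */
	return (size);
}

/* Modeled loosely on l2arc_compress_buf(): compress only at write-out. */
static void
l2arc_write_one(const cache_buf_t *buf)
{
	void *cdata = malloc(buf->size);
	size_t csize = compress_stub(buf->data, cdata, buf->size);

	if (csize < buf->size)
		printf("writing %zu compressed bytes (logical %zu)\n",
		    csize, buf->size);
	else
		printf("incompressible; writing %zu raw bytes\n", buf->size);
	/* ...the device write of cdata[0..csize) would be issued here... */
	free(cdata);
}

int
main(void)
{
	char payload[] = "uncompressed bytes living in ARC";
	cache_buf_t buf = { payload, sizeof (payload) };

	l2arc_write_one(&buf);
	return (0);
}
```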
Right. What @prakashsurya said.
So my understanding is that the current situation is not quite optimal. IMO it would make much more sense to read compressed blocks from disk, store them in their compressed form in the ARC/L2ARC, and then decompress them on the fly. This would not only reduce the amount of RAM needed and the space used on the L2ARC device, but also reduce total reads from those caches. In addition, it needs fewer CPU cycles, because you don't decompress while reading from disk just to compress again when copying those blocks to the L2ARC device.
@behlendorf might be something worth considering.
Umm... a few things about that.
Data in L2ARC is populated by a feed process that lazily scans buffers near the tails of the ARC's MRU/MFU lists, so by definition you cannot "read" compressed blocks directly into L2ARC; the data has to come from ARC first. Additionally, it will not reduce RAM usage as much as you think, because data fetched from L2ARC is put back into ARC when it is read. Think of L2ARC as a sort of overflow buffer for ARC that gets promoted back into ARC when accessed. L2ARC is not a speculative cache, i.e. it will not cache things that are currently untouched and unused on the theory that they may be accessed; it caches things that have already been accessed, touched, and used, and thus have a chance of being accessed again.
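A toy model of that lifecycle, just to make the two points explicit (none of these names exist in ZFS):

```c
/*
 * Toy model of the L2ARC lifecycle described above. It only mimics
 * two behaviors:
 *   1. L2ARC is fed from buffers already resident in ARC, never
 *      straight from the pool.
 *   2. An L2ARC hit copies the buffer back into ARC, so RAM is
 *      consumed again for as long as the data stays hot.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct buf {
	int	id;
	bool	in_arc;
	bool	in_l2arc;
} buf_t;

/* Feed pass: only ARC-resident buffers are eligible for L2ARC. */
static void
l2arc_feed(buf_t *bufs, int n)
{
	for (int i = 0; i < n; i++)
		if (bufs[i].in_arc && !bufs[i].in_l2arc)
			bufs[i].in_l2arc = true;	/* copied to the device */
}

/* A miss in ARC that hits in L2ARC promotes the data back into ARC. */
static void
arc_read(buf_t *b)
{
	if (!b->in_arc && b->in_l2arc) {
		b->in_arc = true;
		printf("buf %d: L2ARC hit, copied back into ARC\n", b->id);
	}
}

int
main(void)
{
	buf_t bufs[2] = { { 0, true, false }, { 1, true, false } };

	l2arc_feed(bufs, 2);	/* fed from ARC, not from disk */
	bufs[0].in_arc = false;	/* later evicted from ARC */
	arc_read(&bufs[0]);	/* hit in L2ARC -> back in ARC */
	return (0);
}
```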
Finally, although it consumes fewer CPU cycles, ironically it is much slower to read compressed data from disk and copy it directly to L2ARC than to read from uncompressed RAM, compress it, and evict it to L2ARC. Accessing RAM is an order of magnitude faster than accessing L2ARC, which in turn is an order of magnitude faster than accessing disk. Reading from the disk array is a very slow operation, not to mention having to reconstruct the data from several RAIDZ/Z2 stripes into a format that can be stored on L2ARC.
Remember that the reason for ARC is that accessing disks/SSDs is slow compared to RAM. It thus seems counterintuitive to me to design your cache so that your second-slowest cache tier (L2ARC) fetches data from your slowest storage tier (disks), shuttling it back and forth through your fastest tier (ARC), just to somehow use fewer CPU resources. Would it not make more sense to evict the data from the fastest tier down to the lower tiers in the first place, so that you don't need to touch your slowest storage at all?
@maci0 If I'm understanding you correctly, you're suggesting the ARC hold the compressed buffer instead of the uncompressed buffer?
There's been talk about modifying the ARC to do that; e.g. use the compressed form of the data that is being stored on disk. This means that we would no longer decompress when reading a block from disk, as we'd wait to do this decompression until serving the data to whatever consumer needed it. We'd also free any dirty buffer once it was written out to disk, and update the ARC buffer to use the compressed version that was written out.
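A rough sketch of what that serve path could look like, with hypothetical names (this is an illustration of the idea being discussed, not a committed design or actual ZFS code): the ARC header keeps the compressed on-disk bytes, and decompression is deferred until a consumer actually asks for the data.

```c
/*
 * Hypothetical sketch of a "compressed ARC" read path, per the idea
 * above: the header holds the compressed on-disk bytes, and we
 * decompress lazily, only when handing data to a consumer. The names
 * and the stub "decompressor" are illustrative only.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct arc_hdr {
	void	*cdata;	/* compressed bytes, exactly as stored on disk */
	size_t	csize;	/* physical (compressed) size */
	size_t	lsize;	/* logical (uncompressed) size */
} arc_hdr_t;

/* Stub decompressor; real code would run the block's LZ4/LZJB here. */
static void
decompress_stub(const void *src, size_t csize, void *dst, size_t lsize)
{
	(void) lsize;
	memcpy(dst, src, csize);
}

/* Decompression happens here, at consume time, not at pool-read time. */
static void *
arc_serve(const arc_hdr_t *hdr)
{
	void *data = malloc(hdr->lsize);

	decompress_stub(hdr->cdata, hdr->csize, data, hdr->lsize);
	return (data);	/* caller frees the transient uncompressed copy */
}

int
main(void)
{
	char ondisk[] = "compressed block as read from the pool";
	arc_hdr_t hdr = { ondisk, sizeof (ondisk), sizeof (ondisk) };
	void *data = arc_serve(&hdr);

	printf("served %zu logical bytes\n", hdr.lsize);
	free(data);
	return (0);
}
```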
The benefit of this might not be as straightforward to calculate if we soon move to allocating buffers in page-size chunks, though. The CPU spent on compression might not save as much memory if we always have to round up to 4K pages (e.g. an 8K buffer compressed to 4.5K will still consume 8K of RAM).
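Worked out, the 8K/4.5K example looks like this; the macro mirrors what a P2ROUNDUP-style helper does, and the names are just for illustration:

```c
/* Worked version of the page round-up example above: with 4K pages,
 * a 4.5K compressed buffer still occupies two pages, i.e. 8K. */
#include <stdio.h>

#define PAGE_SZ		4096UL
#define ROUNDUP(x)	(((x) + PAGE_SZ - 1) / PAGE_SZ * PAGE_SZ)

int
main(void)
{
	unsigned long lsize = 8192;	/* 8K uncompressed buffer */
	unsigned long csize = 4608;	/* compressed down to 4.5K */

	/* Both round up to 8192: two 4K pages either way. */
	printf("uncompressed: %lu -> %lu bytes allocated\n",
	    lsize, ROUNDUP(lsize));
	printf("compressed:   %lu -> %lu bytes allocated\n",
	    csize, ROUNDUP(csize));
	return (0);
}
```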
@prakashsurya So even if the 4.5K buffer still consumes 8K of RAM and we don't save any memory, we would still need less I/O on the memory bus, because we only read out 4.5K instead of 8K.
@behlendorf Hello, I'm pretty sure this is not the right way to ask, but I'm doing it anyway :-p
Out of curiosity, I am looking for some clarification. Recently I have been reading about stuff like L2ARC, metadata, compression, etc.
So I was wondering: if data is already compressed on disk, does ZFS just copy the compressed data into L2ARC or ARC and decompress it on the next access? Or does it actually decompress the on-disk data and then compress it again on the fly into the (L2)ARC? Or am I getting it all wrong and it doesn't support compressed ARC yet?