sstadick / crabz

Like pigz, but rust

gzip should be handled by ParDecompress #13

Closed · derolf closed this 2 years ago

derolf commented 2 years ago

When compressing a file to gzip in parallel, crabz uses ZBuilder, which instantiates a ParCompress when num_threads > 1.

However, decompression always uses the single-threaded MultiGzDecoder. Why does it not use ParDecompress when num_threads > 1?

sstadick commented 2 years ago

The gzip format itself isn't able to make use of multiple threads for decompression. If you need parallel decompression, use the Mgzip or Bgzf formats: they are block compression formats and can take advantage of multithreading.
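
(For example, a file produced with crabz -f bgzf can later be decompressed across several threads, while the output of crabz -f gzip can only ever be decompressed on one, no matter how many threads you give the decompressor.)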

derolf commented 2 years ago

> The gzip format itself isn't able to make use of multiple threads for decompression. If you need parallel decompression, use the Mgzip or Bgzf formats: they are block compression formats and can take advantage of multithreading.

But crabz does create an mgz when num_threads > 1?

derolf commented 2 years ago

What I am trying to say is that crabz uses ZBuilder for gzip compression, and ZBuilder creates an mgz if num_threads > 1 (using ParCompress).

https://github.com/sstadick/gzp/blob/4bba36567d19a74aa4b7f13b932c7c28f96fb812/src/lib.rs#L241

sstadick commented 2 years ago

Correct. Normal gzip is asymmetrical: multiple threads can be used to gzip-compress a file, but regular gzip files can only be decompressed single-threaded.
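
A minimal sketch of that asymmetry, pairing ZBuilder from the gzp source linked above with flate2's MultiGzDecoder (the exact builder methods may differ between gzp versions):

use flate2::read::MultiGzDecoder;
use gzp::{deflate::Gzip, ZBuilder, ZWriter};
use std::io::{Read, Write};

fn main() -> std::io::Result<()> {
    // Compression fans out: each worker thread deflates an independent gzip
    // member, and the members are concatenated in order on the way out.
    let out = std::fs::File::create("big.xml.gz")?;
    let mut writer = ZBuilder::<Gzip, _>::new().num_threads(8).from_writer(out);
    writer.write_all(b"<xml>...</xml>")?;
    writer.finish().expect("compressor threads failed");

    // Decompression cannot: a plain gzip stream has no block index, so the
    // members have to be inflated one after another on a single thread.
    let mut xml = String::new();
    MultiGzDecoder::new(std::fs::File::open("big.xml.gz")?).read_to_string(&mut xml)?;
    Ok(())
}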

derolf commented 2 years ago

> Correct. Normal gzip is asymmetrical: multiple threads can be used to gzip-compress a file, but regular gzip files can only be decompressed single-threaded.

So I should be able to use ParDecompress to decompress it? (I have some huge XML files that I compress with parallel crabz and want to decompress programmatically.)

sstadick commented 2 years ago

Ah, I'd recommend compressing them with crabz -f bgzf and then using gzp as follows:

            if num_threads == 0 {
                let mut reader = BgzfSyncReader::new(input);
                io::copy(&mut reader, &mut output)?;
                output.flush()?;
            } else {
                let mut reader = ParDecompressBuilder::<Bgzf>::new()
                    .num_threads(num_threads) // 4 threads per file is about where decompression maxes out. Anything more is not helping
                    .unwrap()
                    .pin_threads(pin_at) // pinning is very optional, could ignore it.
                    .from_reader(input);
                io::copy(&mut reader, &mut output)?;
                output.flush()?;
                reader.finish()?;
            };

From here: https://github.com/sstadick/crabz/blob/cce55c89df2d613c522ad32d0e76a3bbe3e47f12/src/main.rs#L443

I don't have as nice an abstraction over decompression at this time. It's been on the todo list, though!

Good questions, this does expose some weaknesses in the documentation.
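
In the meantime, a self-contained version of that snippet would look something like this (file names and thread count are just placeholders):

use gzp::{deflate::Bgzf, par::decompress::ParDecompressBuilder};
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Write};

fn main() -> io::Result<()> {
    let input = BufReader::new(File::open("na.osm.xml.gz")?);
    let mut output = BufWriter::new(File::create("na.osm.xml")?);

    // Parallel BGZF decompression; ~4 threads is about where it maxes out.
    let mut reader = ParDecompressBuilder::<Bgzf>::new()
        .num_threads(4)
        .unwrap()
        .from_reader(input);
    io::copy(&mut reader, &mut output)?;
    output.flush()?;
    // Join the decompressor threads and surface any pending errors.
    reader.finish().expect("decompressor threads failed");
    Ok(())
}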

derolf commented 2 years ago

Hm, I also tried the bgzf compression, but it was way slower than gzip.

What’s wrong with using gzip and ParCompress/ParDecompress?

sstadick commented 2 years ago

It is extremely odd that BGZF compression would be slower than gzip compression. Can you share the crabz CLI invocations for both formats?

Nothing is wrong with using gzip with ParCompress; there is just no ParDecompress available for gzip.

derolf commented 2 years ago

Some timings (1.74 GiB XML):

$ pv -N IN -c na.osm.xml | crabz -f gzip -l 9 | pv -N OUT -c > /dev/null
[2022-02-16T09:02:57Z INFO  crabz] Compressing (gzip) with 8 threads at compression level 9.
       IN: 1.74GiB 0:00:17 [ 104MiB/s] [=============================================>] 100%            
      OUT:  216MiB 0:00:17 [12.7MiB/s] [                     <=>                                       ]
$ pv -N IN -c na.osm.xml | crabz -f mgzip -l 12 | pv -N OUT -c > /dev/null
[2022-02-16T09:03:48Z INFO  crabz] Compressing (mgzip) with 8 threads at compression level 12.
       IN: 1.74GiB 0:01:32 [19.2MiB/s] [=============================================>] 100%            
      OUT:  187MiB 0:01:32 [2.02MiB/s] [        <=>                                                    ]
$
$ pv -N IN -c na.osm.xml | crabz -f bgzf -l 12 | pv -N OUT -c > /dev/null
[2022-02-16T09:06:19Z INFO  crabz] Compressing (bgzf) with 8 threads at compression level 12.
       IN: 1.74GiB 0:01:30 [19.6MiB/s] [=============================================>] 100%            
      OUT:  191MiB 0:01:30 [2.11MiB/s] [          <=>                                                  ]
$

You can see that gzip is 5x faster than the others.

derolf commented 2 years ago

I played around a bit with your bare-bones gzp examples and created my own little CLI just for mgzip.

Actually, performance varies a lot with the compression level. I get good results with:

use gzp::{
    deflate::Mgzip,
    par::compress::{ParCompress, ParCompressBuilder},
    Compression, ZWriter,
};
use std::io::{Read, Write};

type Format = Mgzip;
const LEVEL: u32 = 10;
const THREADS: usize = 16;
const BUFSIZE: usize = 1024 * 1024;

fn main() {
    let chunksize = BUFSIZE * 2;

    // Parallel mgzip compressor writing to stdout.
    let stdout = std::io::stdout();
    let mut writer: ParCompress<Format> = ParCompressBuilder::new()
        .buffer_size(BUFSIZE)
        .unwrap()
        .compression_level(Compression::new(LEVEL))
        .num_threads(THREADS)
        .unwrap()
        .from_writer(stdout);

    let stdin = std::io::stdin();
    let mut stdin = stdin.lock();

    // Read stdin in fixed-size chunks and feed them to the compressor.
    let mut buffer = Vec::with_capacity(chunksize);
    loop {
        let mut limit = (&mut stdin).take(chunksize as u64);
        limit.read_to_end(&mut buffer).unwrap();
        if buffer.is_empty() {
            break;
        }
        writer.write_all(&buffer).unwrap();
        buffer.clear();
    }
    // finish() flushes the remaining blocks and joins the worker threads.
    writer.finish().unwrap();
}

Decompressor:

use gzp::{
    deflate::Mgzip,
    par::decompress::{ParDecompress, ParDecompressBuilder},
};
use std::io::{Read, Write};

type Format = Mgzip;
const THREADS: usize = 16;

fn main() {
    let chunksize = 1024 * 1024;

    let stdin = std::io::stdin();

    // Parallel mgzip decompressor reading from stdin.
    let mut reader: ParDecompress<Format> = ParDecompressBuilder::new()
        .num_threads(THREADS)
        .unwrap()
        .from_reader(stdin);

    let stdout = std::io::stdout();
    let mut stdout = stdout.lock();

    // Copy decompressed bytes to stdout in fixed-size chunks.
    let mut buffer = Vec::with_capacity(chunksize);
    loop {
        let mut limit = (&mut reader).take(chunksize as u64);
        limit.read_to_end(&mut buffer).unwrap();
        if buffer.is_empty() {
            break;
        }
        stdout.write_all(&buffer).unwrap();
        buffer.clear();
    }
    // As in the crabz snippet above, finish() joins the decompressor
    // threads and surfaces any pending errors.
    reader.finish().unwrap();
}
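
Both programs are plain stdin-to-stdout filters, so they drop into the same pv pipelines as the timings above, e.g. ./compress < na.osm.xml > na.osm.xml.gz and ./decompress < na.osm.xml.gz > na.osm.xml (the binary names here are made up).
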
derolf commented 2 years ago

So, final conclusion: with the right compression level, (de)compression is fast enough that XML parsing is now the bottleneck :-)

sstadick commented 2 years ago

Nice! Those are solid results! It's worth noting that compression level 12 for BGZF/Mgzip is not the same as level 9 for gzip; it actually compresses more (or it should, depending on the input). If you ran the same commands with level 8 or 9 for the block formats, the times should even out.

The block compressors use libdeflate, which documents its compression levels here: https://github.com/ebiggers/libdeflate#compression-levels.
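
(libdeflate takes levels 0 through 12, versus zlib's 0 through 9; that's why -l 12 was accepted for mgzip and bgzf above while the gzip run maxed out at -l 9.)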

Also, anything more than ~4 threads for decompression doesn't seem to help in my benchmarking, and possibly slows things down a bit.

derolf commented 2 years ago

Thanks a lot and keep up the good work!