marian-nmt / marian-dev

Fast Neural Machine Translation in C++ - development repository
https://marian-nmt.github.io

zstandard support in input files #993

Open bhearsum opened 1 year ago

bhearsum commented 1 year ago

Feature description

zstandard (zstd) is a modern compression format that is optimized for compressed size and decompression speed. On both of these metrics it outperforms gzip/zlib (at the cost of slower compression times). These qualities make it a good candidate for an intermediate format when working with large models.

It would be wonderful if marian could support natively decompressing these archives as it already does for gzip.
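
To illustrate what native support might involve, here is a minimal sketch of telling the two formats apart by their leading magic bytes, so that a reader could pick the right decompressor regardless of file extension. The names `Compression` and `detectCompression` are hypothetical and are not part of marian's code; how marian currently selects its gzip path is not covered here. Only the magic numbers themselves (gzip: `1F 8B`, zstd frame: `28 B5 2F FD`) come from the respective format specifications.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only; not part of marian's API.
enum class Compression { None, Gzip, Zstd };

// Inspect the first bytes of a file and guess its compression format.
// `header` should hold at least the first 4 bytes of the input, if available.
Compression detectCompression(const std::string& header) {
  auto byteAt = [&header](std::size_t i) {
    return static_cast<unsigned char>(header[i]);
  };

  // gzip members start with the two bytes 0x1F 0x8B.
  if (header.size() >= 2 && byteAt(0) == 0x1F && byteAt(1) == 0x8B)
    return Compression::Gzip;

  // zstd frames start with the magic number 0xFD2FB528, stored little-endian,
  // i.e. the byte sequence 0x28 0xB5 0x2F 0xFD.
  if (header.size() >= 4 && byteAt(0) == 0x28 && byteAt(1) == 0xB5 &&
      byteAt(2) == 0x2F && byteAt(3) == 0xFD)
    return Compression::Zstd;

  return Compression::None;
}
```

A caller would read the first few bytes of the input, pass them to `detectCompression`, and then hand the stream to the matching decoder (for zstd, presumably via libzstd's streaming interface). Sniffing the content rather than trusting the `.gz` or `.zst` suffix is a design choice in this sketch, not a statement about how marian behaves today.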

Citations on zstd vs gzip: