vermaseren / form

The FORM project for symbolic manipulation of very big expressions
GNU General Public License v3.0
985 stars 118 forks source link

Zstd compression for sort files #540

Open jodavies opened 1 week ago

jodavies commented 1 week ago

I ran a few tests, using @magv 's suggestion of using https://github.com/facebook/zstd/tree/dev/zlibWrapper to use Zstd while making almost no changes to FORM's code. You need to include the wrapper's header instead of zlib.h, and call a function to enable the use of zstd.

For the test I link against a library as compiled by https://github.com/magv/hepware . We could do this, or alternatively compile zstd in with FORM (by setting it up as a git submodule or by directly including the code in this repository -- it is 3-clause BSD, I think that is allowed).

My first test was a simple program which generates lots of terms in a single module which all cancel in the final sort; no scratch files are created, only sort files. This runs a few percent faster with zstd, and uses a few percent less disk space.

I also ran a mincer test (an N=11 moment of a single diagram for "graviton-exchange DIS" -- this is an old benchmark I have lying around from Andreas). This uses ~12GB of disk (mostly the uncompressed scratch files) but runs 8% faster.

All these tests ran in tmpfs, so disk performance was never a bottleneck. On a slow disk the better compression ratio (say, 5%?) will help as well.

This seems like an easy improvement -- we just need to decide what the best way to include it in the build is. Any thoughts?

tueda commented 1 week ago

Perhaps (lib)zstd(-dev) can be installed easily by the distribution's package manager (or possibly via tools like vcpkg). So, we can handle it similarly to gmp/mpfr/zlib.

For zlibWrapper, we could use a submodule, which would then be copied into the tarball distribution.

Example: https://github.com/tueda/form/tree/feat-zstd-1 (but I don't have a suitable benchmark test for this).

jodavies commented 1 week ago

Yes, that looks good. At least for me, using the #define before the header include to enable zstd doesn't work properly. But we can call ZWRAP_useZSTDcompression(1); (or ZWRAP_useZSTDcompression(0);) in the code somewhere. Using this we can just add a sub-key like On compress, zstd;.

I would say that if compiled with zstd support, use of zstd should be the default. But then one can specify On compress, gzip; for easy benchmarking.

Do you want to implement the On/Off statement also, or I can do it and make a PR into your branch?

tueda commented 1 week ago

Please go ahead. On compress, zstd sounds good. My implementation is experimental, so feel free to use any part of it in your branch if you find it useful.

jodavies commented 1 week ago

Looking through the code for where I need to add things, I noticed that there is https://www.nikhef.nl/~form/maindir/documentation/reference/online/online.html#SECTION008220000000000000000 . Either this needs to also have the zstd option, or we can remove this form FORM 5.

Presumably "4.3.2" will not have zstd support, right? That one really should be bug-fixes only on top of 4.3.1.

tueda commented 1 week ago

Either this needs to also have the zstd option, or we can remove this form FORM 5.

The manual doesn't mention Compress gzip, though it works while Compress zstd gives

test.frm Line 1 --> Unknown option: zstd, on, off or gzip expected

Maybe we don't need to touch the Compress statement, which is only for backward compatibility.

If we really want to remove such obsolete commands, perhaps it is good to first introduce deprecated warnings for them.

Presumably "4.3.2" will not have zstd support, right? That one really should be bug-fixes only on top of 4.3.1.

Right. I think it's good to keep the 4.3 branch limited to bug fixes (or at most "behaviour changes" for defect resolution), so the additional support for zstd should be excluded.

jodavies commented 1 week ago

Either this needs to also have the zstd option, or we can remove this form FORM 5.

The manual doesn't mention Compress gzip, though it works while Compress zstd gives

test.frm Line 1 --> Unknown option: zstd, on, off or gzip expected

Maybe we don't need to touch the Compress statement, which is only for backward compatibility.

If we really want to remove such obsolete commands, perhaps it is good to first introduce deprecated warnings for them.

Presumably "4.3.2" will not have zstd support, right? That one really should be bug-fixes only on top of 4.3.1.

Right. I think it's good to keep the 4.3 branch limited to bug fixes (or at most "behaviour changes" for defect resolution), so the additional support for zstd should be excluded.

OK, we can add the zstd support to the Compress statement here also. The error message is a little confusing actually: on first glance I wondered how CoCompress already knew about the possibility of a "zstd" option...