vermaseren / form

The FORM project for symbolic manipulation of very big expressions
GNU General Public License v3.0

Merge bzip feature? #210

Open benruijl opened 7 years ago

vermaseren commented 7 years ago

Did we try this out on a few nice diagrams?

Jos

On 29 jun. 2017, at 15:50, Ben Ruijl notifications@github.com wrote:

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/vermaseren/form/issues/210, or mute the thread https://github.com/notifications/unsubscribe-auth/AFLxEsLx9eONaytWn-0xTif-Z5e-FHlyks5sI6uvgaJpZM4OJVWB.

tueda commented 7 years ago

Where do we have the code? On some branch?

vermaseren commented 7 years ago

I think Ali put it in some branch a few years ago.

Jos


benruijl commented 7 years ago

It's in the compression branch.

jodavies commented 7 years ago

It could be interesting to investigate something like https://quixdb.github.io/squash/; then one could try any of the "modern" compression libraries with minimal or no changes to the code. Things like Snappy, which has a lower ratio but is faster, look good.

jodavies commented 7 years ago

For the record, I took some uncompressed xformxxx.sFN files (from the mincer test in speedtest) and compressed 8 of them, totalling 5574 MB, one after another on the command line. I got the following:

method     compress   ratio    decompress
zlib -1    0m55s      4.51x    0m29s
zlib -6    2m6s       5.94x    0m26s
bzip2 -1   15m57s     6.7x     3m31s
bzip2 -6   19m2s      8.37x    3m45s
snappy     0m15s      3.43x    0m8s
density -1 0m15s      1.78x    0m15s
density -7 0m23s      2.42x    0m21s
density -9 0m50s      3.44x    1m1s
lzo1b -1   0m35s      3.54x    0m16s
lzo1b -6   0m46s      4.0x     0m14s

bzip2 gives very good ratios, but I can't imagine it is something one would want to enable on scratch files that are 100s of GB. snappy, in particular, looks very promising here: the ratio is not as good as zlib -1, but it is very fast. With the size of today's HDDs, this would be a nice option. Maybe one day I will have a look at adding it. The compression branch is helpful, as it shows where one has to alter the code.
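To put the table in terms of scratch-file sizes, the timings can be converted to throughput and extrapolated. The following is only a rough sketch: the figures are copied from the table above, while the helper names, the 1 GB = 1000 MB convention, and the assumption that compression time scales linearly with file size are illustrative choices, not measurements.

```python
# Timings copied from the table above: (compress time, ratio, decompress time).
TOTAL_MB = 5574  # total size of the 8 uncompressed test files

TABLE = {
    "zlib -1":    ("0m55s", 4.51, "0m29s"),
    "zlib -6":    ("2m6s", 5.94, "0m26s"),
    "bzip2 -1":   ("15m57s", 6.7, "3m31s"),
    "bzip2 -6":   ("19m2s", 8.37, "3m45s"),
    "snappy":     ("0m15s", 3.43, "0m8s"),
    "density -1": ("0m15s", 1.78, "0m15s"),
    "density -7": ("0m23s", 2.42, "0m21s"),
    "density -9": ("0m50s", 3.44, "1m1s"),
    "lzo1b -1":   ("0m35s", 3.54, "0m16s"),
    "lzo1b -6":   ("0m46s", 4.0, "0m14s"),
}


def seconds(t: str) -> int:
    """Convert a string such as '15m57s' to a number of seconds."""
    minutes, rest = t.split("m")
    return int(minutes) * 60 + int(rest.rstrip("s"))


def compress_mb_per_s(method: str) -> float:
    """Compression throughput in MB/s of uncompressed input."""
    return TOTAL_MB / seconds(TABLE[method][0])


def compress_hours(size_gb: float, method: str) -> float:
    """Estimated hours to compress size_gb of input.

    Assumes linear scaling and 1 GB = 1000 MB (both are assumptions).
    """
    return size_gb * 1000 / compress_mb_per_s(method) / 3600


if __name__ == "__main__":
    for name in TABLE:
        print(f"{name:<11} {compress_mb_per_s(name):7.1f} MB/s   "
              f"100 GB in about {compress_hours(100, name):5.2f} h")
```

Under these assumptions, bzip2 -1 would need just under five hours to compress 100 GB, while snappy would need under five minutes.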

vermaseren commented 7 years ago

When Ali programmed this, he claimed that in his tests it was not slower than gzip. We still have to check this with realistic diagram calculations.

Jos
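A comparison of bzip2 against gzip-style compression could be set up along the following lines. This is a minimal sketch using the zlib and bz2 modules from the Python standard library, not FORM's own code path; the function names and the stand-in input are hypothetical, and absolute timings will differ from what FORM would see when compressing inside its sort routines.

```python
import bz2
import time
import zlib

# (label, compress function, decompress function); the levels mirror the table above.
CODECS = [
    ("zlib -1", lambda d: zlib.compress(d, 1), zlib.decompress),
    ("zlib -6", lambda d: zlib.compress(d, 6), zlib.decompress),
    ("bzip2 -1", lambda d: bz2.compress(d, 1), bz2.decompress),
    ("bzip2 -6", lambda d: bz2.compress(d, 6), bz2.decompress),
]


def measure(name, compress, decompress, data):
    """Time one compress/decompress round trip on data held in memory."""
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    unpacked = decompress(packed)
    t2 = time.perf_counter()
    if unpacked != data:
        raise ValueError(f"{name}: round trip did not reproduce the input")
    return {
        "method": name,
        "compress_s": t1 - t0,
        "ratio": len(data) / len(packed),
        "decompress_s": t2 - t1,
    }


def benchmark(data):
    """Run every codec in CODECS on the same input."""
    return [measure(name, c, d, data) for name, c, d in CODECS]


if __name__ == "__main__":
    # Stand-in input (an assumption). For a meaningful test, read a real
    # scratch file instead, e.g. open(path, "rb").read().
    sample = b"+ 3/4*x^2*y*f(1,2,3) - 1/2*x*y^2*f(2,1,3) " * 50000
    for row in benchmark(sample):
        print(f"{row['method']:<9} compress {row['compress_s']:.3f}s  "
              f"ratio {row['ratio']:.2f}x  decompress {row['decompress_s']:.3f}s")
```

The repetitive stand-in input compresses far better than real sort files would, so only runs on actual diagram output say anything about the claim. snappy is left out because it is not part of the standard library.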


benruijl commented 7 years ago

I say we postpone this feature a bit, perhaps for 4.2.1.

vermaseren commented 7 years ago

If we do not have our own tests with timings on realistic diagrams, we should postpone it. This also settles the list of authors, because it means that only three people made contributions to this version.

Jos
