iLCSoft / SIO

Simple IO package
BSD 3-Clause "New" or "Revised" License

Feature request: Different compression libraries #14

Open tmadlener opened 3 years ago

tmadlener commented 3 years ago

How hard would it be to support a compression library other than zlib? Or, put differently, how much of the interface would actually be affected if the goal were to support "arbitrary" compression libraries?

I am asking because I have run some benchmarks using podio and EDM4hep, comparing the default root backend with one based on sio. In the case of root specifically, explicitly requesting zlib compression seems to lead to significantly worse write times than the default, which uses LZ4. If you look here, you will see that with the default compression, writing is about 60 % faster than with zlib. Reading seems to be largely unaffected by this.

So potentially, similar gains would be possible if the compression algorithm could be "replaced" in sio?
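To make the interface question a bit more concrete, here is a minimal sketch of what a pluggable codec layer could look like. Everything in it (`Codec`, `NullCodec`, `RunLengthCodec`, `make_codec`, `round_trips`) is hypothetical and not part of the existing sio API, and the toy run-length codec only stands in for a real zlib or LZ4 wrapper so that the example has no external dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical codec interface (an assumption, not sio's actual API).
// Read/write code would depend only on this abstract class.
class Codec {
public:
  virtual ~Codec() = default;
  virtual std::string name() const = 0;
  virtual Bytes compress(const Bytes& raw) const = 0;
  virtual Bytes decompress(const Bytes& packed) const = 0;
};

// Pass-through codec: stores the buffer unchanged.
class NullCodec final : public Codec {
public:
  std::string name() const override { return "none"; }
  Bytes compress(const Bytes& raw) const override { return raw; }
  Bytes decompress(const Bytes& packed) const override { return packed; }
};

// Toy run-length codec standing in for a zlib or LZ4 wrapper.
// The stream is a sequence of (count, value) byte pairs, count in 1..255.
class RunLengthCodec final : public Codec {
public:
  std::string name() const override { return "rle"; }

  Bytes compress(const Bytes& raw) const override {
    Bytes out;
    for (std::size_t i = 0; i < raw.size();) {
      const std::uint8_t value = raw[i];
      std::size_t run = 1;
      while (i + run < raw.size() && raw[i + run] == value && run < 255) {
        ++run;
      }
      out.push_back(static_cast<std::uint8_t>(run));
      out.push_back(value);
      i += run;
    }
    return out;
  }

  Bytes decompress(const Bytes& packed) const override {
    if (packed.size() % 2 != 0) {
      throw std::runtime_error("truncated run-length stream");
    }
    Bytes out;
    for (std::size_t i = 0; i < packed.size(); i += 2) {
      // Append packed[i] copies of the byte packed[i + 1].
      out.insert(out.end(), static_cast<std::size_t>(packed[i]), packed[i + 1]);
    }
    return out;
  }
};

// Hypothetical factory: selects a codec by name, e.g. from a file header.
std::unique_ptr<Codec> make_codec(const std::string& name) {
  if (name == "none") return std::make_unique<NullCodec>();
  if (name == "rle") return std::make_unique<RunLengthCodec>();
  throw std::invalid_argument("unknown codec: " + name);
}

// Checks that compressing and then decompressing reproduces the input.
bool round_trips(const Codec& codec, const Bytes& raw) {
  return codec.decompress(codec.compress(raw)) == raw;
}
```

With a layer like this, adding a real library would mean one more `Codec` subclass plus one more branch in the factory, and the stream would need to record which codec wrote each buffer so that the reader can pick the matching one.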

rete commented 3 years ago

Hi @tmadlener. I recently came to the same idea, though from a different source: https://diana-hep.org/pages/project_root_io_compression.html.html.

I wanted to do this study by changing the SIO compression API and implementing at least LZ4 as an additional algorithm. I plan to do that at the beginning of next year.

The goal on my side is to reduce the data size as much as possible while keeping read and write times at least the same, or even improving them. We ran into a lot of storage issues in the recent MC production. This algorithm could be a game changer for LCIO in the future.