simphony / simphony-common

The native implementation of the SimPhoNy CUDS objects
BSD 2-Clause "Simplified" License

Fix use compression #231

Closed. roigcarlo closed this 8 years ago.

roigcarlo commented 8 years ago

This implements #215

I was considering some options for implementing this:

1. Force compression on.
2. Allow compression, but do not enable it by default.
3. Allow compression and enable it by default.

Option 1 seems too restrictive and option 2 too complicated for the user, so I decided to go for option 3.

As for the compression lib, I selected zlib by default. If I remember correctly it is already a dependency of the HDF5 package, so it should not be problematic.
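
For reference, enabling zlib by default through PyTables looks roughly like this; the exact `complevel` and `shuffle` values are illustrative assumptions, not necessarily what this PR uses:

```python
import tables

# Illustrative defaults for option 3: compression on, zlib backend.
# complevel=1 and shuffle=True are assumed values, not the PR's.
DEFAULT_FILTERS = tables.Filters(complevel=1, complib='zlib', shuffle=True)

# Filters passed to open_file become the default for nodes created
# in this file.
handle = tables.open_file('particles.cuds', mode='w',
                          filters=DEFAULT_FILTERS)
handle.close()
```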

nathanfranklin commented 8 years ago

> I decided to go for option 3.

Sounds like a good approach.


Can unit tests be added to this PR that use the filters parameter?
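
A minimal sketch of such a test, using plain PyTables rather than the simphony API (the record layout and file names are made up for illustration):

```python
import os
import tempfile
import unittest

import tables


class _Particle(tables.IsDescription):
    # Made-up record layout, just to have some data to compress.
    x = tables.Float64Col()
    y = tables.Float64Col()
    z = tables.Float64Col()


def _write_file(path, filters):
    with tables.open_file(path, mode='w', filters=filters) as handle:
        table = handle.create_table('/', 'particles', _Particle)
        row = table.row
        for i in range(10000):
            row['x'], row['y'], row['z'] = i, i, i
            row.append()
        table.flush()


class FiltersTestCase(unittest.TestCase):

    def test_compressed_file_is_smaller(self):
        tmpdir = tempfile.mkdtemp()
        plain = os.path.join(tmpdir, 'plain.h5')
        packed = os.path.join(tmpdir, 'packed.h5')
        _write_file(plain, tables.Filters(complevel=0))
        _write_file(packed, tables.Filters(complevel=5, complib='zlib'))
        self.assertLess(os.path.getsize(packed), os.path.getsize(plain))


if __name__ == '__main__':
    unittest.main()
```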

nathanfranklin commented 8 years ago

I was trying to see the size difference of files for different compression levels. But the script I wrote (see https://gist.github.com/nathanfranklin/6bac2cb3cb7c56e2c8cd) produces some odd results. It seems like something is not working? Or I am at least confused by the results.

Here are the strange results of writing 10000 particles to a file:

tuopuu commented 8 years ago

It could be that you must specifically set each table object to compress its data. By default PyTables doesn't compress tables, and I'm not sure that setting the `filters` argument of `tables.open_file` does that either. Maybe it's worth checking...
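
If that is the case, the fix would be to pass the filters explicitly when each table is created, roughly like this (the table name and description are placeholders):

```python
import tables


class _Particle(tables.IsDescription):
    # Placeholder description; the real CUDS tables differ.
    x = tables.Float64Col()


table_filters = tables.Filters(complevel=5, complib='zlib')

with tables.open_file('particles.cuds', mode='w') as handle:
    # Filters given to create_table apply to this leaf regardless of
    # what was (or was not) set on open_file.
    handle.create_table('/', 'particles', _Particle,
                        filters=table_filters)
```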

nathanfranklin commented 8 years ago

> It could be that you must specifically set each table object to compress its data. By default PyTables doesn't compress tables, and I'm not sure that setting the `filters` argument of `tables.open_file` does that either. Maybe it's worth checking...

I was wondering the same thing. The documentation for `filters` at http://www.pytables.org/usersguide/libref/file_class.html makes it sound, though, like the filter settings should "propagate" down to all the children nodes. Strange.
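
One way to settle it is to inspect the filters each leaf actually ended up with; every PyTables leaf exposes the applied `Filters` instance:

```python
import tables

with tables.open_file('particles.cuds', mode='r') as handle:
    # leaf.filters reports the Filters actually applied to each node,
    # which shows whether the open_file setting propagated.
    for leaf in handle.walk_nodes('/', classname='Leaf'):
        print(leaf._v_pathname, leaf.filters)
```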

roigcarlo commented 8 years ago

I also understood that the filter was being propagated. I will run some additional tests using your script. It could well be as Tuomas suggests, or maybe it's a problem with the shuffle option, which should be automatically set to True anyway...

roigcarlo commented 8 years ago

I have run some tests with different parameters. Unless indicated otherwise, the default options, number of particles, and compression level were used.

Different number of particles with fixed options

| Nº of particles | No compression (bytes) | Normal (bytes) | High compression (bytes) |
| --- | --- | --- | --- |
| 10 | 236.926 | 43.964 | 43.299 |
| 100 | 301.566 | 53.249 | 48.186 |
| 1000 | 1.206.526 | 108.082 | 117.310 |
| 10000 | 10.851.214 | 699.851 | 643.411 |

Different compression lib

| complib | No compression (bytes) | Normal (bytes) | High compression (bytes) |
| --- | --- | --- | --- |
| zlib | 301.566 | 50.999 | 49.417 |
| lzo | 301.566 | 56.093 | 56.093 |
| bzip2 | 301.566 | 47.816 | 47.816 |
| blosc | 301.566 | 53.718 | 55.730 |

Fletcher32 and shuffle:

| fletcher32 | No compression (bytes) | Normal (bytes) | High compression (bytes) |
| --- | --- | --- | --- |
| True | 301.758 | 51.164 | 48.308 |
| False | 301.566 | 50.966 | 48.152 |

| shuffle | No compression (bytes) | Normal (bytes) | High compression (bytes) |
| --- | --- | --- | --- |
| True | 301.566 | 51.002 | 48.130 |
| False | 301.566 | 54.072 | 49.780 |

I am going to test the effect of changing the number of CUBA keys in each particle, but as you can see the results are quite consistent and basically what I would have expected. (Maybe we should move to bzip2?)

For now I am unable to reproduce @nathanfranklin's results. It would be interesting if someone else could also run the test, just to confirm whether it is working or not.
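
For anyone who wants to repeat the measurement, the test boils down to something like this sketch (the particle layout and file names are made up; sizes are printed in bytes):

```python
import os

import tables

# Write the same data with different Filters and compare file sizes.
CONFIGS = {
    'none': tables.Filters(complevel=0),
    'zlib-5': tables.Filters(complevel=5, complib='zlib'),
    'zlib-9-shuffle': tables.Filters(complevel=9, complib='zlib',
                                     shuffle=True),
}


class Particle(tables.IsDescription):
    x = tables.Float64Col()
    y = tables.Float64Col()
    z = tables.Float64Col()


for name, filters in CONFIGS.items():
    path = 'bench_%s.h5' % name
    with tables.open_file(path, mode='w', filters=filters) as handle:
        table = handle.create_table('/', 'particles', Particle)
        row = table.row
        for i in range(10000):
            row['x'], row['y'], row['z'] = i * 0.1, i * 0.2, i * 0.3
            row.append()
        table.flush()
    print(name, os.path.getsize(path))
```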

nathanfranklin commented 8 years ago

> 10000 | 10.851.214 | 699.851 | 643.411

This looks much better than the numbers I had seen. Was this using the script I posted earlier? I am wondering if there was a mistake in the script... or a problem with my environment (pytables, hdf5). I am out of the office today but will try to test things out on Monday.

roigcarlo commented 8 years ago

Yep, this is using your script as a base (with some additional cases).

I ran the test on Arch Linux and Fedora 22; I'm not sure if this affects the output in some way. I will try to set up a VM with Ubuntu just to make sure.

nathanfranklin commented 8 years ago

Cool! My results were from SUSE. I don't know which versions I was using (hdf5, tables); I suspect that is where the problem was. Hopefully it will be clear on Monday what was wrong with my system.

nathanfranklin commented 8 years ago

On openSUSE 13.1 (x86_64), I still get strange numbers suggesting that compression isn't working on my system. I am using HDF5 library version 1.8.12 and tables version 3.1.1:

size of default compressed file: 10852306 bytes 
size of not-compressed file: 10851254 bytes
size of high-compressed file: 10851510 bytes 

I upgraded to tables-3.2.2 and I got the same results.

size of default compressed file: 10852266 bytes 
size of not-compressed file: 10851214 bytes
size of high-compressed file: 10851470 bytes

nathanfranklin commented 8 years ago

On our SimPhoNy development system (Ubuntu 12.04 LTS, 64-bit) with HDF5 library 1.8.4 and tables 3.2.2, I get correct results like you had seen on your system when using http://www.pytables.org/usersguide/libref/file_class.html:

size of default compressed file: 717509 bytes 
size of not-compressed file: 10851214 bytes
size of high-compressed file: 655873 bytes

roigcarlo commented 8 years ago

Could it be that pytables has been compiled without zlib? According to the docs, this is the most common cause of tables not being compressed.

https://www.hdfgroup.org/HDF5/doc/TechNotes/TechNote-HDF5-CompressionTroubleshooting.pdf - Sec 3.1.

I do not know which packages you have installed on openSUSE 13.1, but reading that, my first thought would be that the zlib-devel package was missing when pytables was installed, hence compression is not working at all.

Could you check that?
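
A quick way to check is `tables.which_lib_version`, which returns None when PyTables was built without the given compression library:

```python
import tables

# Returns a version tuple when the library is available, None otherwise.
for lib in ('zlib', 'lzo', 'bzip2', 'blosc'):
    print(lib, tables.which_lib_version(lib))
```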

nathanfranklin commented 8 years ago

I will check that but won't get to it till Tuesday or Wednesday at the earliest. Yeah, sounds like something is just wrong with my hdf5. I will let you know what I find out.

I think we can ignore the problem I am having on my machine for the time being as things seem to work fine on Ubuntu 12.04. :+1:

roigcarlo commented 8 years ago

OK. If everything else is fine, I will resolve the conflicts and merge. In any case, we can open a separate issue if we see that something is missing.

nathanfranklin commented 8 years ago

:+1:

nathanfranklin commented 8 years ago

@carlos, it looks like something went wrong when merging master into the feature branch and fixing the conflicts. You may have merged your feature branch back into itself while resolving them. The commits have been repeated, which is okay, but I think the following docstring (https://github.com/simphony/simphony-common/compare/a04d225f25...29e4374d39#diff-e5886385ba19e32cb76fb2d04aae6d83R39) needs to be changed back to "Returns an opened SimPhoNy CUDS-hdf5 file".

roigcarlo commented 8 years ago

Oops, yeah, I did some odd things during the merge. I will fix it right now.

nathanfranklin commented 8 years ago

> Could it be that pytables has been compiled without zlib? According to the docs, this is the most common cause of tables not being compressed. https://www.hdfgroup.org/HDF5/doc/TechNotes/TechNote-HDF5-CompressionTroubleshooting.pdf - Sec 3.1.

@roigcarlo, that was exactly the problem. I had built hdf5 on my work machine without support for gzip/szip :) Mystery solved. Thanks.