Closed: roigcarlo closed this 8 years ago
I decided to go for option 3.
Sounds like a good approach.
Can unit tests be added to this PR that use the filters parameter?
I was trying to see the size difference of the files for different compression levels. But the script I wrote (see https://gist.github.com/nathanfranklin/6bac2cb3cb7c56e2c8cd) produces some odd results. It seems like something is not working, or at least I am confused by the results.
Here are the strange results of writing 10000 particles to a file:
It could be that you must specifically set each table object to compress its data. By default PyTables doesn't compress tables, and I'm not sure that setting the filters attribute in the PyTables open_file method does that either. Maybe it's worth checking...
I was wondering the same thing. The documentation for filters at http://www.pytables.org/usersguide/libref/file_class.html, though, makes it sound like the filter settings should "propagate" down to all the children nodes. Strange.
I also understood that the filter was being propagated. I will run some additional tests using your script. It could well be what Tuomas suggests, or maybe it's a problem with the shuffle option, which in any case should be automatically set to True...
I have run some tests with different parameters. Unless indicated otherwise, the default values are:
Options:
Particles:
Compression level:
Different numbers of particles with fixed options (file sizes in bytes):
Particles Nº | Uncompressed | Normal | High-compressed |
---|---|---|---|
10 | 236.926 | 43.964 | 43.299 |
100 | 301.566 | 53.249 | 48.186 |
1000 | 1.206.526 | 108.082 | 117.310 |
10000 | 10.851.214 | 699.851 | 643.411 |
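The trend in the table above (compression pays off enormously, while the highest level only shaves off a little more than the default) can be reproduced with the standard library alone. This is a stdlib-only sketch on synthetic data, not the PyTables script used for the table, and the 3-double "particle" layout is an assumption:

```python
import struct
import zlib

def particle_records(n):
    # Positions on a coarse grid, so the data is compressible like a
    # typical simulation snapshot rather than pure noise.
    return b''.join(
        struct.pack('<3d', (i % 10) * 0.1, (i % 7) * 0.5, (i % 5) * 1.0)
        for i in range(n)
    )

raw = particle_records(10000)
normal = zlib.compress(raw, 6)   # zlib's default-ish level
high = zlib.compress(raw, 9)     # "high-compressed"

print(len(raw), len(normal), len(high))
```

The absolute numbers differ from the PyTables/HDF5 file sizes (there is no HDF5 container overhead here), but the relationship between the three columns is the same.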
Different compression libraries (file sizes in bytes):
complib | Uncompressed | Normal | High-compressed |
---|---|---|---|
zlib | 301.566 | 50.999 | 49.417 |
lzo | 301.566 | 56.093 | 56.093 |
bzip2 | 301.566 | 47.816 | 47.816 |
blosc | 301.566 | 53.718 | 55.730 |
Fletcher32 and shuffle (file sizes in bytes):
fletcher32 | Uncompressed | Normal | High-compressed |
---|---|---|---|
True | 301.758 | 51.164 | 48.308 |
False | 301.566 | 50.966 | 48.152 |
shuffle | Uncompressed | Normal | High-compressed |
---|---|---|---|
True | 301.566 | 51.002 | 48.130 |
False | 301.566 | 54.072 | 49.780 |
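The small but consistent gain from shuffle in the table above can be illustrated in pure Python: the shuffle filter regroups the bytes of fixed-size items by byte position before compressing, so the nearly constant high-order bytes of similar doubles form long runs that zlib handles well. A minimal sketch (not PyTables' actual implementation, which does this in C inside HDF5):

```python
import struct
import zlib

def shuffle(data, itemsize):
    # Byte-transpose: all first bytes of every item, then all second
    # bytes, and so on; this is what the HDF5 shuffle filter does.
    return b''.join(data[k::itemsize] for k in range(itemsize))

values = [i * 0.001 for i in range(10000)]          # slowly varying doubles
raw = struct.pack('<%dd' % len(values), *values)

plain = zlib.compress(raw, 6)
shuffled = zlib.compress(shuffle(raw, 8), 6)        # 8 = sizeof(double)

print(len(plain), len(shuffled))
```

Shuffling is lossless and reversible; it only reorders bytes so that the compressor sees more redundancy.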
I am going to test the effect of changing the number of CUBA attributes in each particle, but as you can see the results are quite consistent and basically what I would have expected. (Maybe we should move to bzip2?)
For now I am unable to reproduce @nathanfranklin's results. It would be interesting if someone else could also run the test, just to confirm whether it is working or not.
10000 | 10.851.214 | 699.851 | 643.411
This looks much better than the numbers I had seen. Was this using the script I posted earlier? I am wondering if there was a mistake in the script, or a problem with my environment (pytables, hdf5). I am out of the office today but will try to test things out on Monday.
Yep, this is using your script as a base (with some additional cases).
I ran the tests on Arch Linux and Fedora 22; I'm not sure if this affects the output in some way. I will try to set up a VM with Ubuntu just to make sure.
Cool! My results were from SUSE. I don't know what versions I was using (hdf5, tables). I suspect that is where the problem was. Hopefully it will be clear on Monday what was wrong with my system.
On openSUSE 13.1 (x86_64), I still get the strange numbers that suggest the compression isn't working on my system. I am using HDF5 library version 1.8.12 and tables version 3.1.1:
size of default compressed file: 10852306 bytes
size of not-compressed file: 10851254 bytes
size of high-compressed file: 10851510 bytes
I upgraded to tables-3.2.2 and I got the same results.
size of default compressed file: 10852266 bytes
size of not-compressed file: 10851214 bytes
size of high-compressed file: 10851470 bytes
On our Simphony development system (Ubuntu 12.04 LTS, 64-bit) with HDF5 (1.8.4) and tables (3.2.2), I get correct results, like you had seen on your system, when using the filters parameter (http://www.pytables.org/usersguide/libref/file_class.html):
size of default compressed file: 717509 bytes
size of not-compressed file: 10851214 bytes
size of high-compressed file: 655873 bytes
Could it be that PyTables has been compiled without zlib? According to the docs, this is the most common cause of tables not being compressed:
https://www.hdfgroup.org/HDF5/doc/TechNotes/TechNote-HDF5-CompressionTroubleshooting.pdf - Sec 3.1.
I do not know which packages you have installed on openSUSE 13.1, but reading that, my first thought would be that the zlib-devel package was missing at the moment of installing PyTables, hence the compression is not working at all.
Could you check that?
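One quick way to check this from Python (assuming PyTables is importable on the affected machine) is its `which_lib_version` helper: if the underlying HDF5 was built without a given compression library, it returns None for that library, and the corresponding Filters settings are silently ignored.

```python
import tables

# Diagnostic sketch: print which compression libraries this PyTables/HDF5
# build actually supports. None for 'zlib' would explain file sizes that
# look uncompressed despite filters being set.
for lib in ('hdf5', 'zlib', 'bzip2', 'blosc'):
    print(lib, tables.which_lib_version(lib))
```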
I will check that, but won't get to it until Tuesday or Wednesday at the earliest. Yeah, it sounds like something is just wrong with my hdf5. I will let you know what I find out.
I think we can ignore the problem I am having on my machine for the time being as things seem to work fine on Ubuntu 12.04. :+1:
Ok. If everything else is fine I will resolve the conflicts and merge. In any case, we can open a different issue if we see that something is missing.
:+1:
@carlos, it looks like something went wrong when merging master into the feature branch and fixing the conflicts. It looks like you possibly merged your feature branch back into itself when fixing the conflicts. The commits have been repeated, which is okay, but I think the following docstring (https://github.com/simphony/simphony-common/compare/a04d225f25...29e4374d39#diff-e5886385ba19e32cb76fb2d04aae6d83R39) needs to be changed back to "Returns an opened SimPhoNy CUDS-hdf5 file".
Oops, yeah, I did some odd things during the merge. I will fix it right now.
Could it be that PyTables has been compiled without zlib? According to the docs, this is the most common cause of tables not being compressed. https://www.hdfgroup.org/HDF5/doc/TechNotes/TechNote-HDF5-CompressionTroubleshooting.pdf - Sec 3.1.
@roigcarlo, that was exactly the problem. I had built hdf5 on my work machine without support for gzip/szip :) Mystery solved, thanks.
This implements #215
I was considering some options to implement this:
1. Force the compression.
2. Allow the use of compression, but do not enable it by default.
3. Allow the use of compression and enable it by default.
Option 1 seems just too restrictive, and option 2 seems too complicated for the user, so I decided to go for option 3.
As for the compression library, I selected zlib by default. If I remember correctly, it is already a dependency of the HDF5 package, so it should not be problematic.
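Option 3 boils down to passing a Filters instance when opening the file, so that every table created under it inherits the compression settings. A minimal sketch of the idea (an illustration, not the actual simphony-common code; the Particle record layout is hypothetical):

```python
import os
import tempfile

import tables

# File-level zlib filters: every child node created afterwards inherits
# them unless it is given its own Filters instance.
filters = tables.Filters(complevel=1, complib='zlib', shuffle=True)

class Particle(tables.IsDescription):  # hypothetical particle record
    x = tables.Float64Col()
    y = tables.Float64Col()
    z = tables.Float64Col()

path = os.path.join(tempfile.mkdtemp(), 'particles.h5')
with tables.open_file(path, mode='w', filters=filters) as handle:
    table = handle.create_table('/', 'particles', Particle)
    # Check that the file-level filters propagated to the table.
    inherited = table.filters.complib

print(inherited)
```

Note that even with this in place, the compression only takes effect if the underlying HDF5 build supports zlib, which is what tripped things up earlier in this thread.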