lucianopaz / compress_pickle

Standard python pickle, thinly wrapped with standard compression libraries
MIT License
42 stars 12 forks

lz4 because speed matters #12

Closed ctmakro closed 4 years ago

ctmakro commented 4 years ago

lz4 (compression_level=0) is much faster (~50%) than gzip for compressing and then sending pickles over 1 Gb Ethernet and SATA2 connections. The compressed files are a bit larger and thus slower to transmit, but the time spent on compression is greatly reduced.
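A rough way to check this trade-off on your own data is sketched below. It is only an illustration, not the measurement behind the numbers above: the payload is made up, `time_roundtrip` is a hypothetical helper, and the lz4 branch runs only if the third-party `lz4` package is installed.

```python
import gzip
import pickle
import time

try:
    import lz4.frame as lz4_frame  # third-party: pip install lz4
except ImportError:
    lz4_frame = None  # the sketch then reports gzip only


def time_roundtrip(compress, decompress, payload):
    """Return (seconds spent compressing, compressed size in bytes)."""
    start = time.perf_counter()
    blob = compress(payload)
    elapsed = time.perf_counter() - start
    # Sanity check: the codec must give back exactly what went in.
    assert decompress(blob) == payload
    return elapsed, len(blob)


# Made-up payload; real timings depend heavily on the data being pickled.
data = {"values": list(range(200_000)), "label": "example" * 1000}
payload = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

codecs = {"gzip": (gzip.compress, gzip.decompress)}
if lz4_frame is not None:
    codecs["lz4"] = (
        lambda raw: lz4_frame.compress(raw, compression_level=0),
        lz4_frame.decompress,
    )

results = {name: time_roundtrip(c, d, payload) for name, (c, d) in codecs.items()}
for name, (seconds, size) in results.items():
    print(f"{name}: {seconds:.4f} s, {size} bytes (raw: {len(payload)} bytes)")
```

Whether the larger lz4 output costs more than the saved compression time depends on the link speed, so the total (compress plus transfer) is the number to compare.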

codecov[bot] commented 4 years ago

Codecov Report

Merging #12 into master will decrease coverage by 2.38%. The diff coverage is n/a.

@@             Coverage Diff             @@
##            master      #12      +/-   ##
===========================================
- Coverage   100.00%   97.61%   -2.39%     
===========================================
  Files            7        7              
  Lines          493      462      -31     
===========================================
- Hits           493      451      -42     
- Misses           0       11      +11     
Impacted Files                           Coverage Δ
compress_pickle/compress_pickle.py       93.22% <0.00%> (-6.78%)
compress_pickle/utils.py                 96.09% <0.00%> (-3.91%)
compress_pickle/tests/test_utils.py      98.01% <0.00%> (-1.99%)
compress_pickle/tests/fixtures.py        100.00% <0.00%> (ø)
compress_pickle/tests/test_pickling.py   100.00% <0.00%> (ø)

lucianopaz commented 4 years ago

@ctmakro, thanks for the PR. However, this doesn't add any tests or install requirements. lz4 isn't part of Python's standard compression packages. Its Python bindings are available through PyPI, but the lz4 C library needs to be correctly installed on the system for them to run.

I would add lz4 as an optional_requires in setup.py, and we should also add tests for it. I think that we should only change a fixture in conftest.py to get lz4 fully tested.
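As a sketch of what an optional requirement could look like: setuptools spells this keyword `extras_require`. The names `EXTRAS_REQUIRE` and `SETUP_KWARGS` below are hypothetical, and the excerpt is not taken from the project's actual setup.py.

```python
# Hypothetical excerpt of a setup.py; only the optional-dependency part is shown.
EXTRAS_REQUIRE = {
    # Pulled in only on request, e.g. via: pip install "compress_pickle[lz4]"
    "lz4": ["lz4"],
}

SETUP_KWARGS = {
    "name": "compress_pickle",
    "extras_require": EXTRAS_REQUIRE,
}

# In a real setup.py these would be passed on as: setuptools.setup(**SETUP_KWARGS)
```

With this layout a plain install stays free of the lz4 dependency, and the library can raise a clear error if lz4 compression is requested without the extra.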

One quick question: you mentioned that you want lz4 to be able to transmit the data fast. Why do you use frame.open? Is there a way to use the streaming protocol they mention in their docs?

ctmakro commented 4 years ago

> @ctmakro, thanks for the PR. However, this doesn't add any tests or install requirements. lz4 isn't part of Python's standard compression packages. Its Python bindings are available through PyPI, but the lz4 C library needs to be correctly installed on the system for them to run.
>
> I would add lz4 as an optional_requires in setup.py, and we should also add tests for it. I think that we should only change a fixture in conftest.py, and also add a default filename extension for it, to get lz4 fully tested.
>
> One quick question: you mentioned that you want lz4 to be able to transmit the data fast. Why do you use frame.open? Is there a way to use the streaming protocol they mention in their docs?

I'm working on a project that pickles a lot of data then saves them to disk over GbE regularly. The files are rather large, and some compression would be of great help.

I searched the internet and your compress_pickle library is the best drop-in solution. I find that gzip is at best as fast as, and sometimes even slower than, using no compression if you dump() and load() a lot.

So I added a few lines to make your library work with lz4, which is by far the fastest and least-work solution for my case, and I decided to share it with others. I'm not sure a pull request is the best way, since I have no experience in writing tests for programs or dealing with integration systems...

Regarding the lz4 C library: I'm on a Windows 7 machine, and all I did was pip install lz4. I didn't know you might have to install it manually. My bad.

Regarding the streaming protocol: I will look into that, thanks. Today is actually the first day I've heard of the lz4 algorithm, and I think I saw in their docs that "frame.open" is the way to go for saving files.
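For reference, the file-based approach mentioned here can be sketched roughly as follows. This is not the code from the PR: `dump_compressed` and `load_compressed` are hypothetical helpers, and the sketch falls back to the standard library's `gzip.open` when the third-party `lz4` package is not installed, since both return file-like objects that `pickle` can write to and read from.

```python
import gzip
import os
import pickle
import tempfile

try:
    import lz4.frame  # third-party: pip install lz4
    OPEN_FUNC, SUFFIX = lz4.frame.open, ".lz4"
except ImportError:
    OPEN_FUNC, SUFFIX = gzip.open, ".gz"  # stdlib fallback so the sketch still runs


def dump_compressed(obj, path, open_func=OPEN_FUNC):
    """Pickle obj straight into a compressed file-like object."""
    with open_func(path, mode="wb") as stream:
        pickle.dump(obj, stream, protocol=pickle.HIGHEST_PROTOCOL)


def load_compressed(path, open_func=OPEN_FUNC):
    """Read a pickled object back from a compressed file."""
    with open_func(path, mode="rb") as stream:
        return pickle.load(stream)


sample = {"numbers": list(range(1000)), "name": "example"}
with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "sample.pkl" + SUFFIX)
    dump_compressed(sample, target)
    restored = load_compressed(target)

print(restored == sample)
```

Because the pickle is written through the file object, the whole serialized payload never has to sit in memory as one bytes object before compression.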

lucianopaz commented 4 years ago

> So I added a few lines to make your library work with lz4, which is by far the fastest and least-work solution for my case, and I decided to share it with others. I'm not sure a pull request is the best way, since I have no experience in writing tests for programs or dealing with integration systems...

No worries, @ctmakro. I'm very glad that you found compress_pickle helpful, and thanks for wanting to contribute lz4 to it! I can take over and use your PR as a basis to work from.

The four tasks that I'll do are:

lucianopaz commented 4 years ago

Closed in favor of #15. @ctmakro, you should be able to install from the git repo for now. Once I update the docs, I'll push a minor release to PyPI and then you'll be able to simply pip install -U compress_pickle to use lz4 compression. Thanks again for your contribution!

lucianopaz commented 4 years ago

@ctmakro, version 1.1.1 is now available on PyPI. You can simply pip install -U compress_pickle to use lz4.

ctmakro commented 4 years ago

@lucianopaz gotcha. nice work!