jborg / attic

Deduplicating backup program

Include other compressors like XZ for smaller size and LZO for speed #114

Open vescudero opened 10 years ago

vescudero commented 10 years ago

Although Attic does a good job of deduplicating data, it seems to use a simple deflate method for compression (the same one used by zip and gzip).

However, for data that is backed up once and kept for the long term, a different approach makes sense. In my opinion, XZ (parallel LZMA, as in zbackup) is one of the best compressors available for further reducing size, at the cost of more CPU and extra time when creating the backup, and only slightly more work to restore. For low-end computers or quick short-term backups, LZO is the way to go: it is much faster at both compression and decompression, at the cost of a lower compression ratio.
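A quick way to check the size/speed trade-off described above is to time Python's standard zlib (deflate) and lzma (xz) codecs on the same input. This is a minimal sketch: the sample data and the chosen levels and presets are assumptions, it measures single-threaded xz rather than the parallel variant zbackup uses, and LZO is left out because it is not in the standard library (it needs third-party bindings).

```python
import lzma
import time
import zlib


def measure(name, compress, decompress, data):
    """Time only the compression step; return (name, compressed size, seconds)."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    # Round-trip check: the codec must give back exactly what went in.
    assert decompress(packed) == data
    return name, len(packed), elapsed


def compare(data):
    """Run the same input through deflate and two xz presets."""
    codecs = [
        ("zlib level 6", lambda d: zlib.compress(d, 6), zlib.decompress),
        ("xz preset 6", lambda d: lzma.compress(d, preset=6), lzma.decompress),
        ("xz preset 9e",
         lambda d: lzma.compress(d, preset=9 | lzma.PRESET_EXTREME),
         lzma.decompress),
    ]
    return [measure(n, c, d, data) for n, c, d in codecs]


if __name__ == "__main__":
    # Assumed sample input: repetitive text standing in for a real backup chunk.
    sample = b"".join(b"line %d of a log file with repeated words\n" % i
                      for i in range(20000))
    for name, size, secs in compare(sample):
        print(f"{name:14} {size:8} bytes  {secs * 1000:7.1f} ms")
```

Absolute numbers depend heavily on the data and the machine, so run it on a representative chunk of your own backups before drawing conclusions.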

skarekrow commented 10 years ago

I agree, I think XZ is a great compression format, and I +1 this. (Email reply to Victor Escudero's post of 9/16/2014 3:33 PM.)

imraro commented 10 years ago

I vote for lz4!

vks commented 9 years ago

Note that LZMA support has been part of the Python standard library (the lzma module) since version 3.3.
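For reference, that module offers one-shot lzma.compress() and lzma.decompress() functions. The default .xz container carries headers, an index and a CRC64 check, which is a noticeable fixed overhead on small chunks; a raw LZMA2 stream avoids it, but then the decoder has to be given the filter chain explicitly. The sketch below shows both; the preset value and the helper names are arbitrary choices for illustration.

```python
import lzma

# Raw LZMA2 stream: no .xz container, so the same filter chain must be
# supplied again when decompressing.
RAW_FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]


def xz_compress(data: bytes) -> bytes:
    # Default format is FORMAT_XZ: self-describing, with headers and a check.
    return lzma.compress(data, preset=6)


def raw_compress(data: bytes) -> bytes:
    return lzma.compress(data, format=lzma.FORMAT_RAW, filters=RAW_FILTERS)


def raw_decompress(blob: bytes) -> bytes:
    return lzma.decompress(blob, format=lzma.FORMAT_RAW, filters=RAW_FILTERS)


if __name__ == "__main__":
    payload = b"payload"
    for name, blob in (("xz", xz_compress(payload)), ("raw", raw_compress(payload))):
        print(f"{name}: {len(blob)} bytes")
    assert lzma.decompress(xz_compress(payload)) == payload
    assert raw_decompress(raw_compress(payload)) == payload
```

The decompression default, FORMAT_AUTO, recognizes .xz and legacy .lzma data but not raw streams, so a raw format only makes sense where the caller already knows which codec was used.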

vks commented 9 years ago

Replacing zlib with lzma in attic/key.py yields the following test failures:

```
======================================================================
ERROR: test_keyfile2 (attic.testsuite.key.KeyTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "attic/build/lib.macosx-10.9-x86_64-3.4/attic/testsuite/key.py", line 77, in test_keyfile2
    self.assert_equal(key.decrypt(self.keyfile2_id, self.keyfile2_cdata), b'payload')
  File "attic/build/lib.macosx-10.9-x86_64-3.4/attic/key.py", line 128, in decrypt
    data = decompress(self.dec_cipher.decrypt(data[41:]))  # should use memoryview
  File "attic/build/lib.macosx-10.9-x86_64-3.4/attic/key.py", line 19, in decompress
    return lzma.decompress(data)
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/lzma.py", line 498, in decompress
    res = decomp.decompress(data)
_lzma.LZMAError: Input format not supported by decoder

======================================================================
FAIL: test_keyfile (attic.testsuite.key.KeyTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "attic/build/lib.macosx-10.9-x86_64-3.4/attic/testsuite/key.py", line 62, in test_keyfile
    self.assert_equal(key.extract_nonce(manifest2), 1)
AssertionError: 4 != 1

======================================================================
FAIL: test_passphrase (attic.testsuite.key.KeyTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "attic/build/lib.macosx-10.9-x86_64-3.4/attic/testsuite/key.py", line 92, in test_passphrase
    self.assert_equal(key.extract_nonce(manifest2), 1)
AssertionError: 4 != 1

----------------------------------------------------------------------
Ran 112 tests in 35.280s

FAILED (failures=2, errors=1, skipped=8)
```

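Both kinds of failure can probably be reproduced without Attic. The ERROR looks like lzma being handed data that was compressed with zlib (presumably the keyfile2_cdata fixture, created before the change). The two "4 != 1" FAILs are consistent with the xz output for a short payload spanning more 16-byte AES blocks than the zlib output, assuming the CTR-mode nonce advances by one per block of ciphertext; that assumption is inferred from the failures, not checked against Attic's code. The helper names below are hypothetical.

```python
import lzma
import zlib

AES_BLOCK_SIZE = 16  # bytes per AES block


def aes_blocks(n_bytes: int) -> int:
    """Number of 16-byte AES blocks needed to cover n_bytes of ciphertext."""
    return (n_bytes + AES_BLOCK_SIZE - 1) // AES_BLOCK_SIZE


def try_lzma_on_zlib_data(payload: bytes) -> str:
    """Feed zlib-compressed bytes (as in pre-existing fixtures) to lzma."""
    try:
        lzma.decompress(zlib.compress(payload))
    except lzma.LZMAError as exc:
        return str(exc)
    return "decoded"


if __name__ == "__main__":
    payload = b"payload"
    print("lzma on zlib data:", try_lzma_on_zlib_data(payload))
    for name, blob in (("zlib", zlib.compress(payload)),
                       ("xz", lzma.compress(payload))):
        print(f"{name}: {len(blob)} bytes -> {aes_blocks(len(blob))} AES block(s)")
```

If this reading is right, the failing tests reflect fixtures and nonce expectations tied to zlib output rather than a bug in the lzma decoder itself.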
ThomasWaldmann commented 9 years ago

I am currently working on support for compressors other than zlib.

Current status:

See PR #207. Once it is merged, adding new compression algorithms will be very easy, since the infrastructure will already be in place.
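Infrastructure like this usually comes down to tagging each compressed chunk with the algorithm that produced it, so the reader can dispatch to the right decompressor and old and new data can coexist. The following is a hypothetical sketch of that idea, not the code from PR #207: the one-byte type IDs, the CODECS table and the function names are all made up for illustration.

```python
import lzma
import zlib
from typing import Callable, Dict, Tuple

# Hypothetical registry: one-byte type ID -> (compress, decompress).
# These IDs are illustrative only, not Attic's or PR #207's actual values.
Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]
CODECS: Dict[int, Codec] = {
    0x00: (lambda d: d, lambda d: d),        # no compression
    0x01: (zlib.compress, zlib.decompress),  # zlib / deflate
    0x02: (lzma.compress, lzma.decompress),  # xz / LZMA2
}


def compress(data: bytes, codec_id: int = 0x01) -> bytes:
    """Compress with the chosen codec and prefix the result with its ID byte."""
    compress_fn, _ = CODECS[codec_id]
    return bytes([codec_id]) + compress_fn(data)


def decompress(blob: bytes) -> bytes:
    """Read the ID byte and dispatch to the matching decompressor."""
    codec_id = blob[0]
    if codec_id not in CODECS:
        raise ValueError(f"unknown compression type 0x{codec_id:02x}")
    _, decompress_fn = CODECS[codec_id]
    return decompress_fn(blob[1:])
```

With this layout, supporting another algorithm (LZ4 or LZO through third-party bindings, for example) means adding one entry to the table. Data written before such a header existed would still need a separate compatibility path, since it carries no type byte.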