pvizeli / securetar

Secure Tarfile library
Apache License 2.0

Add support for creating inner tar files #33

Closed bdraco closed 9 months ago

bdraco commented 9 months ago

supports https://github.com/home-assistant/supervisor/pull/4884 and https://github.com/home-assistant/core/pull/110267

This allows us to avoid creating all the inner tar files as separate files on disk and then having to copy them back into the main archive. This effectively cuts the disk writes in half when creating backups, which increases storage media longevity. Additionally, the user no longer needs twice as much free disk space as the backup size, since only one copy of the backup is written to disk now.

In home-assistant/supervisor#4843 I noticed that, on I/O bound systems, a large chunk of the time is spent copying the data into the tarfile and then making another tarfile of the original tarfiles.

To avoid the double copy, we now write each tarfile into the fileobj of the outer tar file.

This reduced my backup time on my fast system from 24s to 10s. On I/O bound systems the reduction is multiple minutes.
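For readers unfamiliar with the technique, here is a minimal sketch of writing an inner tar directly into the fileobj of an outer tar, using only the standard library `tarfile` module. It is not the code from this PR: `stream_inner_tar` and `demo` are hypothetical names, the outer archive is assumed to be uncompressed and seekable, and the sketch relies on the `TarFile.offset` and `TarFile.members` attributes, which are implementation details of `tarfile`. The idea is to write a placeholder header, stream the gzipped inner tar right after it, then seek back and rewrite the header once the size is known.

```python
import io
import tarfile
import time
from contextlib import contextmanager


@contextmanager
def stream_inner_tar(outer, name):
    """Hypothetical helper: stream a gzipped inner tar into an open outer tar."""
    raw = outer.fileobj  # underlying file of the outer archive (must be seekable)
    info = tarfile.TarInfo(name)
    info.mtime = int(time.time())

    # Write a placeholder header; the real size is not known yet.
    header_pos = raw.tell()
    placeholder = info.tobuf(outer.format, outer.encoding, outer.errors)
    raw.write(placeholder)
    data_pos = raw.tell()

    # The inner tar writes straight into the outer archive's file object.
    # Closing it closes the gzip stream but leaves `raw` open.
    with tarfile.open(fileobj=raw, mode="w:gz") as inner:
        yield inner

    # Pad the member data to a full 512-byte tar block.
    size = raw.tell() - data_pos
    raw.write(b"\0" * (-size % tarfile.BLOCKSIZE))
    end_pos = raw.tell()

    # Seek back and rewrite the header with the final size.
    info.size = size
    header = info.tobuf(outer.format, outer.encoding, outer.errors)
    if len(header) != len(placeholder):
        raise ValueError("header length changed; member too large for this sketch")
    raw.seek(header_pos)
    raw.write(header)
    raw.seek(end_pos)

    # Keep the outer TarFile's bookkeeping in sync (tarfile internals).
    outer.offset = end_pos
    outer.members.append(info)


def demo():
    """Write one inner tar into an in-memory outer tar, then read it back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as outer:
        with stream_inner_tar(outer, "inner.tar.gz") as inner:
            data = b"hello"
            member = tarfile.TarInfo("hello.txt")
            member.size = len(data)
            inner.addfile(member, io.BytesIO(data))

    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:") as outer:
        names = outer.getnames()
        inner_fileobj = outer.extractfile("inner.tar.gz")
        with tarfile.open(fileobj=inner_fileobj, mode="r:gz") as inner:
            payload = inner.extractfile("hello.txt").read()
    return names, payload
```

Calling `demo()` round-trips a single file through an inner archive without ever creating that inner archive as a separate file, so the data is written only once.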

All new lines are covered:

---------- coverage: platform darwin, python 3.12.1-final-0 ----------
Name                    Stmts   Miss  Cover   Missing
-----------------------------------------------------
securetar/__init__.py     157      6    96%   162, 167-169, 307, 314
-----------------------------------------------------
TOTAL                     157      6    96%

Example usage:

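    from securetar import SecureTarFile, atomic_contents_add

    # main_tar, inner_tgz_files and temp_orig are placeholders supplied by the caller.
    # The outer archive is left uncompressed; each inner archive is gzipped.
    outer_secure_tar_file = SecureTarFile(main_tar, "w", gzip=False)
    with outer_secure_tar_file as outer_tar_file:
        for inner_tgz_file in inner_tgz_files:
            # Write the inner tar directly into the outer tar's fileobj.
            with outer_secure_tar_file.create_inner_tar(
                inner_tgz_file, gzip=True
            ) as inner_tar_file:
                atomic_contents_add(
                    inner_tar_file,
                    temp_orig,
                    excludes=[],
                    arcname=".",
                )

bdraco commented 9 months ago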

I should also do core as well since it might need something more. But I think this will be fine as is

bdraco commented 9 months ago

Working great with core and supervisor.

I think this is ready for review now