This allows us to avoid creating all the inner tar files as separate files on disk and then having to copy them back into the main archive. This effectively cuts the disk writes in half when creating backups and increases storage media longevity. Additionally, the user no longer needs twice as much disk space as the backup size, since only one copy of the backup is written to disk.
In home-assistant/supervisor#4843 I noticed that on I/O bound systems a large chunk of the time is spent copying the data into a tarfile and then making another tarfile out of the original tarfiles.
To avoid the double copy, we now write each inner tarfile directly into the fileobj of the outer tar file.
This reduced my backup time on my fast system from 24s to 10s. On I/O bound systems the reduction is multiple minutes.
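To illustrate the general idea (this is only a rough sketch, not securetar's actual implementation), the helper below shows one way a member of unknown size can be streamed straight into an already-open tar archive: write a placeholder header, let the caller write the payload into the archive's file object, then seek back and rewrite the header with the real size. The name add_stream_as_member is hypothetical, and the sketch assumes a seekable, uncompressed outer archive and a header that keeps the same length after the rewrite.

import tarfile
import time

TAR_BLOCK = 512

def add_stream_as_member(outer: tarfile.TarFile, name: str, write_body) -> None:
    # Sketch only: stream a member of unknown size directly into an open
    # outer TarFile. Requires a seekable, uncompressed outer archive.
    fileobj = outer.fileobj
    header_pos = fileobj.tell()

    # Write a placeholder header; the size is fixed up once the body is known.
    info = tarfile.TarInfo(name=name)
    info.mtime = int(time.time())
    info.size = 0
    fileobj.write(info.tobuf(outer.format, outer.encoding, outer.errors))

    # Let the caller (e.g. an inner TarFile opened with fileobj=fileobj)
    # write its bytes straight into the outer archive.
    body_start = fileobj.tell()
    write_body(fileobj)
    size = fileobj.tell() - body_start

    # Pad the member to a full 512-byte block, as the tar format requires.
    if size % TAR_BLOCK:
        fileobj.write(b"\0" * (TAR_BLOCK - size % TAR_BLOCK))
    end_pos = fileobj.tell()

    # Seek back and rewrite the header with the real size (and checksum),
    # assuming the rewritten header has the same length as the placeholder.
    info.size = size
    fileobj.seek(header_pos)
    fileobj.write(info.tobuf(outer.format, outer.encoding, outer.errors))
    fileobj.seek(end_pos)

    # Keep the outer TarFile's bookkeeping consistent so close() writes the
    # end-of-archive blocks in the right place.
    outer.offset = end_pos
    outer.members.append(info)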
All new lines are covered:
---------- coverage: platform darwin, python 3.12.1-final-0 ----------
Name                    Stmts   Miss  Cover   Missing
-----------------------------------------------------
securetar/__init__.py     157      6    96%   162, 167-169, 307, 314
-----------------------------------------------------
TOTAL                     157      6    96%
Example usage:
from securetar import SecureTarFile, atomic_contents_add

# main_tar, inner_tgz_files, and temp_orig are placeholder paths.
outer_secure_tar_file = SecureTarFile(main_tar, "w", gzip=False)
with outer_secure_tar_file as outer_tar_file:
    for inner_tgz_file in inner_tgz_files:
        # Each inner tar is written directly into the outer tar's fileobj,
        # so no intermediate file is created on disk.
        with outer_secure_tar_file.create_inner_tar(
            inner_tgz_file, gzip=True
        ) as inner_tar_file:
            atomic_contents_add(
                inner_tar_file,
                temp_orig,
                excludes=[],
                arcname=".",
            )
Supports https://github.com/home-assistant/supervisor/pull/4884 and https://github.com/home-assistant/core/pull/110267