miurahr / py7zr

7zip in python3 with ZStandard, PPMd, LZMA2, LZMA1, Delta, BCJ, BZip2, and Deflate compressions, and AES encryption.
https://pypi.org/project/py7zr/
GNU Lesser General Public License v2.1

py7zr creates different archive when writing byte streams to archive #343

Open reimarstier opened 3 years ago

reimarstier commented 3 years ago

Describe the bug: When py7zr writes a directory structure that contains zero-byte files to an archive, 7z on Windows can open the resulting file. When py7zr writes an empty byte stream to an archive, 7z on Windows cannot open the resulting file (py7zr itself can still open it).

test_create_archive__from_dir() works, while test_create_archive__that_7z_cannot_extract() raises an error in 7z on Windows. I'd expect these two strategies to produce the same archive. The result is the same with archive.writeall() and archive.writef().

To Reproduce: https://gist.github.com/reimarstier/8aa6822045dc6b562beea44799f94061

Expected behavior: The archive created in both cases should open in 7z on Windows.

Environment (please complete the following information):

miurahr commented 3 years ago

The test cases test_archive_empty_file and test_archive_empty_file1 pass on v0.16.1. They test extraction with libarchive and p7zip on Windows and Linux.

So your 7z extractor may have a compatibility issue.

https://github.com/miurahr/py7zr/actions/runs/820905436

Does py7zr produce a file that breaks libarchive, p7zip and 7z? Could you upload the file?

Could you propose a reproducer as test case?

You can use two helper functions that run the external 7z command and the libarchive Python library to extract the target file.

p7zip_test(target_path)
libarchive_extract(target_path, extract_path)

ref https://github.com/miurahr/py7zr/blob/master/tests/test_archive.py#L1013-L1050
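
For readers without the test suite at hand, a helper of this kind could look roughly like the sketch below. It is not the code from tests/test_archive.py: the name `seven_zip_test` and its `binary` parameter are made up for illustration, and it assumes only that a `7z` executable supporting the `t` (test) command may be available on the PATH.

```python
import shutil
import subprocess


def seven_zip_test(archive_path, binary="7z"):
    """Run `<binary> t <archive>`.

    Returns True if the archive tested cleanly, False if 7z reported a
    problem, and None if the binary is not installed.
    (Hypothetical helper, not part of py7zr or its test suite.)
    """
    exe = shutil.which(binary)
    if exe is None:
        return None  # nothing to test with on this machine
    result = subprocess.run(
        [exe, "t", str(archive_path)],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # 7z exits with status 0 when the archive tested without errors.
    return result.returncode == 0
```

A reproducer could then assert that `seven_zip_test(archive_file)` is not `False` after the archive has been closed.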

reimarstier commented 3 years ago

Hey, thanks for your quick response and sorry for taking so long to get back to you. Both of your test cases use the write() method adding files from the file system. Am I wrong to use writef() to write byte streams? This seems to be working fine for most cases but for empty file streams it fails:

import io
from pathlib import Path

import py7zr


def test_create_archive__empty_file_from_stream(tmp_path):
    archive_file = Path(tmp_path).joinpath("archive.7z")
    output_dir = Path(tmp_path).joinpath("output")
    output_dir.mkdir()
    empty_byte_stream = io.BytesIO()  # zero-length stream

    with py7zr.SevenZipFile(archive_file, 'w') as archive:
        empty_byte_stream.seek(0)
        archive.writef(empty_byte_stream, arcname="empty.txt")

    # extract() is a helper defined outside this snippet (presumably in the gist linked above).
    extract(archive_path=archive_file, output_directory=output_dir)

miurahr commented 3 years ago

py7zr cannot know that a stream has zero bytes before reading it.

py7zr creates a directory entry for the file in the 7zip archive, reads the bytes, compresses them, writes them to the archive, and only at the end records the file size.

The 7zip command may store a zero-size file as a directory entry only, with no data entry.

This might make the difference.

py7zr can create an empty file in the same manner when a path is passed to write(). It checks the file size and creates an empty-file entry when the size is zero. writef() accepts a stream that may not have a known size before it is read, so it treats an empty stream as a file that has zero bytes of data.
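
One possible caller-side workaround, sketched under assumptions rather than confirmed in this thread: for seekable streams such as io.BytesIO, the caller can measure the remaining length before handing the stream over, and route empty streams through the path-based write() instead of writef(). The helpers `stream_is_empty` and `add_stream`, and the `empty_path` parameter (a zero-byte file the caller creates and keeps until the archive is closed), are hypothetical names, not py7zr API. Whether this actually avoids the 7z error on Windows has not been verified here; it only follows from the report that archives built from zero-byte files on disk open fine.

```python
import io


def stream_is_empty(stream):
    """Return True if a seekable binary stream has no bytes left to read.

    The current position is restored before returning.
    """
    pos = stream.tell()
    end = stream.seek(0, io.SEEK_END)  # seek() returns the new absolute position
    stream.seek(pos)
    return end == pos


def add_stream(archive, stream, arcname, empty_path):
    """Add `stream` to an archive opened for writing.

    `archive` is expected to behave like py7zr.SevenZipFile, i.e. to offer
    write(path, arcname=...) and writef(stream, arcname=...).
    `empty_path` must point to an existing zero-byte file (assumption: the
    caller keeps it on disk until the archive is closed).
    """
    if stream_is_empty(stream):
        # Path-based route: the size is known up front from the file system.
        archive.write(empty_path, arcname=arcname)
    else:
        # Stream route: the size is only known after the data has been read.
        archive.writef(stream, arcname=arcname)
```

This does not help with non-seekable streams, where the size really cannot be known before reading.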

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days
