isi-vista / vistautils

Python utilities developed by USC ISI's VISTA center.
MIT License

Adding files > 2 GB in size to a zip archive fails on macOS #20

Open berquist opened 5 years ago

berquist commented 5 years ago

Running scripts/tar_gz_to_zip.py with Python 3.6 on macOS raises the dreaded OSError: [Errno 22] Invalid argument. This is an open Python issue. The example given there,

>>> open('/dev/null', 'wb').write(bytearray(2**31-1))
2147483647

>>> open('/dev/null', 'wb').write(bytearray(2**31))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument

works using Python 2.7. However, Python 2.7 has a zip-related issue of its own:

Copying data/mp4/IC001G.mp4.zip
Traceback (most recent call last):
  File "/Users/berquist/projects/aida/aida_tools/repos/vistautils/scripts/tar_gz_to_zip.py", line 56, in <module>
    main()
  File "/Users/berquist/projects/aida/aida_tools/repos/vistautils/scripts/tar_gz_to_zip.py", line 39, in main
    out.writestr(member.name, data.read())
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 1234, in writestr
    zinfo.CRC = crc32(bytes) & 0xffffffff       # CRC-32 checksum
OverflowError: size does not fit in an int

which occurs even with

with ZipFile(output_zip_name, 'w', compression=ZIP_STORED, allowZip64=True) as out:

and regardless of whether the archive is uncompressed (ZIP_STORED) or compressed (ZIP_DEFLATED).
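The traceback above comes from handing a whole member to writestr in a single call. One possible alternative, sketched here and not tested against the real script, is to stream each member into the archive in small chunks via ZipFile.open(..., "w"), which requires Python 3.6 or later. The function name copy_tar_members_to_zip and the CHUNK_SIZE value are hypothetical and not taken from tar_gz_to_zip.py.

```python
import shutil
import tarfile
from zipfile import ZIP_STORED, ZipFile

# Hypothetical chunk size, chosen only to stay far below 2 GB per write.
CHUNK_SIZE = 64 * 1024 * 1024


def copy_tar_members_to_zip(tar_path, zip_path, chunk_size=CHUNK_SIZE):
    """Copy the regular files of a tarball into a zip, one chunk at a time."""
    with tarfile.open(tar_path, "r:*") as tar, ZipFile(
        zip_path, "w", compression=ZIP_STORED, allowZip64=True
    ) as out:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and special files
            src = tar.extractfile(member)
            # force_zip64 because the size is not known to zipfile up front
            # and may exceed 2 GiB.
            with src, out.open(member.name, "w", force_zip64=True) as dst:
                shutil.copyfileobj(src, dst, chunk_size)
```

Because no single read or write exceeds chunk_size, this should also keep memory use flat, though whether it sidesteps the OSError on an affected interpreter is an assumption that still needs checking.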

  1. The Python ticket makes it sound like the "universal" writing issue is on macOS and Windows only, but I haven't tested on Linux yet.
  2. It is unclear whether the zip-related error will also appear with Python 3.x.

A current (unsatisfactory) workaround is to optionally skip files > 2 GB in size when building the zip file. I'll update the issue once I can test on Linux.
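For reference, the size check behind that workaround could look roughly like the sketch below. It only illustrates the idea; split_by_size and MAX_MEMBER_BYTES are hypothetical names, and the limit is taken from the largest write that succeeded in the /dev/null example.

```python
import tarfile

# Largest single write that succeeded in the /dev/null example.
MAX_MEMBER_BYTES = 2**31 - 1


def split_by_size(tar_path, limit=MAX_MEMBER_BYTES):
    """Return (kept, skipped) lists of regular-file names in a tarball."""
    kept, skipped = [], []
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # only regular files carry data to copy
            if member.size <= limit:
                kept.append(member.name)
            else:
                skipped.append(member.name)
    return kept, skipped
```

Reporting the skipped names to the user would at least make it obvious which files are missing from the resulting archive.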

berquist commented 4 years ago

This is fixed in Python 3.7.3 on macOS 10.14.6. It's supposedly backported to 3.6, but it doesn't work with 3.6.7 on the same machine.
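For interpreters without the fix, one thing that might help is splitting large writes so that no single call passes 2**31 bytes or more. The helper below is a minimal sketch of that idea; write_in_chunks is a hypothetical name, the 1 GiB default is arbitrary, and it assumes a buffered binary file object whose write() consumes the whole chunk.

```python
def write_in_chunks(fileobj, data, chunk_size=2**30):
    """Write a bytes-like object in pieces of at most chunk_size bytes.

    Assumes fileobj is a buffered binary stream, so each write() call
    consumes the whole chunk it is given.
    """
    view = memoryview(data).cast("B")  # byte view, no copy of the data
    total = 0
    for start in range(0, len(view), chunk_size):
        total += fileobj.write(view[start:start + chunk_size])
    return total
```

With this, the failing call from the example would become write_in_chunks(open('/dev/null', 'wb'), bytearray(2**31)). I have not verified that it avoids the error on 3.6.7.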