Closed jmcnamara closed 6 months ago
Based on the analysis in https://github.com/zip-rs/zip-old/issues/388 the issue is that the DEFAULT_VERSION
is 46 for bzip even if bzip isn't being used.
My proposed workaround was:
diff --git a/src/types.rs b/src/types.rs
index c3d0a45..e2f44d4 100644
--- a/src/types.rs
+++ b/src/types.rs
@@ -272,7 +272,10 @@ impl TryFrom<OffsetDateTime> for DateTime {
}
}
+#[cfg(feature = "bzip2")]
pub const DEFAULT_VERSION: u8 = 46;
+#[cfg(not(feature = "bzip2"))]
+pub const DEFAULT_VERSION: u8 = 45;
/// A type like `AtomicU64` except it implements `Clone` and has predefined
/// ordering.
I can submit a PR for that if you think it is reasonable.
Yes please; I'll have to look up what other software DEFAULT_VERSION affects. Thanks for migrating this issue and closing the original; it helps get the old repo ready for archiving.
Ok, looks like we may need different version numbers according to the compression and encryption we're actually using for each file, not just what feature flags we're built with. See APPNOTE 4.4.3.2. For example, we'll need to set 51 for AES and 63 for LZMA.
I've implemented this myself, with all the features for which APPNOTE 4.4.3.2 specifies a minimum version. (Not sure what the correct version is for Zstandard, because APPNOTE doesn't say.)
Thanks for that. I'll rerun the test case for this later and let you know.
Apologies, it took me a while to test this and unfortunately it doesn't fix the issue.
I'll go over the use case again for clarity. I maintain rust_xlsxwriter
which is a crate for creating Excel xlsx files which are a collection of mostly xml files in a zip container. I use zip/zip2 for the zipping part of this. A user submitted a failing test case where they were trying to create an Excel file with ~67k images of Japanese Unicode characters. This exceeds the 64k zip file limit and requires zip64 support. zip.rs
supports this correctly but uses a hardcoded value of 46 (Zip spec 4.6) where Excel expects to see 45.
If the follow DEFAULT_VERSION
is 46:
This causes the error shown above when Excel tries to load the file. If it is changed to 45 then everything works as expected.
In your patch c2fe207 you add enhanced logic to determine version_needed()
but that isn't used in fn write_central_and_footer()
.
I tested with zip.rs v2.0.0 released today (well done).
@jmcnamara I believe this will be fixed when e6b2290f70390e5df56d62d5efad4fa179268eb0 is released as part of version 2.1.0 tonight. Other commits that may be essential to the solution are cda4712153b629cbcdbeda87418c7a7319d27eb9, 0636bd74119e30e89399bd644f0e1239f419b6a0, f90bdf76b88a6218847b04d1813774f79251f41c (in rare corner cases) and 3afe549161cf769f845e50f8d8ec306bc76a8240. If I'm wrong, could you please reopen this issue and provide an example of a v2.1.0 or newer output that Excel fails to process?
I believe this will be fixed when e6b2290 is released as part of version 2.1.0 tonight. Other commits that may be essential to the solution are cda4712, 0636bd7, f90bdf7 (in rare corner cases) and 3afe549.
@Pr0methean I don't think any of those fix the issue. The files that demonstrate this issue are too big (250MB) to upload to GitHub but there is an example in the first comment of this issue from the original reporters repo:
Bad file: basic-full.xlsx
The issue seems to be in the Central Directly End block of the file which for the file above is the following from zipdetails -vv
:
CAA3A14 CAA3A17 0000004 50 4B 06 06 ZIP64 END CENTRAL DIR 06064B50 (101075792)
RECORD
CAA3A18 CAA3A1F 0000008 2C 00 00 00 Size of record 000000000000002C (44)
00 00 00 00
CAA3A20 CAA3A20 0000001 2E Created Zip Spec 2E (46) '4.6'
CAA3A21 CAA3A21 0000001 00 Created OS 00 (0) 'MS-DOS'
CAA3A22 CAA3A22 0000001 2E Extract Zip Spec 2E (46) '4.6'
CAA3A23 CAA3A23 0000001 00 Extract OS 00 (0) 'MS-DOS'
...
Excel has an issue with the Extract Zip Spec 2E (46) '4.6'
field. If I change that in a hex editor to 0x2D (Extract Zip Spec 2D (45) '4.5'
) the file will open (if the system has enough memory).
Note, the Created Zip Spec field doesn't have any effect on this.
Update, maybe this is fixed on main. I was looking at a version of the code that was slightly out of date. Let me recheck...
Sorry for the noise earlier. I reran with the latest main/v2.1.0 and it Excel can now open the file. For what it is worth here is the Central Directly End block:
CAF3AD7 0000004 50 4B 06 06 ZIP64 END CENTRAL DIR 06064B50
RECORD
CAF3ADB 0000008 2C 00 00 00 Size of record 000000000000002C
00 00 00 00
CAF3AE3 0000001 2E Created Zip Spec 2E '4.6'
CAF3AE4 0000001 00 Created OS 00 'MS-DOS'
CAF3AE5 0000001 14 Extract Zip Spec 14 '2.0'
CAF3AE6 0000001 00 Extract OS 00 'MS-DOS'
CAF3AE7 0000004 00 00 00 00 Number of this disk 00000000
CAF3AEB 0000004 00 00 00 00 Central Dir Disk no 00000000
CAF3AEF 0000008 06 57 01 00 Entries in this disk 0000000000015706
00 00 00 00
CAF3AF7 0000008 06 57 01 00 Total Entries 0000000000015706
00 00 00 00
CAF3AFF 0000008 A5 49 5C 00 Size of Central Dir 00000000005C49A5
00 00 00 00
CAF3B07 0000008 32 F1 52 0C Offset to Central dir 000000000C52F132
00 00 00 00
CAF3B0F 0000004 50 4B 06 07 ZIP64 END CENTRAL DIR 07064B50
LOCATOR
CAF3B13 0000004 00 00 00 00 Central Dir Disk no 00000000
CAF3B17 0000008 D7 3A AF 0C Offset to Central dir 000000000CAF3AD7
00 00 00 00
CAF3B1F 0000004 01 00 00 00 Total no of Disks 00000001
CAF3B23 0000004 50 4B 05 06 END CENTRAL HEADER 06054B50
CAF3B27 0000002 00 00 Number of this disk 0000
CAF3B29 0000002 00 00 Central Dir Disk no 0000
CAF3B2B 0000002 FF FF Entries in this disk FFFF
CAF3B2D 0000002 FF FF Total Entries FFFF
CAF3B2F 0000004 A5 49 5C 00 Size of Central Dir 005C49A5
CAF3B33 0000004 32 F1 52 0C Offset to Central Dir 0C52F132
CAF3B37 0000002 00 00 Comment Length 0000
Separate question, in this case should the Extract Zip Spec flag be 45 since there are more that 64k files or does that rely on also setting SimpleFileOptions.large_file(true)
?
Thanks for taking over maintainership of zip.rs. This is a duplicate of an issue that I raised against the older version: https://github.com/zip-rs/zip-old/issues/388
I have a library call
rust_xlsxwriter
that useszip.rs
to create xlsx files.The crate is reasonably well used and to date I haven't had any issues from Excel, either personally or from end users.
However, one user who is creating large (but not
large_file
large) 400MB files with lots of images reported an issue with the following file:Bad file: basic-full.xlsx
Excel complains about this almost immediately when it tries to open it:
I don't see any issues in the zipped XML data (the content) and if I unzip and rezip the file then Excel can open it:
Good file: basic-rezip.xlsx
Also, a Microsoft Open XML SDK validation tool complains about the zip file when I try to open the bad file:
So it looks to me like there is an issue with the zip container.
The bad file above was created with
zip.rs
v0.6.4 but I get the same result with v0.6.6. See the following for additional details: https://github.com/jmcnamara/rust_xlsxwriter/issues/51 Also, I'm second guessing that the issue is with large files. It could be related to the number of files in the zip: 87814.The file was created with the following zip.rs options:
And the following dependency in Cargo.toml:
Turning on the
large_file
ZIP64 option doesn't help.