Closed ikrivosheev closed 3 years ago
Thanks for the report. I think I've figured out what's happening, but I'm not sure yet which of several possible solutions is most correct.
Short version: These stream names are being stored unencoded, which is confusing the msi crate: they appear in the list of streams, but when you try to read the stream, the msi crate encodes the name and then doesn't find the encoded name in the underlying CFB file, so it thinks the stream doesn't exist.
Longer version: Stream names in the underlying CFB file are stored as UTF-16. To save space, stream names in MSI files are normally encoded, with up to two ASCII characters packed into each UTF-16 character (see https://github.com/mdsteele/rust-msi/blob/master/src/internal/streamname.rs). (Side note: I've never found anywhere where this scheme is documented. If there's a public doc that describes the encoding, I'd love to see it.)
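To make the packing concrete, here is a minimal sketch of a two-characters-per-code-unit scheme of this kind. The offsets (0x3800 for a packed pair, 0x4800 for a single character) and the 64-character alphabet order are assumptions for illustration, not something confirmed in this thread; the linked streamname.rs is the authority on what the msi crate actually does, and the function names here are hypothetical.

```rust
// Assumed offsets: a packed pair lands in 0x3800..0x4800, a lone character in 0x4800..0x4840.
const PAIR_BASE: u32 = 0x3800;
const SINGLE_BASE: u32 = 0x4800;

// Map a character to a 6-bit value (assumed alphabet order), or None if it cannot be packed.
fn to_b64(c: char) -> Option<u32> {
    match c {
        '0'..='9' => Some(c as u32 - '0' as u32),
        'A'..='Z' => Some(c as u32 - 'A' as u32 + 10),
        'a'..='z' => Some(c as u32 - 'a' as u32 + 36),
        '.' => Some(62),
        '_' => Some(63),
        _ => None,
    }
}

// Inverse of to_b64 for values 0..=63.
fn from_b64(v: u32) -> char {
    match v {
        0..=9 => (b'0' + v as u8) as char,
        10..=35 => (b'A' + (v - 10) as u8) as char,
        36..=61 => (b'a' + (v - 36) as u8) as char,
        62 => '.',
        _ => '_',
    }
}

// Pack up to two alphabet characters into each output character;
// characters outside the alphabet (such as '\u{5}') pass through unchanged.
fn encode(name: &str) -> String {
    let mut out = String::new();
    let mut chars = name.chars().peekable();
    while let Some(c) = chars.next() {
        match to_b64(c) {
            Some(lo) => {
                if let Some(hi) = chars.peek().copied().and_then(to_b64) {
                    chars.next();
                    out.push(char::from_u32(PAIR_BASE + (hi << 6) + lo).unwrap());
                } else {
                    out.push(char::from_u32(SINGLE_BASE + lo).unwrap());
                }
            }
            None => out.push(c),
        }
    }
    out
}

// Unpack each character back into one or two alphabet characters.
fn decode(name: &str) -> String {
    let mut out = String::new();
    for c in name.chars() {
        let v = c as u32;
        if (PAIR_BASE..SINGLE_BASE).contains(&v) {
            let v = v - PAIR_BASE;
            out.push(from_b64(v & 0x3f));
            out.push(from_b64(v >> 6));
        } else if (SINGLE_BASE..SINGLE_BASE + 64).contains(&v) {
            out.push(from_b64(v - SINGLE_BASE));
        } else {
            out.push(c);
        }
    }
    out
}
```

Under these assumptions, encode("\u{5}DigitalSignature") differs from the raw name, so looking up the encoded form in a file that stores the name unencoded would find nothing, which matches the symptom described in the short version.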
However, there's an exception for the special "\u{5}SummaryInformation" stream that appears in every MSI file, whose name is not encoded, and which the msi crate special-cases (it is excluded from the list of streams, and its data is accessed instead through the summary_info() method). It seems that "\u{5}DigitalSignature" and "\u{5}MsiDigitalSignatureEx" are also special streams whose names are not encoded. (Maybe that's what the \u{5} prefix is supposed to signify? I don't know of anywhere that that's documented, either.)
I did some more digging, and at least found some more clues as to the meaning of the DigitalSignature and MsiDigitalSignatureEx data (this code comment in particular was illuminating). I think the best solution is for the msi crate to exclude these from the list of streams, and provide higher-level methods for dealing with them, just as it already does for SummaryInformation.
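As a rough sketch of the "exclude these from the list of streams" idea, the filter below hides the three special names mentioned in this thread. The helper names and the use of a fixed list are hypothetical, chosen only for illustration; this is not the msi crate's actual implementation.

```rust
// Special stream names taken from the discussion; treating them as a fixed list is an assumption.
const SPECIAL_STREAMS: [&str; 3] = [
    "\u{5}SummaryInformation",
    "\u{5}DigitalSignature",
    "\u{5}MsiDigitalSignatureEx",
];

// Hypothetical helper: true if the raw (unencoded) name is one of the special streams.
fn is_special_stream(name: &str) -> bool {
    SPECIAL_STREAMS.iter().any(|s| *s == name)
}

// Hypothetical helper: keep only the streams that should appear in a user-facing listing.
fn user_visible_streams<'a>(all: &[&'a str]) -> Vec<&'a str> {
    all.iter().copied().filter(|n| !is_special_stream(n)).collect()
}
```

With a filter like this, the special streams would no longer show up in a listing only to fail on read; their contents would need to be exposed through dedicated methods instead.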
@mdsteele thank you for the fixes. Could you make a small release?
I found strange behavior... Simple code:
Output:
msi file to reproduce: test.msi.zip.