kyz / libmspack

A library for some loosely related Microsoft compression formats, CAB, CHM, HLP, LIT, KWAJ and SZDD.
https://www.cabextract.org.uk/libmspack/

Support Intra Package Delta (IPD) format, aka PA30 #3

Open A-Shahbazi opened 8 years ago

A-Shahbazi commented 8 years ago

I tried extracting this cab file using cabextract, but it wasn't extracted correctly (the folders aren't extracted). I used the following command on Windows and it extracted as it should have.

expand -F:* <cabfile name>.cab C:<target_dir>

I'm using cabextract 0.6 and libmspack 0.5

Thanks for your efforts, Ali

kyz commented 8 years ago

I'm afraid this cabinet file is extracted as correctly as possible.

The example file you've linked to is this Microsoft update. It's not a regular cabinet file holding ready-to-use files; it's a container for another format, "Intra-Package Delta" (IPD) compressed updates. As you point out, this is handled by the EXPAND command in the Microsoft Deployment Tools.

It might be interesting to add support for this type of file in libmspack, but as far as I know, this format has not been reverse-engineered, and is likely patented. Read this StackExchange question about the format for more information. I'm not sure how Microsoft managed to get a patent on a compressed binary diff, because binary diffs have been around since the 1970s, but there you go.

Inside the cabinet file is a file called _manifest_.cix.xml, which lists all files in the update. Some are normal files; others have type="PA30", which indicates they're built out of the compressed files:

From your example:

<File id="40" name="x86_microsoft-windows-usp_31bf3856ad364e35_6.1.7601.22171_none_af477f18d00f9c82\usp10.dll" length="626688" time="129980361992060000" attr="32">
  <Hash alg="SHA1" value="7c06a9c4ba68d068a1e0350ab3695dbc76f0a4b9"/>
  <Delta>
    <Source type="PA30" name="0"><Hash alg="SHA1" value="4da896c0f6e1184cf83a7247b55744cb80800dde"/></Source>
  </Delta>
</File>
<File id="39" name="x86_microsoft-windows-usp_31bf3856ad364e35_6.1.7601.18009_none_af119411b6b203d9\usp10.dll" length="626688" time="129980331030140000" attr="32">
  <Hash alg="SHA1" value="c11329ac4f7704a3d100e9c96a980037d1b90bb7"/>
  <Delta>
    <Source type="PA30" name="1"><Hash alg="SHA1" value="fabbdc598ee743d6c2084b5269f7f9f0bfb1d3c9"/></Source>
    <Basis file="40"/>
  </Delta>
</File>

The file x86_microsoft-windows-usp_31bf3856ad364e35_6.1.7601.22171_none_af477f18d00f9c82\usp10.dll is stored entirely in the file "0", which is itself compressed in this new "PA30" format.

The next file, x86_microsoft-windows-usp_31bf3856ad364e35_6.1.7601.18009_none_af119411b6b203d9\usp10.dll, is created by extracting the previous file, then applying a patch to it. The patch is stored in the file "1", also compressed in the PA30 format.
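
The relationship described above can be read out of the manifest mechanically. What follows is a minimal Python sketch, not part of libmspack or cabextract, that parses File entries shaped like the excerpt and reports which stored cabinet member holds each file's PA30 data and which other file, if any, it is patched from. The `<Files>` wrapper element, the shortened name values and the `describe_files` helper are assumptions made for illustration; a real manifest may use a different root element or XML namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened manifest fragment modelled on the excerpt above.
# The <Files> root and the abbreviated names are assumptions for illustration.
MANIFEST = """
<Files>
  <File id="40" name="new\\usp10.dll" length="626688">
    <Delta>
      <Source type="PA30" name="0"/>
    </Delta>
  </File>
  <File id="39" name="old\\usp10.dll" length="626688">
    <Delta>
      <Source type="PA30" name="1"/>
      <Basis file="40"/>
    </Delta>
  </File>
</Files>
"""


def describe_files(manifest_xml):
    """Return one dict per <File>: its id, name, PA30 source and basis id."""
    root = ET.fromstring(manifest_xml)
    entries = []
    for file_el in root.iter("File"):
        source = file_el.find("Delta/Source")
        basis = file_el.find("Delta/Basis")
        entries.append({
            "id": file_el.get("id"),
            "name": file_el.get("name"),
            # Cabinet member holding the PA30 data, or None for a normal file.
            "source": source.get("name") if source is not None else None,
            # Id of the file this one is patched from, or None if standalone.
            "basis": basis.get("file") if basis is not None else None,
        })
    return entries


if __name__ == "__main__":
    for entry in describe_files(MANIFEST):
        print(entry)
```

For the fragment above, file 40 has a source but no basis, while file 39 names file 40 as its basis, which matches the description of the two usp10.dll entries.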

To be honest, the way this works (first unpack an entire file, then use a diff to convert it into another file) sounds like it wouldn't be a good fit for libmspack. The formats it covers take the form "open this file, expand a compressed file stream to another file", whereas this format requires you to completely unpack a (potentially enormous) file, then follow a set of diffing instructions to modify it into another file. Someone may want to write a third-party tool that uses libmspack to decompress the cabinet-file layer, but it would likely need to write files to disk, then copy and edit them on disk, rather than act on them as a stream. But who knows? Maybe libmspack can do the "decompress PA30 files" part.
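
To make the "unpack one file fully, then patch it into another" constraint concrete, here is a rough Python sketch of the bookkeeping such a third-party tool might need. It does not decode PA30 at all, since the format is described above as not reverse-engineered. It only orders files so that every basis is reconstructed before the files patched from it, and checks a result against a SHA1 value. Both helper names are hypothetical, and it is an assumption (not confirmed in this thread) that the Hash element directly under File is the SHA1 of the fully reconstructed file.

```python
import hashlib


def extraction_order(basis_of):
    """Order file ids so every basis comes before the files patched from it.

    basis_of maps a file id to the id of its basis file, or to None when the
    file does not depend on another one.
    """
    order = []
    visiting = set()

    def visit(file_id):
        if file_id in order:
            return
        if file_id in visiting:
            raise ValueError(f"circular basis reference at file {file_id}")
        visiting.add(file_id)
        basis = basis_of.get(file_id)
        if basis is not None:
            visit(basis)  # the basis must exist on disk before patching
        visiting.discard(file_id)
        order.append(file_id)

    for file_id in basis_of:
        visit(file_id)
    return order


def sha1_matches(data, expected_hex):
    """Compare reconstructed bytes with a SHA1 value taken from the manifest."""
    return hashlib.sha1(data).hexdigest() == expected_hex.lower()


if __name__ == "__main__":
    # Ids from the excerpt: file 39 is patched from file 40.
    print(extraction_order({"39": "40", "40": None}))
```

With the two entries from the excerpt, file 40 is scheduled before file 39, so the whole of the newer usp10.dll has to exist before the older one can be produced from it.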