maximumspatium / ResDecompress

Decompression of compressed MacOS resources.
MIT License

Documentation for the Mac OS resource compression algorithms? #2

Open dgelessus opened 4 years ago

dgelessus commented 4 years ago

Hi, thank you for putting this on GitHub! I was recently looking into how Macintosh resource de-/compression works, and there is almost no information about this online, so your implementation is very helpful.

I was wondering how you managed to implement the compression formats. Do you know of any proper documentation (official or unofficial) for the compression algorithms supported by Mac OS? I could only find one online source about resource compression, which is an article by Alysis Software Corporation that was published on their website and in MacTech, and that only documents how to write a custom decompressor, not how to use the standard Mac OS decompressors.

I'm asking because I'm working on a Python library/tool to read (and eventually also write) resource forks, and I recently added support for compressed resources. I couldn't find any documentation about the Mac OS compression algorithms (and I hadn't found your repo yet), so I had to reverse-engineer them from scratch. It would be helpful if I had proper documentation to compare with, because in a few places I'm not sure if my implementation is correct, and I don't know the meaning of some of the header fields.

maximumspatium commented 4 years ago

Hi,

There is no official documentation for the resource compression used in the legacy Mac OS. It appears to be one of the features Apple wanted to keep hidden from the public. My implementation of the GreggyBits algorithm is based on the leaked source of the original decompressor by Gregg Marriott. That source is written in 68k assembly and does not include a compressor. I reimplemented everything in Python and added a compressor too. It was pretty easy.

System 7 uses another compression algorithm, by Donn Denman. The original decompression code (68k) has been archived here. The Mac OS 8 System file includes a PowerPC implementation of it in an 'ncmp' resource. My project doesn't currently support it, but a basic Python implementation has already been laid out.

Support for the InstaCompOne algorithm used in Mac OS 8/9 was coded from scratch, based on my own RE work. There is neither documentation nor source code for it, just a huge 68k code resource ('dcmp' #3). Once I found out that InstaCompOne is a variant of Deflate, writing its Python decompressor was fairly straightforward. A compressor would be a lot harder to implement, so I'm leaving it out for the time being.

> It would be helpful if I had proper documentation to compare with, because in a few places I'm not sure if my implementation is correct, and I don't know the meaning of some of the header fields.

We're using a TDD-based system that runs custom C code on Mac OS 9 under QEMU. That simple C program asks Mac OS to decompress a resource and saves the decompressed data to a file. We then decompress the same data with Elliot's Python tool and compare the results. This way it's easy to ensure that the reimplementation is correct...
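The comparison step described above can be sketched roughly as follows. This is only an illustration, not the actual test harness: the directory layout (one file per resource, same file name in both directories) and the `find_mismatches` helper are assumptions, and `decompress` stands for whatever Python decompressor is under test.

```python
from pathlib import Path
from typing import Callable, List


def find_mismatches(
    compressed_dir: str,
    reference_dir: str,
    decompress: Callable[[bytes], bytes],
) -> List[str]:
    """Return the names of files whose decompressed data differs from the reference.

    compressed_dir holds the raw compressed resources; reference_dir holds the
    output saved by the real Mac OS decompressor under the same file names.
    (Hypothetical layout, chosen for this sketch only.)
    """
    mismatches = []
    for compressed_path in sorted(Path(compressed_dir).iterdir()):
        reference_path = Path(reference_dir) / compressed_path.name
        if not reference_path.is_file():
            # No reference output available: report it rather than skip silently.
            mismatches.append(compressed_path.name)
            continue
        actual = decompress(compressed_path.read_bytes())
        if actual != reference_path.read_bytes():
            mismatches.append(compressed_path.name)
    return mismatches
```

An empty result means every resource decompressed identically to the Mac OS reference output.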

What do you think about joining our development efforts to create an even better Python tool for Mac OS resources? FYI, the project that includes my code is located here: https://github.com/elliotnunn/macresources

Cheers, Max

dgelessus commented 4 years ago

Thank you for the information!

I wasn't aware that the System 7 source code was leaked - that is definitely useful to know. (I'm usually a bit wary about working based on leaked source code, but in this case the code is almost 30 years old, and I can easily find uploads of it that have been around for over two years. I think it's safe to say that Apple doesn't care too much anymore.)

It also confirms what I already guessed: the few header fields I wasn't sure about are either completely unused or only used by the original compressors, so they can be safely ignored by decompressors and custom compressor implementations.
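To illustrate how few fields a decompressor actually needs, here is a minimal header-parsing sketch. The 18-byte layout, the signature value, and the two header types (8 and 9) are assumptions taken from reverse-engineering notes, not from anything confirmed in this thread, and the field names are hypothetical labels. Fields that appear to be unused, or used only by the original compressors, are read and discarded.

```python
import struct
from typing import NamedTuple

# Assumed signature at the start of every compressed resource (0xA89F6572).
SIGNATURE = b"\xa8\x9fer"
# Assumed common part: signature, header length, header type, unknown byte,
# decompressed length. All fields are big-endian.
COMMON = struct.Struct(">4sHBBI")
HEADER_LENGTH = 18


class CompressedHeader(NamedTuple):
    header_type: int
    decompressed_length: int
    dcmp_id: int


def parse_header(data: bytes) -> CompressedHeader:
    """Parse the (assumed) compressed resource header, ignoring unused fields."""
    if len(data) < HEADER_LENGTH:
        raise ValueError("data too short for a compressed resource header")
    signature, _length, header_type, _unknown, decompressed_length = COMMON.unpack_from(data, 0)
    if signature != SIGNATURE:
        raise ValueError("not a compressed resource")
    if header_type == 8:
        # Assumed: two buffer-size hints, 'dcmp' ID, reserved field.
        _working, _expansion, dcmp_id, _reserved = struct.unpack_from(">BBhH", data, COMMON.size)
    elif header_type == 9:
        # Assumed: 'dcmp' ID followed by four decompressor-specific bytes.
        dcmp_id, _parameters = struct.unpack_from(">h4s", data, COMMON.size)
    else:
        raise ValueError(f"unknown header type {header_type}")
    return CompressedHeader(header_type, decompressed_length, dcmp_id)
```

In this sketch only the header type, the decompressed length, and the 'dcmp' ID are returned, since those are the values a decompressor would need to pick an algorithm and size its output.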

> We're using a TDD-based system that runs custom C code on Mac OS 9 under QEMU [...]

Heh, that's much more advanced than what I was doing. I used Mini vMac to run ResEdit and manually copied all the compressed resources into a new file, which decompresses them as a side effect. Then I mounted the emulator disk image on my host machine and compared the compressed and decompressed resources in a hex editor. (I have the advantage that my host machine is a modern Mac, which still supports native resource forks and can mount HFS volumes read-only. This makes it very easy to transfer files with resource forks out of the emulator.)

> What do you think about joining our development efforts to create an even better Python tool for Mac OS resources? FYI, the project that includes my code is located here: https://github.com/elliotnunn/macresources

That would be a good idea - especially since both libraries are written in Python. It might not be straightforward to merge the two libraries though, since they have somewhat different feature sets. I think our use cases might also be a bit different?

In particular, my library is currently focused on reading and analyzing resource forks, but doesn't support writing at all yet. Its CLI tool is somewhat complex and meant for human users rather than scripts (though it can also output data in machine-readable form as raw data, hex, or basic DeRez syntax). On the other hand, the goal of MacResources seems to be providing replacements for Rez and DeRez with compatible input/output syntax and command line flags - is that correct?

It should be possible though to make both CLI tools use the same Python API - regardless of the input/output syntax and CLI flags, the internal resource file reading/writing process will be identical.
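As a rough sketch of what I mean (none of these names exist in either library; `Resource`, `format_hex` and `format_derez_like` are made up for illustration): both tools would share one in-memory resource type, and each CLI would only add its own output formatting on top. The DeRez-like output here is a simplified approximation, not a faithful reproduction of real DeRez output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical shared representation of a single resource."""
    type: bytes  # four-byte resource type, e.g. b"STR "
    id: int
    data: bytes


def format_hex(resource: Resource) -> str:
    """Plain hex output, as a human-oriented CLI might print it."""
    return resource.data.hex()


def format_derez_like(resource: Resource) -> str:
    """Simplified DeRez-style 'data' block (approximation for illustration)."""
    type_text = resource.type.decode("mac_roman")
    lines = [f"data '{type_text}' ({resource.id}) {{"]
    for offset in range(0, len(resource.data), 16):
        chunk = resource.data[offset:offset + 16].hex().upper()
        # Group the hex digits four at a time (two bytes per group).
        groups = " ".join(chunk[i:i + 4] for i in range(0, len(chunk), 4))
        lines.append(f'\t$"{groups}"')
    lines.append("};")
    return "\n".join(lines)
```

Both formatters take the same `Resource` object, so the file reading/writing code underneath would not need to know which CLI is calling it.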

Also, if it's of any use, a while back I wrote a PLY-based parser for Rez/DeRez syntax. It supports all of the more advanced features of the Rez language (including resource type definitions, Rez builtin functions, and preprocessor directives). It's only a parser - it reads resource definition files/headers and produces an AST - but it could be used as a base for a full Rez/DeRez implementation in Python. I also need to clean it up a little - I got hung up on supporting some really strange undocumented Rez behavior, which is not used by any reasonable resource definition files and complicates the parser a lot.

jduerstock commented 4 years ago

If you haven't seen it before: http://preserve.mactech.com/articles/mactech/Vol.09/09.01/ResCompression/index.html

Edit: Maybe I should have read the full discussion first. Oops, never mind.