jduerstock opened this issue 7 months ago
Thanks for the suggestion. I'm undecided about whether it's worth pursuing. It's hard to say how difficult it would be, or what the chance of success is. Mac formats are usually more difficult than average, and I'm not really a Mac person.
The only info about the (alleged) compression scheme I've found is instacomp.py, which looks good at first, but all the "unimplemented" parts are concerning.
For what it's worth, the Mac MPW InstaCompOne compression tool is in the Mac OS Install 4.0.3 SDK on the November 95, January '95, August '95 and November '95 Developer CDs. I'll do whatever I can to help you along, if you have the time and inclination.
Some Tome files are compressed, and others are uncompressed. I've got to the point where Deark might be able to extract from uncompressed files, though I wouldn't rely on it.
When InstaCompOne is used in Mac Resource format, the compressed resource data starts with a header that starts with signature bytes a8 9f 65 72. This is supported by the macresources Python script.
But if I extract a compressed resource to a file (Deark can do it using "-opt macrsrc:extractraw"), and run the script on it, I get an error like "failed Anon9 unimplemented".
The compressed Tome files I've looked at don't have the a8 9f 65 72 signature. That neither confirms nor denies that they use InstaCompOne compression.
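Checking for that signature is straightforward. Below is a minimal C sketch that assumes the field layout in the Kaitai compressed_resource description linked later in this thread: the 4-byte signature, a big-endian 16-bit header length, a 1-byte header type (8 or 9) at offset 6, and a big-endian 32-bit decompressed length at offset 8. The function name parse_cmp_rsrc_header is made up for illustration. The sketch only identifies the header. It does not decompress anything.

```c
#include <stddef.h>
#include <stdint.h>

/* Big-endian readers: classic Mac OS data is big-endian. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper. Returns 1 and fills in the outputs if buf begins
   with the a8 9f 65 72 compressed-resource signature, else returns 0.
   Field offsets are assumed from the Kaitai compressed_resource spec. */
int parse_cmp_rsrc_header(const uint8_t *buf, size_t len,
                          uint16_t *hdr_len, uint8_t *hdr_type,
                          uint32_t *decompressed_len)
{
    static const uint8_t sig[4] = {0xa8, 0x9f, 0x65, 0x72};

    if (len < 12)               /* need signature + common fields */
        return 0;
    for (size_t i = 0; i < 4; i++)
        if (buf[i] != sig[i])
            return 0;

    *hdr_len = be16(buf + 4);           /* expected to be 0x0012 */
    *hdr_type = buf[6];                 /* 8 or 9 */
    *decompressed_len = be32(buf + 8);  /* size after decompression */
    return 1;
}
```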
Another thing: Tome files have what looks like CRC32 checksums, but I can't figure out the algorithm. It would be very nice to figure that out before working on decompression.
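One way to test the CRC32 guess is to compute the standard CRC-32 over a candidate byte range (for example, one file's data) and compare it with the stored value in both byte orders. The sketch below covers only the common zip/zlib variant (reflected polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF), not the other CRC-32 parameterizations. Which byte range to checksum is whatever you are testing. It is a slow bitwise version chosen for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard CRC-32 as used by zip/zlib (IEEE polynomial, reflected form). */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            /* shift right; XOR in the polynomial if the low bit was set */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}
```

For reference, this returns 0xCBF43926 for the ASCII string "123456789", the usual check value for this CRC variant.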
That's about where things stand. Could be worse, but I'm not very optimistic.
So this is where things get extra weird. Mac OS already had a "compressed resource" mechanism before InstaCompOne ever came along; the most commonly used codecs were known as "greg" and "donn".
More detail is included here:
https://preserve.mactech.com/articles/mactech/Vol.09/09.01/ResCompression/index.html
https://formats.kaitai.io/compressed_resource/
The long and short of it, I believe, is that the header you've found is independent of InstaCompOne, and is really just raw data at the beginning of the stream.
Over the next couple of days, I'm going to take more of a stab at seeing what I can reverse engineer, and see if I can ping a few people who might know a bit more than I do.
Thanks for your work so far.
I made a couple of test files for you using InstaCompOneTool. The CLI output from its info commands looks like this:
jd@mac:~/mac$ mps InstaCompOneTool -l -o ic1t.tome
ID  File Name           Type    Crtr    DF Org  DF Cmp  Svd  RF Org  RF Cmp  Svd  Region  Loc rev.
--------------------------------------------------------------------------------------------------
1   'InstaCompOneTool'  'MPST'  'MPS '  106364  39168   64   92098   43860   53   0       0
jd@mac:~/mac$ mps InstaCompOneTool -ll -o ic1t.tome
ID  File Name           Version  Type    Crtr    Creation Date         Modification Date     Finder Flags    DF Org  DF Cmp  Svd  RF Org  RF Cmp  Svd  TOT Org  TOT Cmp  Svd  Region  Loc rev.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1   'InstaCompOneTool'  1.1.1    'MPST'  'MPS '  12/31/99 11:58:58 PM  12/31/99 11:58:58 PM  aiblscriislc0d  106364  39168   64   92098   43860   53   198462   83028    59   0       0
jd@mac:~/mac$ mps InstaCompOneTool -l -o ic1t.nc.tome
ID  File Name           Type    Crtr    DF Org  DF Cmp  Svd  RF Org  RF Cmp  Svd  Region  Loc rev.
--------------------------------------------------------------------------------------------------
1   'InstaCompOneTool'  'MPST'  'MPS '  106364  106364  0    92098   92098   0    0       0
jd@mac:~/mac$ mps InstaCompOneTool -ll -o ic1t.nc.tome
ID  File Name           Version  Type    Crtr    Creation Date         Modification Date     Finder Flags    DF Org  DF Cmp  Svd  RF Org  RF Cmp  Svd  TOT Org  TOT Cmp  Svd  Region  Loc rev.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1   'InstaCompOneTool'  1.1.1    'MPST'  'MPS '  12/31/99 11:57:39 PM  12/31/99 11:57:39 PM  aiblscriislc0d  106364  106364  0    92098   92098   0    198462   198462   0    0       0
ic1t.nc.tome was made with no compression, and ic1t.tome with compression. The original file is stored in ic1t.zip, where the .rdump file is a sort of DeRez'ed version of the resource fork. Hopefully this helps at least a little bit.
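The listings are internally consistent. TOT Org and TOT Cmp are the sums of the data-fork (DF) and resource-fork (RF) columns (106364 + 92098 = 198462; 39168 + 43860 = 83028), and each "Svd" value matches the percentage of bytes saved, rounded up. The small C check below encodes that reading. The name pct_saved is hypothetical, and the round-up rule is inferred from these three rows only, not from any documentation.

```c
#include <stdint.h>

/* Hypothetical helper: percent of bytes saved, rounded up, which is what
   the "Svd" columns appear to show (inferred, not documented). */
unsigned pct_saved(uint64_t org, uint64_t cmp)
{
    if (org == 0 || cmp >= org)
        return 0;
    uint64_t saved = org - cmp;
    return (unsigned)((100 * saved + org - 1) / org);  /* ceiling division */
}
```

For example, pct_saved(106364, 39168) gives 64, matching the DF "Svd" column for ic1t.tome. Rounding to nearest would give 63 there, which is why the sketch rounds up.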
Also, this is a total shot in the dark, but this might be the checksum algorithm. I'm not sure how good your m68k assembly is, but I think arg_0 (d4) is the number of bytes to checksum, arg_4 (a2) is a pointer to a byte array, and arg_8 (d5) is the "previous" value.
ROM:10001C1E sub_10001C1E: ; CODE XREF: sub_10000F4C+82
ROM:10001C1E ; sub_10001BEC+1E
ROM:10001C1E
ROM:10001C1E arg_0 = 8
ROM:10001C1E arg_4 = $C
ROM:10001C1E arg_8 = $10
ROM:10001C1E
ROM:10001C1E link a6,#0
ROM:10001C22 movem.l d3-d7/a2,-(sp)
ROM:10001C26 move.l arg_0(a6),d4
ROM:10001C2A movea.l arg_4(a6),a2
ROM:10001C2E move.l arg_8(a6),d5
ROM:10001C32 move.l d5,d7 ; d7 = running checksum ("sum"), starts at the "previous" value
ROM:10001C34 moveq #0,d0
ROM:10001C36 move.l d0,d6 ; d6 = byte index i = 0
ROM:10001C38
ROM:10001C38 loc_10001C38: ; CODE XREF: sub_10001C1E+42
ROM:10001C38 move.l d6,d0
ROM:10001C3A cmp.l d4,d0
ROM:10001C3C bcc.s loc_10001C62 ; exit loop once i >= count (unsigned)
ROM:10001C3E move.l d7,d3
ROM:10001C40 lsl.l #8,d3 ; d3 = sum << 8
ROM:10001C42 move.l d7,d0
ROM:10001C44 moveq #$18,d1
ROM:10001C46 lsr.l d1,d0 ; d0 = sum >> 24
ROM:10001C48 andi.l #$FF,d0
ROM:10001C4E add.l d0,d3 ; d3 = sum rotated left by 8 bits
ROM:10001C50 movea.l a2,a0
ROM:10001C52 adda.l d6,a0 ; a0 = &data[i]
ROM:10001C54 move.b (a0),d0
ROM:10001C56 ext.w d0
ROM:10001C58 ext.l d0 ; d0 = data[i], sign-extended to 32 bits
ROM:10001C5A eor.l d0,d3 ; d3 ^= d0
ROM:10001C5C move.l d3,d7 ; sum = d3
ROM:10001C5E addq.l #1,d6 ; i++
ROM:10001C60 bra.s loc_10001C38
ROM:10001C62 ; ---------------------------------------------------------------------------
ROM:10001C62
ROM:10001C62 loc_10001C62: ; CODE XREF: sub_10001C1E+1E
ROM:10001C62 move.l d7,d0 ; return value (d0) = sum
ROM:10001C64 movem.l (sp)+,d3-d7/a2
ROM:10001C68 unlk a6
ROM:10001C6A rts
ROM:10001C6A ; End of function sub_10001C1E
Or, as decompiled C:
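#include <stdint.h>

/* FUN_10001c1e (tidied decompiler output): param_1 = number of bytes,
   param_2 = pointer to the bytes, param_3 = "previous" checksum value. */
uint32_t FUN_10001c1e(uint32_t param_1, const char *param_2, uint32_t param_3)
{
    uint32_t uVar1;
    for (uVar1 = 0; uVar1 < param_1; uVar1 = uVar1 + 1) {
        /* rotate left by 8, then XOR in the byte sign-extended to 32 bits */
        param_3 = (uint32_t)(int32_t)(signed char)param_2[uVar1] ^ ((param_3 >> 24) + param_3 * 0x100);
    }
    return param_3;
}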
Yeah, that is the bulk of the checksum algorithm, at least for non-compressed files. There's still a missing ingredient, possibly the initial value of param_3. If the checksums matched, XORing the reported and calculated values would give four 0x00 bytes. What I get instead is four bytes, each of which is either 0x00 or 0xff, seemingly at random. That's actually good enough for my purposes, but it's weird. (Or maybe I just made a coding error.)
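One detail that may be relevant: the listing sign-extends each byte (ext.w then ext.l, which the decompiler renders as the (short)(char) cast), so bytes 0x80 to 0xff are XORed in as 0xffffff80 to 0xffffffff. The rotate-and-XOR step is linear over XOR. An implementation that zero-extends the bytes instead would therefore differ from the sign-extending one by a value whose bytes are each 0x00 or 0xff, which may account for the pattern described above. The C sketch below compares the two variants. The function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate the running value left by 8 bits, then XOR in the next byte.
   sign_extend != 0 mimics ext.w/ext.l; sign_extend == 0 zero-extends. */
uint32_t rotxor_checksum(const uint8_t *data, size_t len, uint32_t prev,
                         int sign_extend)
{
    uint32_t v = prev;
    for (size_t i = 0; i < len; i++) {
        uint32_t b = data[i];
        if (sign_extend && (b & 0x80u))
            b |= 0xFFFFFF00u;               /* sign-extend byte to 32 bits */
        v = ((v << 8) | (v >> 24)) ^ b;     /* rotl 8, then XOR */
    }
    return v;
}

/* Returns 1 if every byte of x is 0x00 or 0xff, else 0. */
int bytes_all_00_or_ff(uint32_t x)
{
    for (int i = 0; i < 4; i++) {
        uint8_t byte = (uint8_t)(x >> (8 * i));
        if (byte != 0x00 && byte != 0xff)
            return 0;
    }
    return 1;
}
```

For the bytes 41 80 c3 12 ff with an initial value of 0, the two variants differ by 0x0000ff00, consistent with the "0x00 or 0xff per byte" observation.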
I'm hoping that the checksum is derived from the decompressed data (not the compressed data). My best guess is that it is, but I don't have enough evidence to be sure.
I've implemented a decoder for about 90% of the IC1 format, but what's left could be challenging.
Someday I'll probably try to set up a way to run InstaCompOneTool. In the meantime, a few more test files might help: an uncompressed file, preferably larger than 128 KB, and a compressed Tome file containing it.
For what it's worth, https://github.com/elliotnunn/mps is what I use to run InstaCompOneTool under Linux. It's relatively easy and straightforward, at least in comparison to the other ways I know of.
Included is a ~330 KB m68k binary, along with uncompressed and compressed Tomes containing it.
Let me know if you have any more requests.
I've fixed the issues related to your latest sample file. A lot more files work now, but plenty still don't. I expect most of the remaining failures to happen in the first ~50 KB of the file, so my request for files larger than 128 KB is no longer relevant.
The main reasons this is proving difficult are (1) certain compression parameters change depending on how many bytes have been decompressed so far, and I don't know exactly where the boundaries are, and (2) IC1 has an unusually large number of ways to encode the compressed data, and I don't know how you would get the compression utility to use the one you want to test.
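As a generic illustration of point (1), here is how such a rule can work. This is not IC1's actual rule, which is exactly what's unknown. Some LZ-style formats size a back-reference offset field by how much output already exists, because no offset can point before the start of the output. The field width then changes at specific output positions, and a decoder that misplaces even one boundary loses sync with the bitstream from that point on. A minimal C sketch under that assumption, with a hypothetical function name:

```c
#include <stdint.h>

/* Illustrative only, NOT IC1's known behavior: the smallest bit width
   that can address any of the `pos` bytes decoded so far. */
unsigned offset_bits_for_position(uint32_t pos)
{
    unsigned bits = 0;
    while (bits < 32 && ((uint64_t)1 << bits) < pos)
        bits++;
    return bits;
}
```

Under this rule the width steps up just past each power of two (8 bits at position 256, 9 bits at 257), so each step is a boundary a decoder has to hit exactly.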
So... I can probably fix any file that fails, given a copy of the decompressed file. And I'm willing to. But it might not lead to fixing all the bugs.
This one has some successes and some failures.
Thanks again for all your work on this.
powertalk.zip doesn't seem to include the original uncompressed files. Without them, I can't figure out where it failed.
Apologies, I should've included them before. I'll try to remember for future reports.
I've made some progress. Decompression usually works, but not always.
Note that the resource forks in the compressed vs. uncompressed PowerTalk files aren't the same, so failures there are hard to analyze.
Hello! Would it be possible for you to add Mac OS Tome file support to Deark? The format is at least partially documented in this project:
https://github.com/kainjow/TomeViewerX
Examples are in
https://archive.org/download/MacOS_8_Version_8.1_691-1912-A_Apple_Computer_Inc._1998
17050851 Jan 7 1998 './Full Install Pieces/Software Installers/System Software/Mac OS 8.1 Update/Tome 1'
48251710 Jun 26 1997 './Full Install Pieces/Software Installers/System Software/Mac OS 8/Installation Tome'
Thanks!