jsummers / deark

A utility for file format and metadata analysis, data extraction, decompression, and image format decoding
https://entropymine.com/deark/

Classic Mac OS Tome File support #71

Open jduerstock opened 7 months ago

jduerstock commented 7 months ago

Hello! Would it be possible for you to add Mac OS Tome File support to deark? It is at least partially documented in this project:

https://github.com/kainjow/TomeViewerX

Examples are in

https://archive.org/download/MacOS_8_Version_8.1_691-1912-A_Apple_Computer_Inc._1998

17050851 Jan 7 1998 './Full Install Pieces/Software Installers/System Software/Mac OS 8.1 Update/Tome 1'
48251710 Jun 26 1997 './Full Install Pieces/Software Installers/System Software/Mac OS 8/Installation Tome'

Thanks!

jsummers commented 7 months ago

Thanks for the suggestion. I'm undecided about whether it's worth pursuing. It's hard to say how difficult it would be, or what the chance of success is. Mac formats are usually more difficult than average, and I'm not really a Mac person.

The only info about the (alleged) compression scheme I've found is instacomp.py, which looks good at first, but all the "unimplemented" parts are concerning.

jduerstock commented 7 months ago

For what it's worth, the Mac MPW InstaCompOne compression tool is in the Mac OS Install 4.0.3 SDK on the November 95, January '95, August '95 and November '95 Developer CDs. I'll do whatever I can to help you along, if you have the time and inclination.

jsummers commented 7 months ago

Some Tome files are compressed, and others are uncompressed. I've got to the point where Deark might be able to extract from uncompressed files, though I wouldn't rely on it.

When InstaCompOne is used in Mac Resource format, the compressed resource data starts with a header that starts with signature bytes a8 9f 65 72. This is supported by the macresources Python script.

But if I extract a compressed resource to a file (Deark can do it using "-opt macrsrc:extractraw"), and run the script on it, I get an error like "failed Anon9 unimplemented".

The compressed Tome files I've looked at don't have the a8 9f 65 72 signature. That neither confirms nor denies that they use InstaCompOne compression.
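As a rough illustration of the signature test described above, this Python sketch checks whether a blob (for example, a resource extracted with "-opt macrsrc:extractraw") begins with the bytes a8 9f 65 72. The constant and function names are made up for this example; they are not taken from Deark or from the macresources script.

```python
# Signature bytes quoted in the comment above. The names below are hypothetical.
SIGNATURE = bytes.fromhex("a89f6572")


def starts_with_signature(data: bytes) -> bool:
    """Return True if the data begins with the 4 signature bytes."""
    return data[:4] == SIGNATURE


if __name__ == "__main__":
    import sys

    # Usage: python check_sig.py extracted_resource.bin
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            head = f.read(4)
        print(path, "signature found" if starts_with_signature(head) else "no signature")
```

A negative result only says the header is absent. As noted above, it does not settle which compression scheme, if any, was used.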

Another thing: Tome files have what looks like CRC32 checksums, but I can't figure out the algorithm. It would be very nice to figure that out, before working on decompression.

That's about where things stand. Could be worse, but I'm not very optimistic.

jduerstock commented 7 months ago

So this is where things get extra weird. There was already a "compressed resource" mechanism in Mac OS before InstaCompOne ever came along; its most commonly used codecs are known as "greg" and "donn".

More detail is included here:
https://preserve.mactech.com/articles/mactech/Vol.09/09.01/ResCompression/index.html
https://formats.kaitai.io/compressed_resource/

The long and short of it is that the header you've found is independent of InstaCompOne and really just raw data that's the beginning of the stream, I believe.

I'm going to take more of a stab at seeing what I can reverse engineer over the next couple of days. And see if I can ping a few people who might know a bit more than me.

Thanks for your work so far.

jduerstock commented 7 months ago

I made a couple of test files for you using the InstaCompOneTool. The "CLI" output from the info commands looks like this:

jd@mac:~/mac$ mps InstaCompOneTool -l -o ic1t.tome
 ID  File Name                           Type    Crtr      DF Org   DF Cmp  Svd  RF Org   RF Cmp  Svd   Region   Loc rev.
—————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
  1 'InstaCompOneTool'                  'MPST'  'MPS '     106364    39168  64    92098    43860  53        0        0
jd@mac:~/mac$ mps InstaCompOneTool -ll -o ic1t.tome
 ID  File Name                         Version       Type    Crtr       Creation Date       Modification Date     Finder Flags    DF Org   DF Cmp  Svd  RF Org   RF Cmp  Svd    TOT Org  TOT Cmp  Svd   Region   Loc rev.
—————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
  1 'InstaCompOneTool'                  1.1.1       'MPST'  'MPS '  12/31/99 11:58:58 PM  12/31/99 11:58:58 PM   aiblscriislc0d   106364    39168  64    92098    43860  53      198462    83028  59        0        0
jd@mac:~/mac$ mps InstaCompOneTool -l -o ic1t.nc.tome
 ID  File Name                           Type    Crtr      DF Org   DF Cmp  Svd  RF Org   RF Cmp  Svd   Region   Loc rev.
—————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
  1 'InstaCompOneTool'                  'MPST'  'MPS '     106364   106364   0    92098    92098   0        0        0
jd@mac:~/mac$ mps InstaCompOneTool -ll -o ic1t.nc.tome
 ID  File Name                         Version       Type    Crtr       Creation Date       Modification Date     Finder Flags    DF Org   DF Cmp  Svd  RF Org   RF Cmp  Svd    TOT Org  TOT Cmp  Svd   Region   Loc rev.
—————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
  1 'InstaCompOneTool'                  1.1.1       'MPST'  'MPS '  12/31/99 11:57:39 PM  12/31/99 11:57:39 PM   aiblscriislc0d   106364   106364   0    92098    92098   0      198462   198462   0        0        0

The ic1t.nc.tome file is uncompressed; ic1t.tome is compressed. The original file is stored in the ic1t.zip file, where the .rdump file is a sort of DeRez'ed version of the resource fork. Hopefully this helps at least a little bit.
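For reading the listings above: the "Svd" columns look like a percentage saved. The tool's actual formula is not documented here, but every value shown (64, 53, 59, and 0) happens to match 100 minus the truncated integer percentage of compressed to original size. The Python sketch below only checks that guess against the numbers in the listing. The function name is hypothetical.

```python
def percent_saved(original: int, compressed: int) -> int:
    """Guessed formula: 100 minus the truncated compressed/original percentage."""
    if original == 0:
        return 0  # assumption: avoid dividing by zero for an empty fork
    return 100 - (compressed * 100) // original


# (original, compressed, Svd as printed in the listing)
rows = [
    (106364, 39168, 64),   # data fork, compressed tome
    (92098, 43860, 53),    # resource fork, compressed tome
    (198462, 83028, 59),   # total, compressed tome
    (106364, 106364, 0),   # data fork, uncompressed tome
]

for original, compressed, listed in rows:
    print(original, compressed, percent_saved(original, compressed), listed)
```

Plain rounding would give 63, 52 and 58 for the first three rows, which is why the truncating form is used here.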

ic1t.tome.zip ic1t.zip

jduerstock commented 6 months ago

Also, this is a total shot in the dark, but this might be the checksum algorithm. Not sure how good your m68k assembly is, but I think arg_0 (d4) is the number of bytes to checksum, arg_4 (a2) is a pointer to a byte array, and arg_8 (d5) is the "previous" value.

ROM:10001C1E sub_10001C1E:                           ; CODE XREF: sub_10000F4C+82
ROM:10001C1E                                         ; sub_10001BEC+1E
ROM:10001C1E
ROM:10001C1E arg_0           =  8
ROM:10001C1E arg_4           =  $C
ROM:10001C1E arg_8           =  $10
ROM:10001C1E
ROM:10001C1E                 link    a6,#0
ROM:10001C22                 movem.l d3-d7/a2,-(sp)
ROM:10001C26                 move.l  arg_0(a6),d4
ROM:10001C2A                 movea.l arg_4(a6),a2
ROM:10001C2E                 move.l  arg_8(a6),d5
ROM:10001C32                 move.l  d5,d7
ROM:10001C34                 moveq   #0,d0
ROM:10001C36                 move.l  d0,d6
ROM:10001C38
ROM:10001C38 loc_10001C38:                           ; CODE XREF: sub_10001C1E+42
ROM:10001C38                 move.l  d6,d0
ROM:10001C3A                 cmp.l   d4,d0
ROM:10001C3C                 bcc.s   loc_10001C62
ROM:10001C3E                 move.l  d7,d3
ROM:10001C40                 lsl.l   #8,d3
ROM:10001C42                 move.l  d7,d0
ROM:10001C44                 moveq   #$18,d1
ROM:10001C46                 lsr.l   d1,d0
ROM:10001C48                 andi.l  #$FF,d0
ROM:10001C4E                 add.l   d0,d3
ROM:10001C50                 movea.l a2,a0
ROM:10001C52                 adda.l  d6,a0
ROM:10001C54                 move.b  (a0),d0
ROM:10001C56                 ext.w   d0
ROM:10001C58                 ext.l   d0
ROM:10001C5A                 eor.l   d0,d3
ROM:10001C5C                 move.l  d3,d7
ROM:10001C5E                 addq.l  #1,d6
ROM:10001C60                 bra.s   loc_10001C38
ROM:10001C62 ; ---------------------------------------------------------------------------
ROM:10001C62
ROM:10001C62 loc_10001C62:                           ; CODE XREF: sub_10001C1E+1E
ROM:10001C62                 move.l  d7,d0
ROM:10001C64                 movem.l (sp)+,d3-d7/a2
ROM:10001C68                 unlk    a6
ROM:10001C6A                 rts
ROM:10001C6A ; End of function sub_10001C1E

or

uint FUN_10001c1e(uint param_1,int param_2,uint param_3)

{
  uint uVar1;

  for (uVar1 = 0; uVar1 < param_1; uVar1 = uVar1 + 1) {
    param_3 = (int)(short)*(char *)(uVar1 + param_2) ^ (param_3 >> 0x18) + param_3 * 0x100;
  }
  return param_3;
}

jsummers commented 6 months ago

Yeah, that is the bulk of the checksum algorithm, at least for non-compressed files. There's still a missing ingredient, possibly the initial value of param_3. If I XOR the reported and calculated checksums, the correct result would be four 0x00 bytes. What I get is four bytes, each of which is either 0x00 or 0xff, seemingly at random. That's actually good enough for my purposes. But it's weird. (Or maybe I just made a coding error.)
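For reference, here is a Python sketch of the loop in the disassembly posted above: shift the running value left by 8, add back its former top byte (in effect a rotate left by 8), then XOR in the next byte sign-extended to 32 bits (the ext.w / ext.l pair). The function name, the sign_extend switch, and the default starting value of 0 are assumptions made for this example. The real initial value is exactly the unknown mentioned above.

```python
MASK32 = 0xFFFFFFFF


def tome_checksum(data: bytes, prev: int = 0, sign_extend: bool = True) -> int:
    """Port of sub_10001C1E. prev=0 is an assumed starting value, not a known one."""
    value = prev & MASK32
    for b in data:
        # lsl.l #8 / lsr.l #24 / add.l: same result as a 32-bit rotate left by 8
        rotated = ((value << 8) + (value >> 24)) & MASK32
        # move.b / ext.w / ext.l: the byte is treated as signed
        if sign_extend and b >= 0x80:
            b |= 0xFFFFFF00
        value = rotated ^ b
    return value


if __name__ == "__main__":
    sample = bytes(range(256))
    signed = tome_checksum(sample)
    unsigned = tome_checksum(sample, sign_extend=False)
    print(f"{signed:08x} {unsigned:08x} diff={signed ^ unsigned:08x}")
```

One property of this loop that is easy to check: if the sign extension is left out (sign_extend=False), the result differs from the signed version only by a mask in which every byte is 0x00 or 0xff. Whether that has anything to do with the mismatch described above is only a guess.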

I'm hoping that the checksum is derived from the decompressed data (not the compressed data). My best guess is that it is, but I don't have enough evidence to be sure.

jsummers commented 6 months ago

I've implemented a decoder for about 90% of IC1 format. But what's left could be challenging.

Someday I'll probably try to set up a way to run InstaCompOneTool. In the meantime, a few more test files might help (an uncompressed file, preferably >128kb, and a compressed Tome file containing it).

jduerstock commented 6 months ago

For what it's worth, https://github.com/elliotnunn/mps is what I use to run InstaCompOneTool under Linux. It's relatively easy and straightforward, at least in comparison to the other ways I know of.

jduerstock commented 6 months ago

p80.zip

Included is a ~330k m68k binary, and the non-compressed and compressed tomes of it.

Let me know if you have any more requests.

jsummers commented 5 months ago

I've fixed the issues related to your latest sample file. A lot more files work now. But plenty still don't. I expect most of the remaining failures to happen in the first ~50kb of the file, so my request for files >128kb is no longer relevant.

The main reasons this is proving difficult are (1) certain compression parameters change depending on how many bytes have been decompressed so far, and I don't know exactly where the boundaries are, and (2) IC1 has an unusually large number of ways to encode the compressed data, and I don't know how you would get the compression utility to use the one you want to test.

So... I can probably fix any file that fails, given a copy of the decompressed file. And I'm willing to. But it might not lead to fixing all the bugs.

jduerstock commented 5 months ago

This one has some successes and some failures.

powertalk.zip

Thanks again for all your work on this.

jsummers commented 5 months ago

powertalk.zip doesn't seem to include the original uncompressed files. Without that, I can't figure out where it failed.

jduerstock commented 5 months ago

Apologies, I should've included it before. I'll try to remember for future reports.

pt.nc.zip

jsummers commented 5 months ago

I've made some progress. Decompression usually works, but not always.

Note that the resource forks in the compressed vs. uncompressed PowerTalk files aren't the same, so failures there are hard to analyze.