twogood / unshield

Tool and library to extract CAB files from InstallShield installers
MIT License

Extraction failures on PowerPC with unshield 1.5.1 #139

Closed evanmiller closed 2 years ago

evanmiller commented 2 years ago

Hi, I'm trying to run an old PC game on an old Mac (32-bit PPC running 10.4.11 "Tiger"). I understand that big-endian machines are supported by this project. I am able to extract files just fine on a 64-bit Intel machine with macOS 10.6, but the Tiger box is failing to pull out any files. Here's part of the log:

$ unshield -D 3 -g Exe x /Volumes/CAESAR3/data1.cab
[unshield_fopen_for_reading:121] File /Volumes/CAESAR3/data1.hdr not found even case insensitive

[unshield_read_headers:233] Could not open .hdr file 1. Reading header from .cab file 1 instead.
[unshield_read_headers:296] Version 0x01000004 handled as major version 0
[unshield_get_cab_descriptor:82] Cabinet descriptor: 000011a6 00001469 00001469 00000004
[unshield_get_cab_descriptor:84] Directory count: 1
[unshield_get_cab_descriptor:85] File count: 90
[unshield_file_group_new:15] File group descriptor offset: 0000067a
[unshield_file_group_new:30] File group 0000067a first file = 13, last file = 86
[unshield_file_group_new:15] File group descriptor offset: 000006d7
[unshield_file_group_new:30] File group 000006d7 first file = 9, last file = 12
[unshield_file_group_new:15] File group descriptor offset: 00000734
[unshield_file_group_new:30] File group 00000734 first file = 87, last file = 87
[unshield_file_group_new:15] File group descriptor offset: 00000791
[unshield_file_group_new:30] File group 00000791 first file = 0, last file = 8
[unshield_file_group_new:15] File group descriptor offset: 000007ee
[unshield_file_group_new:30] File group 000007ee first file = 89, last file = 89
[unshield_file_group_new:15] File group descriptor offset: 0000084b
[unshield_file_group_new:30] File group 0000084b first file = 88, last file = 88
[unshield_fopen_for_reading:121] File /Volumes/CAESAR3/data2.hdr not found even case insensitive

[unshield_read_headers:233] Could not open .hdr file 2. Reading header from .cab file 2 instead.
[unshield_fopen_for_reading:121] File /Volumes/CAESAR3/data2.cab not found even case insensitive

[unshield_read_file_descriptor:58] File descriptor offset 13: 00001734
[unshield_read_file_descriptor:74] Name offset:      000010d0
[unshield_read_file_descriptor:75] Directory index:  00000000
[unshield_read_file_descriptor:76] Flags:            0004
[unshield_read_file_descriptor:77] Expanded size:    00000000
[unshield_read_file_descriptor:78] Compressed size:  00000000
[unshield_read_file_descriptor:79] Data offset:      00000000
[unshield_reader_open_volume:319] Open volume 1
[unshield_reader_open_volume:416] First file index = 0, last file index = 89
[unshield_reader_open_volume:418] First file offset = 00000000, last file offset = 7fffffff
[unshield_reader_read:527] unshield_reader_read start: bytes_left = 0x2, volume_bytes_left = 0x1945c
[unshield_reader_read:539] Trying to read 0x2 bytes from offset 001a94c6 in volume 1
[unshield_reader_read:561] bytes_left = 0, volume_bytes_left = 103514
[unshield_reader_read:527] unshield_reader_read start: bytes_left = 0xea0a, volume_bytes_left = 0x1945a
[unshield_reader_read:539] Trying to read 0xea0a bytes from offset 001a94c8 in volume 1
[unshield_reader_read:561] bytes_left = 0, volume_bytes_left = 43600
[unshield_file_save:853] read_bytes = 2794
[unshield_reader_read:527] unshield_reader_read start: bytes_left = 0x2, volume_bytes_left = 0xaa50
[unshield_reader_read:539] Trying to read 0x2 bytes from offset 001b7ed2 in volume 1
[unshield_reader_read:561] bytes_left = 0, volume_bytes_left = 43598
[unshield_reader_read:527] unshield_reader_read start: bytes_left = 0xf2f3, volume_bytes_left = 0xaa4e
[unshield_reader_read:539] Trying to read 0xaa4e bytes from offset 001b7ed4 in volume 1
[unshield_reader_read:561] bytes_left = 18597, volume_bytes_left = 0
[unshield_reader_open_volume:319] Open volume 2
[unshield_fopen_for_reading:121] File /Volumes/CAESAR3/data2.cab not found even case insensitive

[unshield_reader_open_volume:327] Failed to open input cabinet file 2
[unshield_reader_read:576] Failed to open volume 2 to read 43598 more bytes
[unshield_file_save:830] Failed to read 62195 bytes of file 13 (C3_North.sg2) from input cabinet file 1
Failed to extract file 'C3_North.sg2'.
[unshield_read_file_descriptor:58] File descriptor offset 14: 0000175e
[unshield_read_file_descriptor:74] Name offset:      000010dd
[unshield_read_file_descriptor:75] Directory index:  00000000
[unshield_read_file_descriptor:76] Flags:            0004
[unshield_read_file_descriptor:77] Expanded size:    00000000
[unshield_read_file_descriptor:78] Compressed size:  00000000
[unshield_read_file_descriptor:79] Data offset:      00000000
[unshield_reader_open_volume:319] Open volume 1
[unshield_reader_open_volume:416] First file index = 0, last file index = 89
[unshield_reader_open_volume:418] First file offset = 00000000, last file offset = 7fffffff
[unshield_reader_read:527] unshield_reader_read start: bytes_left = 0x2, volume_bytes_left = 0x9bb1c
[unshield_reader_read:539] Trying to read 0x2 bytes from offset 001c2922 in volume 1
[unshield_reader_read:561] bytes_left = 0, volume_bytes_left = 637722
[unshield_reader_read:527] unshield_reader_read start: bytes_left = 0xc20a, volume_bytes_left = 0x9bb1a
[unshield_reader_read:539] Trying to read 0xc20a bytes from offset 001c2924 in volume 1
[unshield_reader_read:561] bytes_left = 0, volume_bytes_left = 588048
[unshield_file_save:853] read_bytes = 2754
[unshield_reader_read:527] unshield_reader_read start: bytes_left = 0x2, volume_bytes_left = 0x8f910
[unshield_reader_read:539] Trying to read 0x2 bytes from offset 001ceb2e in volume 1
[unshield_reader_read:561] bytes_left = 0, volume_bytes_left = 588046
[unshield_reader_read:527] unshield_reader_read start: bytes_left = 0xc08c, volume_bytes_left = 0x8f90e
[unshield_reader_read:539] Trying to read 0xc08c bytes from offset 001ceb30 in volume 1
[unshield_reader_read:561] bytes_left = 0, volume_bytes_left = 538754
[unshield_file_save:843] Decompression failed with code -3. bytes_to_read=49292, volume_bytes_left=538754, volume=1, read_bytes=49293
[unshield_file_save:846] HINT: Try unshield_file_save_old() or -O command line parameter!
Failed to extract file 'Briefing1a.555'.
...

The log is much longer, with a similar section for each file that can't be extracted. Following the hint and running the command again with -O, I get a seg-fault.

Any tips or pointers would be appreciated. I have installed unshield 1.5.1 via MacPorts. Happy to close this issue if 32-bit systems aren't supported.

twogood commented 2 years ago

What data*.cab and data*.hdr files do you have?

evanmiller commented 2 years ago

@twogood There is only data1.cab.

twogood commented 2 years ago

@evanmiller Same version of unshield on both systems? It used to work on big endian machines, I'm pretty sure 32-bit too.

evanmiller commented 2 years ago

@twogood Yes, same unshield version on both systems. Let me know if a log from the other machine would be helpful.

evanmiller commented 2 years ago

Digging into this some more, there are clearly some byte swap issues in play. I believe the underlying issue is that WORDS_BIGENDIAN was defined in the autotools universe, but is not defined anywhere in the CMake build. The macro is used here:

https://github.com/twogood/unshield/blob/c758ac017dda2f1735d385a360f82b673b898f4b/lib/internal.h#L108

The flag becomes relevant when letoh16 is invoked here: https://github.com/twogood/unshield/blob/87e9bed32e2f31b37e29c8374f65d60cd35d7b51/lib/file.c#L818

I can work around this manually with -DWORDS_BIGENDIAN=1 but hopefully this gives you enough to come up with a fix.

kratz00 commented 2 years ago

Should be an easy fix - https://cmake.org/cmake/help/v2.8.12/cmake.html#module:TestBigEndian

I am going to create a PR.

evanmiller commented 2 years ago

@kratz00 FWIW the docs indicate that TestBigEndian is deprecated in favor of CMAKE_C_BYTE_ORDER:

https://cmake.org/cmake/help/latest/module/TestBigEndian.html https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_BYTE_ORDER.html

kratz00 commented 2 years ago

I saw that too, but it would mean requiring at least CMake 3.20 (released Mar 23, 2021). Currently the minimum is 2.8.12: https://github.com/twogood/unshield/blob/main/CMakeLists.txt#L1 I am not sure; 3.20 might be too advanced? :smile:

evanmiller commented 2 years ago

I am running CMake 3.22 on macOS 10.4 – which was last updated in 2007. So CMake 3.20 is "old news" as far as I am concerned! But I will defer to others here...

kratz00 commented 2 years ago

@evanmiller How can you have 3.22 - which was last updated in 2007, if 3.20 was released in 2021?

evanmiller commented 2 years ago

@kratz00 I meant macOS 10.4 was last updated in 2007...

kratz00 commented 2 years ago

Updating CMake should be possible in general, but 3.20 or newer is not available in my Linux distributions yet

twogood commented 2 years ago

Nice work! Would be nice to have it working out of the box in mainstream Linux, yes.

ryandesign commented 2 years ago

Testing for endianness at configure time is not ideal. macOS supports universal binaries built for multiple architectures. It is common to want to perform the configuration and build steps just once each, telling the build to use two architectures, for example by having -arch ppc -arch i386 in CFLAGS, CXXFLAGS, LDFLAGS. If you are relying on configure-time endianness checks, then the "native" architecture (ppc on ppc build machines; i386 on i386 build machines) will be built correctly while the "foreign" architecture will not. So instead, please use C preprocessor defines that have already been defined by your development environment to discover the endianness.

kratz00 commented 2 years ago

@ryandesign Thanks for your input. I agree, the proposed solution will only work if compiling for the "native" architecture. It will break cross compiling in case the target architecture differs from the build architecture.

I will think about a better solution.

ryandesign commented 2 years ago

Not sure how widespread or standardized this is but for example clang 5 and gcc 5 both define __BYTE_ORDER__ to __ORDER_LITTLE_ENDIAN__ and __LITTLE_ENDIAN__ to 1 on my Intel Mac. And one would expect __BYTE_ORDER__ to be __ORDER_BIG_ENDIAN__ and __BIG_ENDIAN__ to be 1 on PowerPC machines. So you might use either of those.

twogood commented 2 years ago

@ryandesign But according to CMake manual "Check if the target architecture is big endian or little endian." (my emphasis), shouldn't that be correct? Source: https://cmake.org/cmake/help/latest/module/TestBigEndian.html

evanmiller commented 2 years ago

Since there is only one line of problematic code, I have rewritten it to be endian-neutral in #143. This lets us get rid of all the byteswap cruft and sidestep any CMake version (and universal build) issues.

twogood commented 2 years ago

Was it only that line left @evanmiller ? 💯

ryandesign commented 2 years ago

> But according to CMake manual "Check if the target architecture is big endian or little endian." (my emphasis), shouldn't that be correct? Source: https://cmake.org/cmake/help/latest/module/TestBigEndian.html

If you are trying to build universal (e.g. by having -arch ppc -arch i386 in CFLAGS, CXXFLAGS, LDFLAGS) there is no single target architecture. There is no answer to the question "what is the endianness of -arch ppc -arch i386"; it doesn't make sense to ask. It doesn't make sense to check endianness at configure time.

When doing a build for multiple architectures, clang (and the old Apple gcc-4.2 and llvm-gcc-4.2) will perform the compilation multiple times, once for each architecture, and glue the results together with lipo. Each of those separate builds will have correctly-set preprocessor constants that you can inspect for endianness, bitness, or other qualities.

twogood commented 2 years ago

Amazing @ryandesign ... it's a complex world... but is this scenario likely for little Unshield?

ryandesign commented 2 years ago

I don't know, but one might've thought any compilation on PowerPC Macs (discontinued in 2006) unlikely today, and yet that happened.

An all-at-once universal build like the one I described is what would happen if a MacPorts user were to install unshield on macOS today using the +universal variant (sudo port install unshield +universal).

If needed, a Portfile author can opt in to an alternate universal method in which the configure (cmake) and build (make) and destroot/staging (make install) phases are each run separately for each architecture (rather than all at once), and then the results are glued together by MacPorts using lipo. This solves the problem of configuration programs checking for endianness or bitness but it can bring other difficulties. For example, some build systems compile a program that is used at build time to generate something. If you were trying to compile i386/ppc universal on a PowerPC Mac, you would be able to run the ppc generation program during the ppc part but you would not be able to run the i386 generation program during the i386 part.

twogood commented 2 years ago

@ryandesign OK so __BYTE_ORDER__ should be more reliable?

ryandesign commented 2 years ago

Using a preprocessor define (like __BYTE_ORDER__) at build time would avoid the problems caused by trying to determine the endianness at configure time. But you would need to find preprocessor defines that are available in each of the compilers you care about.

A quick search turned up a Stack Overflow discussion which says that "gcc supports __BYTE_ORDER__ from about 4.6 and clang from 3.2", so that wouldn't work on Evan's Mac OS X Tiger system, since Apple provided gcc 4.0.1 there.

You may need to check several different preprocessor defines and use whichever of them exist. It apparently varies by compiler and version because endianness is not addressed by the C standards. You might look at how cmake, autotools, or other projects implement their endianness checks and do something similar. Here is another Stack Overflow discussion with various suggestions.