notpeter / apple-installer-checksums

Checksums of Mac OSX installer DMGs

Consider using the dirhash h1 hash algorithm to checksum the entire installer #108

Closed notpeter closed 3 years ago

notpeter commented 3 years ago

I had previously explored various methods for hashing the entire macOS installer app bundle, but always came up short.

Recently I've come across a methodology which might be simple/elegant enough for our usage: Golang's DirHash.

Specifically, the h1 hash algorithm used for go mod module checksums and, similarly, in Terraform. Checksums are of the format:

h1:44chars/base64enc/h1/prefixed/sha256sum=

From the documentation:

Hash1 is the "h1:" directory hash function, using SHA-256.

Hash1 is "h1:" followed by the base64-encoded SHA-256 hash of a summary prepared as if by the Unix command:

find . -type f | sort | sha256sum

More precisely, the hashed summary contains a single line for each file in the list, ordered by sort.Strings applied to the file names, where each line consists of the hexadecimal SHA-256 hash of the file content, two spaces (U+0020), the file name, and a newline (U+000A).

File names with newlines (U+000A) are disallowed.
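The summary format described in that quote can be sketched in shell. This is a minimal sketch, not Go's implementation: the function name h1_summary and the demo file names are made up for illustration, it assumes GNU sha256sum and mktemp are available, and it assumes file names are relative paths without a leading "./", which should be checked against dirhash itself.

```shell
# Print an h1-style summary for directory $1: one line per regular file,
# "<hex sha256>  <name>", ordered by name in byte order (LC_ALL=C).
# Assumption: names are relative paths without a leading "./".
h1_summary() {
  (
    cd "$1" || exit 1
    find . -type f | sed 's|^\./||' | LC_ALL=C sort |
      while IFS= read -r name; do
        # Hash via stdin so the output carries no file name; keep the hex digest.
        hex="$(sha256sum < "$name" | cut -d ' ' -f 1)"
        printf '%s  %s\n' "$hex" "$name"
      done
  )
}

# Throwaway demo directory; the file names are illustrative only.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/sub dir"
printf 'hello\n' > "$demo_dir/b.txt"
printf 'world\n' > "$demo_dir/sub dir/a.txt"
h1_summary "$demo_dir"
```

Reading names line by line keeps paths with spaces intact; names containing newlines would still break it, which is consistent with the rule that such names are disallowed.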

But it turns out that find . -type f | sort | sha256sum isn't quite right, because sha256sum treats stdin as data to hash rather than as a list of filenames to hash. Additionally, my version of sha256sum can only output hex-encoded sha256 checksums, not base64-encoded ones.
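The difference between piping a file list into sha256sum and passing the files as arguments is easy to see on a throwaway directory. A small sketch, assuming GNU sha256sum and mktemp; the file a.txt and the variable names are only for illustration.

```shell
# Demo: sha256sum hashes stdin as data; it does not read it as a list of files.
demo_dir="$(mktemp -d)"
printf 'hello\n' > "$demo_dir/a.txt"

# Hashes the text "./a.txt\n" itself; the output names the input as "-".
list_hash="$(cd "$demo_dir" && find . -type f | sort | sha256sum)"

# Hashes the content of each file named on the command line.
file_hash="$(cd "$demo_dir" && sha256sum $(find . -type f | sort))"

printf 'stdin: %s\nargs:  %s\n' "$list_hash" "$file_hash"
```

The two digests differ, and only the second one depends on what is inside the files.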

Otherwise I think the premise is good:

Step #1: Create a sorted plaintext inventory of all files and their sha256 checksums (in the form of sha256sum output):

sha256sum $(find . -type f | sort) > /tmp/hashlist.txt
head /tmp/hashlist.txt

1b0fd914537b3e29cebe21263bf94e79178ecfc2b2ccb3e97d69703f47e248d5  ./Contents/CodeResources
7ed0683ff4b8dc60a85649c8b77bf7b80f9812ca47e15d747919dac848d3f4c5  ./Contents/Frameworks/OSInstallerSetup.framework/Versions/A/Frameworks/IAESD.framework/Versions/A/Frameworks/IABridgeOSInstall.framework/Versions/A/IABridgeOSInstall
dbcc3847e26177acc22cb85739862a18c5b9184a3718d44a4e8c36dbd0a253f3  ./Contents/Frameworks/OSInstallerSetup.framework/Versions/A/Frameworks/IAESD.framework/Versions/A/Frameworks/IABridgeOSInstall.framework/Versions/A/Resources/BOSError.strings
e280fcb43363215079821a10f646eb1a67caaf6b4cf77e96b304767afebe3eeb  ./Contents/Frameworks/OSInstallerSetup.framework/Versions/A/Frameworks/IAESD.framework/Versions/A/Frameworks/IABridgeOSInstall.framework/Versions/A/Resources/Info.plist

Step #2: Take a sha256 checksum of that inventory of filenames and hashes. While I'd prefer hex encoding (3-char prefix + 64 chars), there's an argument to be made for the more compact base64 encoding of the binary digest (3-char prefix + 44 chars). There may be an easier way, but openssl is up to the job:

cat /tmp/hashlist.txt | openssl dgst -sha256 -binary | openssl enc -base64
1Gs9W0zvI2p8XnYsPTMKzRkjowxgHYVooz6kUeRbRYY=
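The size difference between the two encodings can be checked directly. A short sketch, assuming sha256sum and openssl are installed; the input string is arbitrary and stands in for the real inventory.

```shell
# Compare hex and base64 encodings of the same SHA-256 digest.
data='example inventory line'

# Hex: 32 digest bytes, two characters per byte.
hex="$(printf '%s\n' "$data" | sha256sum | cut -d ' ' -f 1)"

# Base64: 32 digest bytes, four characters per three bytes, padded with "=".
b64="$(printf '%s\n' "$data" | openssl dgst -sha256 -binary | openssl enc -base64)"

printf 'hex:    %s (%s chars)\n' "$hex" "${#hex}"
printf 'base64: %s (%s chars)\n' "$b64" "${#b64}"
```

A SHA-256 digest is 32 bytes, so the hex form is 64 characters and the padded base64 form is 44.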

Or as a single combined command:

cd "/Applications/Install macOS Big Sur.app/"
echo "h1:$(sha256sum $(find . -type f | sort) | openssl dgst -sha256 -binary | openssl enc -base64)"

h1:1Gs9W0zvI2p8XnYsPTMKzRkjowxgHYVooz6kUeRbRYY=
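One fragile spot in the combined command is that $(find ...) is split on whitespace, so any file name containing a space would be passed to sha256sum in pieces. Below is a hedged variant that avoids this. The function name h1_dir is hypothetical, it relies on the -print0, -z and -0 options of find, sort and xargs, and stripping the leading "./" from names is an assumption about what Go's dirhash expects rather than something confirmed here. Because of that stripping, it will not reproduce the hash shown above.

```shell
# Compute an h1-style hash of directory $1.
# -print0 / -z / -0 keep names with spaces intact; LC_ALL=C sorts by byte value.
# Assumption: stripping the leading "./" is what makes names match Go's.
h1_dir() {
  (
    cd "$1" || exit 1
    find . -type f -print0 | LC_ALL=C sort -z | xargs -0 sha256sum |
      sed 's|  \./|  |' |
      openssl dgst -sha256 -binary | openssl enc -base64 | sed 's|^|h1:|'
  )
}

# Throwaway demo directory with a space in a path; names are illustrative.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/sub dir"
printf 'hello\n' > "$demo_dir/sub dir/a.txt"
h1_dir "$demo_dir"
```

Whether the result equals what dirhash produces for the same tree would still need to be checked against the Go implementation. Names containing backslashes or newlines are not handled, since sha256sum escapes them in its output.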

I haven't confirmed that this output matches the specification of the h1 hash produced by go mod's dirhash, but I think its algorithm is close. Assuming we can build a Unix pipeline version which matches its output exactly, it might be worthwhile to publish h1 hashes of the entire app bundle and not just hashes of the DMG. Some potential issues:

Can anyone confirm that they get the same h1:1Gs9W0zvI2p8XnYsPTMKzRkjowxgHYVooz6kUeRbRYY= output for the Install macOS Big Sur.app directory for 11.1 with the SharedSupport.dmg 2e2b0e06a4e592b8eca31e6e4eaeccee8442cc166daf36020a4496b838869283?

notpeter commented 3 years ago

Closing as not-doing (for the moment).