I had previously explored various methods for hashing the entire macOS installer app bundle, but always came up short.
Recently I've come across a methodology which might be simple/elegant enough for our usage: Golang's DirHash.
Specifically, the h1 hash algorithm used for go mod module checksums, and similarly in Terraform. Checksums are of the format:
h1:44chars/base64enc/h1/prefixed/sha256sum=
From the documentation:
Hash1 is the "h1:" directory hash function, using SHA-256.
Hash1 is "h1:" followed by the base64-encoded SHA-256 hash of a summary prepared as if by the Unix command:
find . -type f | sort | sha256sum
More precisely, the hashed summary contains a single line for each file in the list, ordered by sort.Strings applied to the file names, where each line consists of the hexadecimal SHA-256 hash of the file content, two spaces (U+0020), the file name, and a newline (U+000A).
File names with newlines (U+000A) are disallowed.
But it turns out find . -type f | sort | sha256sum isn't quite right, because sha256sum treats stdin as data to hash, rather than as a list of filenames to hash. Additionally, my version of sha256sum can only output hex-encoded sha256 checksums, not base64-encoded ones.
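To illustrate the difference, here is a minimal sketch using a throwaway directory (the directory and file name are hypothetical, created only for the demonstration). It compares the digest produced when the file list is piped into sha256sum with the digest produced when the same list is passed as arguments:

```shell
# Throwaway directory with a single file (hypothetical example data).
demo=$(mktemp -d)
printf 'hello\n' > "$demo/a.txt"
cd "$demo"

# Piping the list: sha256sum hashes the text "./a.txt\n" itself.
list_digest=$(find . -type f | sort | sha256sum | cut -d' ' -f1)

# Passing the list as arguments: sha256sum opens and hashes each file.
content_digest=$(sha256sum $(find . -type f | sort) | cut -d' ' -f1)

echo "digest of the file list:     $list_digest"
echo "digest of the file contents: $content_digest"
```

The two digests differ, because the first one never reads the file at all.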
Otherwise I think the premise is good:
Step #1 Create a sorted plaintext inventory of all files and their sha256 checksums (in the form of sha256sum output):
sha256sum $(find . -type f | sort) > /tmp/hashlist.txt
head /tmp/hashlist.txt
1b0fd914537b3e29cebe21263bf94e79178ecfc2b2ccb3e97d69703f47e248d5  ./Contents/CodeResources
7ed0683ff4b8dc60a85649c8b77bf7b80f9812ca47e15d747919dac848d3f4c5  ./Contents/Frameworks/OSInstallerSetup.framework/Versions/A/Frameworks/IAESD.framework/Versions/A/Frameworks/IABridgeOSInstall.framework/Versions/A/IABridgeOSInstall
dbcc3847e26177acc22cb85739862a18c5b9184a3718d44a4e8c36dbd0a253f3  ./Contents/Frameworks/OSInstallerSetup.framework/Versions/A/Frameworks/IAESD.framework/Versions/A/Frameworks/IABridgeOSInstall.framework/Versions/A/Resources/BOSError.strings
e280fcb43363215079821a10f646eb1a67caaf6b4cf77e96b304767afebe3eeb  ./Contents/Frameworks/OSInstallerSetup.framework/Versions/A/Frameworks/IAESD.framework/Versions/A/Frameworks/IABridgeOSInstall.framework/Versions/A/Resources/Info.plist
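One caveat with sha256sum $(find ...) is that the shell splits the file list on whitespace, so any file name containing a space would be broken apart, and a very large bundle could exceed the argument length limit. The sort order also depends on the locale, whereas Go's sort.Strings compares bytes. A rough sketch of a more defensive inventory, assuming a find, sort and xargs that support NUL-separated records (the function name inventory is hypothetical):

```shell
# inventory DIR: print one "hash  ./relative/name" line per regular file in DIR.
inventory() {
    (
        cd "$1" || exit 1
        # NUL-separated names survive spaces; LC_ALL=C sorts byte-wise.
        find . -type f -print0 | LC_ALL=C sort -z | xargs -0 sha256sum
    )
}
```

Usage would be along the lines of inventory "/Applications/Install macOS Big Sur.app" > /tmp/hashlist.txt. I have not checked whether this changes the resulting checksum for the installer bundle; it only would if the bundle contains file names the simpler command mishandles.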
Step #2 Take a sha256 checksum of that inventory of filenames and hashes. While I'd prefer hex encoding (3-char prefix + 64 chars), there's an argument to be made for the more compact base64 encoding of the binary digest (3-char prefix + 44 chars). There may be an easier way, but openssl is up for the job. As a single combined command:
cd "/Applications/Install macOS Big Sur.app/"
echo "h1:$(sha256sum $(find . -type f | sort) | openssl dgst -sha256 -binary | openssl enc -base64)"
h1:1Gs9W0zvI2p8XnYsPTMKzRkjowxgHYVooz6kUeRbRYY=
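As a quick check on the two encodings, the sketch below hashes a one-line stand-in inventory (taken from the head output above, not the full list) and prints the length of the digest in hex and in base64. It also decodes the base64 form back to hex, which may help when comparing an h1 value against a plain sha256sum result. The variable names are hypothetical.

```shell
# Stand-in for the real inventory from step #1 (a single line only).
inventory='1b0fd914537b3e29cebe21263bf94e79178ecfc2b2ccb3e97d69703f47e248d5  ./Contents/CodeResources'

# The same SHA-256 digest, encoded two ways.
hex=$(printf '%s\n' "$inventory" | sha256sum | cut -d' ' -f1)
b64=$(printf '%s\n' "$inventory" | openssl dgst -sha256 -binary | openssl enc -base64)

echo "hex:    ${#hex} chars"
echo "base64: ${#b64} chars"

# Decode the base64 digest back to hex for comparison.
decoded=$(printf '%s\n' "$b64" | openssl enc -base64 -d | od -An -tx1 | tr -d ' \n')
[ "$decoded" = "$hex" ] && echo "encodings agree"
```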
I haven't confirmed that this output matches the specification of the h1 hash produced by go mod's dirhash, but I think its algorithm is close. Assuming we can build a Unix pipeline version which matches its output exactly, it might be worthwhile to publish h1 hashes of the entire app file bundle and not just hashes of the dmg. Some potential issues:
Not sure how best to handle directory prefixes (e.g. leading ./ or /Applications/Install macOS Big Sur.app). The output of find . -type f includes a leading ./, which may not match the file names dirhash uses.
Empty directories are ignored. Go mod gets away with it because you can't check out empty directories from git.
Symlinks are ignored by find . -type f. At first glance at the Golang code, I think it treats symlinks like regular files and reads/hashes their content as if it were a file, which feels reasonably sane.
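A rough sketch of how these issues might be explored, not a verified match for dirhash: h1_dir strips the leading ./ from each name and optionally substitutes a prefix, and skipped_entries lists the symlinks and empty directories that find . -type f leaves out. Both function names are hypothetical, the prefix is assumed not to contain characters special to sed (such as | or &), and -empty is assumed to be supported by the local find.

```shell
# h1_dir DIR [PREFIX]: h1-style checksum over DIR, with names rewritten
# from "./name" to "PREFIXname" (PREFIX defaults to empty).
h1_dir() {
    dir=$1
    prefix=${2:-}
    (
        cd "$dir" || exit 1
        find . -type f -print0 | LC_ALL=C sort -z | xargs -0 sha256sum
    ) | sed "s|^\([0-9a-f]*\)  \./|\1  ${prefix}|" \
      | openssl dgst -sha256 -binary \
      | openssl enc -base64 \
      | sed 's/^/h1:/'
}

# skipped_entries DIR: entries that "find . -type f" does not report.
skipped_entries() {
    (
        cd "$1" || exit 1
        find . -type l -print          # symlinks
        find . -type d -empty -print   # empty directories
    )
}
```

If skipped_entries prints nothing for the installer bundle, the symlink and empty-directory questions would not affect the result for that bundle.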
Can anyone confirm that they get the same h1:1Gs9W0zvI2p8XnYsPTMKzRkjowxgHYVooz6kUeRbRYY= output for the Install macOS Big Sur.app directory for 11.1 with the SharedSupport.dmg 2e2b0e06a4e592b8eca31e6e4eaeccee8442cc166daf36020a4496b838869283?