Our Hash for big integers does not follow the 1.52 specification

JohnLCaron commented 2 years ago

From the spec:

"Hashing large integers. Large integers in ElectionGuard are multi-precision integers such as the constants p and q and integers modulo p and q, i.e. integers that lie in the ranges {0, 1, . . . , p − 1} and {0, 1, . . . , q − 1}. For hashing such an integer, it is written in its hexadecimal form with capital letters for the digits A through F representing digits beyond 9. If the number of digits is odd, a single leading 0 character is added to make the number of digits even. The string used for input into the hash function is this utf-8 encoded hexadecimal representation."

Currently, we add as many zeroes as is needed to get to 32/512 digits, rather than just adding a leading 0 if number of digits is odd.

This is done in Element.byteArray(), eg

override fun byteArray(): ByteArray = element.toByteArray().normalize(512)

Not sure if theres an advantage of doing it that way. @danwallach any opinions?

JohnLCaron commented 2 years ago

From the spec:

"Hash function outputs. All outputs of SHA-256 are arrays of 32 bytes, which are interpreted as 256-bit integers in big endian format. In ElectionGuard all hash function outputs are transformed into such integers and reduced modulo the prime q = 2^256 − 189. This is also how they are input into another hash computation according to the description for large integers above."

So technically we are supposed to do a mod q on hashes. I think this only matters for input to other hashes.

danwallach commented 2 years ago

On the number of bytes being hashed, I don't see any problem with the leading zero thing. It keeps the string being hashes a little bit shorter, so there will be a modest performance improvement to go along with following the spec.

The bit about the hash function outputs being treated as elements-mod-q is a leftover. That's the spec following the code. We made a deliberate decision, early on, to have a separate UInt256 type in our "experimental" Kotlin code. Among other things, it also meant that hashing of all those manifest-related data structures didn't have to know anything about elements-mod-q, which had significant benefits in cleaning up our code.

My thoughts: we should agree with the former (i.e., exactly how an Element is hashed) and we should push back on the latter (i.e., we should define the hash output as a UInt256 allowing the hash function to be defined independently of the value of Q). Of course, there's an obvious mapping from UInt256 to ElementModQ if and when it's needed.

danwallach commented 2 years ago

Oddly relevant thread on a closely related topic: how ECDSA deals with the same problem. https://mastodon.social/@filippo/109341577110608322

Luckily, we're not constrained by a spec as immovable as ECDSA.

JohnLCaron commented 2 years ago

It looks like this will get addressed in the next spec:

"Without going into all the details here, I’m currently rewriting the hashing completely. And hashes of big integers will be defined as the hash of the corresponding big endian byte arrays, everything is padded with zero bytes to the left to fixed length. It should be very easy to use the HACL (and other) conversion functions. I am also planning to have hash function outputs be byte arrays (or full 256-bit integers) and only reduce mod q when doing computations in the exponent." - Michael

danwallach commented 2 years ago

This is good. Our early design decision to build everything around a separate UInt256 type for hashing is turning out to be the right decision.

JohnLCaron commented 1 year ago

Out hashing is now 2.0 compliant, which specifies hashing in much better detail.

votingworks / electionguard-kotlin-multiplatform

Our Hash for big integers does not follow the 1.52 specification #210