unisonweb / unison

A friendly programming language from the future
https://unison-lang.org

Audit runtime hashing/serialization with regard to `Nat` vs. `Int` #4290

Open dolio opened 1 year ago

dolio commented 1 year ago

The Haskell runtime represents Nat and Int values as pseudo data types, with constructors that contain unboxed machine integers, but refer to builtin Reference values, unlike real data types which always contain hashes. This decision has influenced the serialization and hashing of runtime values, which (I believe) distinguish between these two sorts of values.

However, in the Scheme implementation, it's significantly more efficient to represent these values as plain Scheme numbers. That leaves it unclear which non-negative values are supposed to be Ints and which are Nats (negative values are obviously Ints). Many parts of the runtime don't care about the distinction, either. For instance, Nat and Int built-ins don't really need to check whether their arguments are represented in exactly the expected way, because they're just operating on the unboxed data.
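To make the ambiguity concrete, here is a minimal Python sketch (not taken from either runtime) of what can be recovered from a bare number alone. It assumes Int is a 64-bit signed value and Nat a 64-bit unsigned value; `infer_sort` is a hypothetical helper name.

```python
from typing import Optional

# Assumed ranges: 64-bit signed Int, 64-bit unsigned Nat.
INT_MIN, INT_MAX = -(2**63), 2**63 - 1
NAT_MAX = 2**64 - 1

def infer_sort(n: int) -> Optional[str]:
    """Return "Int" or "Nat" when the bare number decides it, else None.

    Hypothetical helper: it models a runtime that stores plain numbers
    and has no separate type tag to consult.
    """
    if n < INT_MIN or n > NAT_MAX:
        raise ValueError("out of range for both Int and Nat")
    if n < 0:
        return "Int"   # only Int can be negative
    if n > INT_MAX:
        return "Nat"   # too large for a signed 64-bit value
    return None        # fits both: the number alone cannot say
```

Under these assumptions, every value from 0 through 2^63 - 1 is ambiguous, which is exactly the range where a hash that depends on the Nat/Int distinction cannot be computed from the number alone.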

So the problem for the Scheme representation seems to be just that decisions were made based on the particular way the Haskell interpreter represents things. There doesn't seem to be a fundamental problem with, for instance, an Int value having the same hash as the corresponding Nat value. It just doesn't right now, and as things stand the Scheme implementation would need to match that behavior.

So, instead, we should probably revise the specification of hashing/serialization so that it does not mandate exactly the representation that the Haskell interpreter uses, because otherwise other runtimes must carry the same type information, which can have significant costs (at least, absent optimizations like worker/wrapper that allow for locally omitting the information).
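As an illustration only, the following Python sketch contrasts a hash that mixes in a type tag with one that hashes just the 64-bit payload. The byte layout, the tag strings, and the function names are all assumptions made for the example; this is not the actual Unison serialization format and will not reproduce real Unison hashes.

```python
import hashlib
import struct

def encode_bits(n: int) -> bytes:
    # 64-bit two's-complement payload, big-endian (assumed layout).
    return struct.pack(">Q", n & 0xFFFFFFFFFFFFFFFF)

def hash_tagged(sort: str, n: int) -> str:
    # Hypothetical scheme: the sort ("Int" or "Nat") is part of the
    # hashed bytes, so the runtime must know it for every value.
    payload = sort.encode("ascii") + b"\x00" + encode_bits(n)
    return hashlib.blake2b(payload, digest_size=32).hexdigest()

def hash_untagged(n: int) -> str:
    # Hypothetical scheme: only the machine word is hashed, so a
    # runtime that stores bare numbers can compute it directly.
    return hashlib.blake2b(encode_bits(n), digest_size=32).hexdigest()

# Tagged: Int 1 and Nat 1 hash differently.
print(hash_tagged("Int", 1) == hash_tagged("Nat", 1))
# Untagged: one hash per bit pattern, whatever the sort.
print(hash_untagged(1))
```

One consequence of the untagged variant in this sketch is that values sharing a bit pattern also share a hash, for example Int -1 and Nat 2^64 - 1. Whether that is acceptable is part of what a revised specification would have to decide.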

pchiusano commented 1 year ago

They do indeed have different hashes:

    2 | > blake2b_256 +1
          ⧩
          0xs2dcf5ad733d105f557d6280a2f8202893f8219f6e2a88d06d71e2b7d35887adf

    3 | > blake2b_256 1
          ⧩
          0xs8f141eba4d9e62720169e2611ed21dcb8d03976f133ecaa66503794442a0f0c0