proofcarryingdata / zupass


JSON Canonicalization #829

Open artwyman opened 9 months ago

artwyman commented 9 months ago

We have several places which use JSON.stringify to turn an object into a string and expect that representation to identify the object uniquely and stably over time. Examples:

These use cases really expect the JSON output to always be the same for the same contents. JSON.stringify doesn't quite guarantee that. According to this thread, since ES2015, JavaScript properties do have a stable iteration order, but it's dependent on insertion order. That will likely work for most of our use cases, but there could be cases where the same object is built in different ways and ends up with a different string representation.
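As a small illustration (hypothetical values), two objects with identical contents but different insertion order already produce different strings:

```ts
// Same contents, different insertion order: JSON.stringify output differs,
// so the string is not a stable identifier for the object.
const a = { id: 123, name: "frog" };
const b = { name: "frog", id: 123 };

JSON.stringify(a); // '{"id":123,"name":"frog"}'
JSON.stringify(b); // '{"name":"frog","id":123}'
```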

When we have time to think about this, we should standardize on a way to canonicalize JSON. I've only done a tiny bit of Googling, but this json-canonicalize library, based on this RFC from 2020, looks promising.
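To sketch the core idea (a hand-rolled approximation, not that library or the full RFC, which also pins down number formatting and string escaping), canonicalization mostly comes down to serializing with recursively sorted keys:

```ts
// Minimal sketch of key-sorted serialization; not a full canonicalization
// implementation, just enough to show order-independence.
function canonicalStringify(value: unknown): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalStringify).join(",") + "]";
  }
  const obj = value as Record<string, unknown>;
  const entries = Object.keys(obj)
    .sort()
    .map((k) => JSON.stringify(k) + ":" + canonicalStringify(obj[k]));
  return "{" + entries.join(",") + "}";
}

// Same contents, different insertion order, identical canonical output:
canonicalStringify({ b: 1, a: 2 }); // '{"a":2,"b":1}'
canonicalStringify({ a: 2, b: 1 }); // '{"a":2,"b":1}'
```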

robknight commented 7 months ago

One thing we need to consider here is the handling of very large integers. The JavaScript number type is a 64-bit float and doesn't have enough precision to represent some of the numbers we use in our cryptographic functions. The JavaScript BigInt type can represent them, but JSON does not technically support BigInts. A library like this does support creating JSON with BigInts even though that isn't strictly allowed by the spec; we already use this other library for parsing such JSON.
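To make the precision problem concrete (the value below is just illustrative, not one we actually use):

```ts
// Values above Number.MAX_SAFE_INTEGER (2^53 - 1) can't be represented
// exactly as a JavaScript number.
const big = 12345678901234567890123456789n;

Number(big) === Number(big + 1n); // true — adjacent values collapse to the same float

// And JSON.stringify rejects BigInts outright:
JSON.stringify({ x: big }); // throws a TypeError: BigInt is not serializable by default
```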

We might want to switch to a different strategy of always encoding bigints as strings, and never using a bare JSON parsing function to decode them. Instead of a bare JSON.parse(), we could pass the parsed value into a Zod schema that handles the conversion of strings back into bigints.
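A rough sketch of what that could look like (field names are hypothetical; this just assumes Zod's `transform`):

```ts
import { z } from "zod";

// Bigints travel as decimal strings in the JSON, and Zod converts them
// back after a plain JSON.parse().
const BigIntString = z
  .string()
  .regex(/^\d+$/)
  .transform((s) => BigInt(s));

// Hypothetical payload shape, purely for illustration.
const ExamplePayload = z.object({
  signerPublicKey: BigIntString,
  signature: BigIntString,
});

const json =
  '{"signerPublicKey":"12345678901234567890","signature":"98765432109876543210"}';
const payload = ExamplePayload.parse(JSON.parse(json));
// payload.signerPublicKey is a bigint: 12345678901234567890n
```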