katelynsills closed 1 year ago
We want to be able to get a canonical CID for the JSON, so that JSON that differs only in formatting or key order produces the same CID.
Sounds like we might want to use the JSON Canonicalization Scheme (JCS, RFC 8785), and apply it to the JSON before signing and verifying.
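To make the idea concrete, here is a minimal sketch of a JCS-style canonicalization step in JavaScript. It is a simplified illustration, not a validated implementation of the scheme: the function name `canonicalize` is hypothetical, and it assumes the input is plain JSON data (as returned by `JSON.parse`). The output string is built directly rather than by rebuilding objects, so the sorted key order cannot be disturbed by the engine.

```javascript
// Simplified JCS-style canonicalization (illustrative only; assumes plain JSON data).
function canonicalize(value) {
  // Primitives: JSON.stringify already emits them without extra whitespace.
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    // Array order is significant, so it is kept as-is.
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  // Objects: emit members in sorted key order.
  // The default sort() compares strings by UTF-16 code units.
  const members = Object.keys(value)
    .sort()
    .map((key) => JSON.stringify(key) + ":" + canonicalize(value[key]));
  return "{" + members.join(",") + "}";
}

// Two inputs that differ only in whitespace and key order.
const a = canonicalize(JSON.parse('{ "b": 1,\n  "a": [true, null] }'));
const b = canonicalize(JSON.parse('{"a":[true,null],"b":1}'));

console.log(a);       // {"a":[true,null],"b":1}
console.log(a === b); // true
```

Signing and verifying would then operate on the canonical string (encoded as UTF-8) instead of on whatever text was originally received.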
Alternatively, if the JSON is just being stored as a string or bytes in the key-value store, then its formatting and key order are fixed, so it's those bytes that are being signed, and the underlying format and parsing can be ignored.
Awesome, thanks for the canonicalization library! I would hope that Ceramic is doing this already, but I'd have to test to see if they are. But yes, we definitely want JSON that only differs due to formatting or ordering to map to the same CID. And that would require a pre-processing step before signing.
Alternatively, if the JSON is just being stored as a string or bytes in the key-value store, then its formatting and key order are fixed, so it's those bytes that are being signed, and the underlying format and parsing can be ignored.
Yes, after we store it, it is fixed. But that's not enough. If we later have the same data and get a different CID when we hash it that time, we won't be able to deduplicate and we won't know that it is the same. Similarly, an external user like a digital forensic expert won't be able to produce the same results as us, unless they happen to have exactly the same JSON. So we definitely want a canonicalization step as part of the process that we can tell others about for determinism's sake.
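The deduplication argument can be sketched as follows. This is an illustration under stated assumptions, not the project's actual pipeline: `contentDigest` and `rawDigest` are hypothetical helpers, the sample fields (`asset`, `size`) are made up, and a plain SHA-256 hex digest stands in for a CID (a real CID additionally encodes the hash function and codec). The point is only that hashing the canonical form gives two parties the same result, while hashing the raw bytes does not.

```javascript
const { createHash } = require("node:crypto");

// Compact canonical form: sorted keys, no insignificant whitespace (simplified sketch).
function canonicalize(value) {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return "[" + value.map(canonicalize).join(",") + "]";
  return "{" + Object.keys(value).sort()
    .map((k) => JSON.stringify(k) + ":" + canonicalize(value[k])).join(",") + "}";
}

// Hypothetical helper: SHA-256 hex digest of the raw text as received.
function rawDigest(jsonText) {
  return createHash("sha256").update(jsonText, "utf8").digest("hex");
}

// Hypothetical helper: parse, canonicalize, then hash the canonical UTF-8 bytes.
function contentDigest(jsonText) {
  return rawDigest(canonicalize(JSON.parse(jsonText)));
}

// The same data as we stored it, and as a third party might re-serialize it.
const oursText = '{"asset":"photo.jpg","size":1024}';
const theirsText = '{\n  "size": 1024,\n  "asset": "photo.jpg"\n}';

console.log(rawDigest(oursText) === rawDigest(theirsText));         // false
console.log(contentDigest(oursText) === contentDigest(theirsText)); // true
```

Publishing the canonicalization rule alongside the hash function is what would let an external verifier reproduce the digest without having our exact bytes.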
As discussed in #4, BSON will be used to store values. This means JCS cannot be used as it only applies to JSON, and BSON has no canonicalization standard.
One solution is to sort the keys of the JavaScript object before serializing it (like this), so that the BSON serialization preserves this order. This ties our canonicalization to JavaScript's default string sort, which compares UTF-16 code units, but that is the ordering JCS uses anyway, so it might not be a big deal. Some testing is likely needed to make sure the sorted order is actually preserved as expected.
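A minimal sketch of the key-sorting step, and of the kind of test that would check it, might look like the following. The helper name `sortKeysDeep` is hypothetical, and the sketch assumes plain JSON-like values only (no `Date`, binary, or other BSON-specific types). The BSON serialization itself is not exercised here; the example only checks the key enumeration order that a serializer would see.

```javascript
// Return a copy of `value` whose object keys were inserted in sorted order, recursively.
// Assumes plain JSON-like data: objects, arrays, strings, numbers, booleans, null.
function sortKeysDeep(value) {
  if (Array.isArray(value)) return value.map(sortKeysDeep);
  if (value === null || typeof value !== "object") return value;
  const sorted = {};
  for (const key of Object.keys(value).sort()) {
    sorted[key] = sortKeysDeep(value[key]);
  }
  return sorted;
}

const sorted = sortKeysDeep({ b: 1, a: { d: 2, c: 3 } });
console.log(Object.keys(sorted));    // [ 'a', 'b' ]
console.log(JSON.stringify(sorted)); // {"a":{"c":3,"d":2},"b":1}

// Caveat: integer-like keys are always enumerated first, in ascending
// numeric order, regardless of the order in which they were inserted.
const mixed = sortKeysDeep({ b: 1, "10": 2, "9": 3 });
console.log(Object.keys(mixed));     // [ '9', '10', 'b' ]
```

The last case is one that testing would need to cover: a string sort puts `"10"` before `"9"`, but the object enumerates `"9"` first, so documents with integer-like keys may not end up in the intended order.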
As per #8, Ceramic Network's dag-jose library should allow us to encrypt private values, but we also need it to sign attestations and then add the signed attestations separately to an IPFS node that we run.
We want to be able to get a canonical CID for the JSON, so that JSON that differs only in formatting or key order produces the same CID. Getting a CID per attestation will also potentially allow us to make attestations about other attestations in later stages.
Open Questions
Given that the integrity backend currently uses AuthSign servers, how can we reconcile these two approaches? Must we use AuthSign? What would we be missing if we didn't?