franziskuskiefer closed 3 years ago
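The reasoning behind using the TLS syntax was that MLS implementations are already required to serialize/deserialize TLS-encoded structs. If you're really keen on JSON, I could probably be persuaded. If we were going to do that, though, I would suggest:
1. Representing binary data as a hex-encoded string (since that is more readable than an array/base64)
2. Making the sample files pretty-formatted (with indentation, etc.)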
If we do JSON, we can probably also make the syntax spec more informal. There's no need to specify vector headers, for example. (JSON schema exists, but I'm not inclined to use it.) So we could have something like the following:
// TreeMathTestVector
{
"n_leaves": /* uint32 */,
"root": [ /* array of uint32 */ ],
"left": [ /* array of uint32 */ ],
"right": [ /* array of uint32 */ ],
"parent": [ /* array of uint32 */ ],
"sibling": [ /* array of uint32 */ ]
}
// EncryptionTestVector
{
"cipher_suite": /* uint16 */,
"n_leaves": /* uint32 */,
"encryption_secret": /* hex-encoded binary data */,
"sender_data_secret": /* hex-encoded binary data */,
"leaves": /* array of LeafSequence objects */
}
On the specific changes:
Only include the one root for the tree of n_leaves leaves.
OK with me, just seems like you have to generate more tests to get the same coverage.
Make all computed values optional because the computation for all of them might fail. Add None/null for the invalid cases.
These methods should all be infallible, and there are no invalid cases. What failure cases do you have in mind?
Encrypt a message every generation (either with the application or the handshake key).
Is your intent that the `plaintext` and `ciphertext` fields contain encoded `MLSPlaintext` and `MLSCiphertext` objects?
Add an additional sender_data_info check in order to test the sender data keys separately. This is mostly because I currently can't access them in openmls from the test. But I think because it's key material it's worth testing in any case.
I agree that it makes sense to test sender data key generation separately. I would suggest we just do it once, though, at the top level of EncryptionTestVector, by adding `sender_data_ciphertext`, `sender_data_key`, and `sender_data_nonce` values there. (Or one field wrapping those three into a struct.)
I moved the keys and messages into a separate struct to make the test vector more legible.
Huh, thought we had this before. Ok.
I replaced the CryptoValue to make it a little more concise.
Sure. If we go with JSON, it can get yet more concise.
The reasoning behind using the TLS syntax was that MLS implementations are already required to serialize/deserialize TLS-encoded structs. If you're really keen on JSON, I could probably be persuaded. If we were going to do that, though, I would suggest:
Yeah so I heard. I agree that TLS syntax would be more concise and it's sort of there already. But especially for large test vectors I find JSON way easier to handle. I'd still use the TLS serialization for MLS primitives (as done in the encryption test vector proposal) but make the test vector itself JSON. This way the MLS code and its serialization doesn't need to change and JSON is a dev-only dependency.
1. Representing binary data as a hex-encoded string (since that is more readable than an array/base64)
👍🏻 I agree that would be more readable.
2. Making the sample files pretty-formatted (with indentation, etc.)
With the change in 1 we can make the JSON files pretty (the byte arrays make formatted JSON ugly). serde_json outputs pretty printed JSON directly so this would be a one line change for me. Pretty printing blows up the file size though.
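To make the trade-off concrete, here is a small sketch comparing the two encodings of one binary field, and pretty-printed versus compact output. It uses Python's standard `json` module purely for illustration (the thread itself uses serde_json); the secret value is a made-up placeholder, not real test vector data.

```python
import json

# Hypothetical 32-byte secret, used only for illustration.
secret = bytes(range(32))

# The same field as a JSON array of numbers and as a hex-encoded string.
as_array = {"encryption_secret": list(secret)}
as_hex = {"encryption_secret": secret.hex()}

# Pretty printing puts every array element on its own line,
# while the hex string stays on a single line.
pretty_array = json.dumps(as_array, indent=2)
pretty_hex = json.dumps(as_hex, indent=2)
array_lines = len(pretty_array.splitlines())
hex_lines = len(pretty_hex.splitlines())

# Compact output is smaller than pretty output for the same data.
compact_hex = json.dumps(as_hex, separators=(",", ":"))

# Reading the value back only needs a hex decode.
decoded = bytes.fromhex(json.loads(pretty_hex)["encryption_secret"])

print(array_lines, hex_lines, len(pretty_hex), len(compact_hex))
```

With hex strings the pretty-printed file stays short, so the size overhead of indentation comes mostly from structure rather than from the binary fields.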
If we do JSON, we can probably also make the syntax spec more informal. There's no need to specify vector headers, for example. (JSON schema exists, but I'm not inclined to use it.) So we could have something like the following:
// TreeMathTestVector
{
"n_leaves": /* uint32 */,
"root": [ /* array of uint32 */ ],
"left": [ /* array of uint32 */ ],
"right": [ /* array of uint32 */ ],
"parent": [ /* array of uint32 */ ],
"sibling": [ /* array of uint32 */ ]
}
// EncryptionTestVector
{
"cipher_suite": /* uint16 */,
"n_leaves": /* uint32 */,
"encryption_secret": /* hex-encoded binary data */,
"sender_data_secret": /* hex-encoded binary data */,
"leaves": /* array of LeafSequence objects */
}
I agree we shouldn't go for a JSON schema. I wouldn't mind a less strict definition of the test vectors.
On the specific changes:
Only include the one root for the tree of n_leaves leaves.
OK with me, just seems like you have to generate more tests to get the same coverage.
I revisited this. There was an oddity in our code that made me think this would be bad. Let's go with a vector of roots as you proposed.
Make all computed values optional because the computation for all of them might fail. Add None/null for the invalid cases.
These methods should all be infallible, and there are no invalid cases. What failure cases do you have in mind?
Not all of them are infallible. I unified them to be all optional but actually only some of them are:
- `left` and `right` fail if the index is a leaf. From the way I read the test vector description right now this actually happens ("`left[i]` is the node index of the left child of the node with index `i` in a tree with `n_leaves` leaves").
- `parent` and `sibling` fail for the root node. Again a case I think happens with the current description (happens in my test cases).
Encrypt a message every generation (either with the application or the handshake key).
Is your intent that the `plaintext` and `ciphertext` fields contain encoded `MLSPlaintext` and `MLSCiphertext` objects?
Yes, that's the idea. As I said above, I think every MLS object should be TLS encoded (no additional code needed).
Add an additional sender_data_info check in order to test the sender data keys separately. This is mostly because I currently can't access them in openmls from the test. But I think because it's key material it's worth testing in any case.
I agree that it makes sense to test sender data key generation separately. I would suggest we just do it once, though, at the top level of EncryptionTestVector, by adding `sender_data_ciphertext`, `sender_data_key`, and `sender_data_nonce` values there. (Or one field wrapping those three into a struct.)
👍🏻 agreed. I added a wrapping struct.
I updated the test vector files in openmls:
with the following structure
{
"cipher_suite": /* uint16 */,
"n_leaves": /* uint32 */,
"encryption_secret": /* hex-encoded binary data */,
"sender_data_secret": /* hex-encoded binary data */,
"sender_data_info": {
"ciphertext": /* hex-encoded binary data */,
"secrets": {
"key": /* hex-encoded binary data */,
"nonce": /* hex-encoded binary data */,
},
},
"leaves": [
{
"generations": /* uint32 */,
"handshake_keys": [ /* array with `generations` handshake keys and nonces */
{
"key": /* hex-encoded binary data */,
"nonce": /* hex-encoded binary data */,
},
...
],
"application_keys": [ /* array with `generations` application keys and nonces */
{
"key": /* hex-encoded binary data */,
"nonce": /* hex-encoded binary data */,
},
...
],
"messages": [
/* array with `generations` TLS encoded MLSPlaintext/MLSCiphertext pairs. */
{
"plaintext": /* hex-encoded binary data */,
"ciphertext": /* hex-encoded binary data */,
},
...
],
}
]
}
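A consumer of this structure might start with a shape check before doing any crypto. The sketch below is only an illustration under assumptions: the function name `check_encryption_vector` and the sample values are hypothetical, and it only verifies that the hex fields decode and that each leaf carries `generations` entries in each per-generation array, as the comments in the structure above describe.

```python
import json

def check_encryption_vector(vector):
    """Return a list of structural problems; an empty list means the shape looks right."""
    problems = []
    # Top-level secrets must be hex-encoded strings.
    for field in ("encryption_secret", "sender_data_secret"):
        try:
            bytes.fromhex(vector[field])
        except (KeyError, TypeError, ValueError):
            problems.append(f"{field}: missing or not hex")
    # Each leaf must carry exactly `generations` entries per array.
    for i, leaf in enumerate(vector.get("leaves", [])):
        n = leaf.get("generations")
        for field in ("handshake_keys", "application_keys", "messages"):
            if len(leaf.get(field, [])) != n:
                problems.append(f"leaves[{i}].{field}: expected {n} entries")
    return problems

# Hypothetical, tiny vector with made-up values (not real key material).
sample = json.loads("""
{
  "cipher_suite": 1,
  "n_leaves": 1,
  "encryption_secret": "00ff",
  "sender_data_secret": "abcd",
  "leaves": [
    {
      "generations": 1,
      "handshake_keys": [{"key": "01", "nonce": "02"}],
      "application_keys": [{"key": "03", "nonce": "04"}],
      "messages": [{"plaintext": "05", "ciphertext": "06"}]
    }
  ]
}
""")

ok_problems = check_encryption_vector(sample)

# Dropping an entry makes the count mismatch show up.
broken = json.loads(json.dumps(sample))
broken["leaves"][0]["messages"] = []
broken_problems = check_encryption_vector(broken)

print(ok_problems, broken_problems)
```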
{
"cipher_suite": /* uint16 */,
"root": [ /* array of uint32 */ ],
"left": [ /* array of option<uint32> */ ],
"right": [ /* array of option<uint32> */ ],
"parent": [ /* array of option<uint32> */ ],
"sibling": [ /* array of option<uint32> */ ]
}
@franziskuskiefer sounds like we're pretty much in agreement. Could you write a PR to update the spec? I can get the mlspp vector generation/processing updated then.
One thing that would be good to clarify in that PR: When you say `option<uint32>`, does that mean that the value can be either a JSON number or `null`?
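For illustration, here is a sketch of how the optional values could be generated, assuming `option<uint32>` maps to either a JSON number or `null`, that `root[i]` is the root of a tree with `i + 1` leaves, and that the tree math follows the array-based layout from the MLS protocol draft as I understand it. All function names are hypothetical and this is not taken from openmls or mlspp.

```python
import json

def level(x):
    """Number of trailing one bits; leaves (even indices) are level 0."""
    k = 0
    while (x >> k) & 1:
        k += 1
    return k

def node_width(n):
    """Number of nodes in a tree with n leaves (n >= 1 assumed)."""
    return 2 * (n - 1) + 1

def root(n):
    w = node_width(n)
    return (1 << (w.bit_length() - 1)) - 1

def left(x):
    k = level(x)
    if k == 0:
        return None  # leaves have no children
    return x ^ (1 << (k - 1))

def right(x, n):
    k = level(x)
    if k == 0:
        return None  # leaves have no children
    r = x ^ (3 << (k - 1))
    while r >= node_width(n):
        r = left(r)  # step down until the node is inside the tree
    return r

def parent_step(x):
    k = level(x)
    b = (x >> (k + 1)) & 1
    return (x | (1 << k)) ^ (b << (k + 1))

def parent(x, n):
    if x == root(n):
        return None  # the root has no parent
    p = parent_step(x)
    while p >= node_width(n):
        p = parent_step(p)
    return p

def sibling(x, n):
    p = parent(x, n)
    if p is None:
        return None  # the root has no sibling
    return right(p, n) if x < p else left(p)

def tree_math_vector(n_leaves):
    w = node_width(n_leaves)
    return {
        "n_leaves": n_leaves,
        "root": [root(n) for n in range(1, n_leaves + 1)],
        "left": [left(x) for x in range(w)],
        "right": [right(x, n_leaves) for x in range(w)],
        "parent": [parent(x, n_leaves) for x in range(w)],
        "sibling": [sibling(x, n_leaves) for x in range(w)],
    }

vector = tree_math_vector(4)
# Python's None serializes to JSON null.
print(json.dumps(vector, indent=2))
```

Under these assumptions, the invalid cases are exactly the leaf entries of `left`/`right` and the root entry of `parent`/`sibling`.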
I've started implementing test vectors for openmls (https://github.com/openmls/openmls/pull/299) (only tree math and encryption for now). I generally like the approach in test-vectors.md. But my versions differ slightly from the ones defined in test-vectors.md. Let's try to find a common ground that works for all implementations.
@bifurcation let me know what you think.
TreeMath
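- Only include the one root for the tree of `n_leaves` leaves.
- Make all computed values optional because the computation for all of them might fail. Add `None`/`null` for the invalid cases.

There's a sample file here.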
Encryption
This has a couple more differences.
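- Encrypt a message every generation (either with the application or the handshake key).
- Add an additional `sender_data_info` check in order to test the sender data keys separately. This is mostly because I currently can't access them in openmls from the test. But I think because it's key material it's worth testing in any case.
- I moved the keys and messages into a separate struct to make the test vector more legible.
- I replaced the `CryptoValue` to make it a little more concise.

There's a sample file here.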
Representation
Instead of TLS representation I used JSON. I think having readable test vectors is a good thing because you don't need to run a (potentially faulty) parser on them when debugging issues. If there's a good reason to have TLS encoded test vectors I could add it to openmls in addition to the JSON version.