theupdateframework / rust-tuf

Rust implementation of The Update Framework (TUF)
https://crates.io/crates/tuf
Apache License 2.0

Current serde/shims don't parse Vec<u8>, only hex/base64/pem strings #86

Open heartsucker opened 7 years ago

heartsucker commented 7 years ago

This would prevent us from having a serialization format that parses bytes directly (and not bytes-encoded-as-a-string, I think). Something like this should work:

use data_encoding::HEXLOWER;
use serde::de::{self, Deserialize, Deserializer, Visitor};
use std::fmt;

struct Foo(Vec<u8>);

// A deserializer can only be driven once, so use a visitor that accepts both forms.
impl<'de> Deserialize<'de> for Foo {
    fn deserialize<D: Deserializer<'de>>(de: D) -> Result<Self, D::Error> {
        struct FooVisitor;
        impl<'de> Visitor<'de> for FooVisitor {
            type Value = Foo;
            fn expecting(&self, f: &mut fmt::Formatter) -> fmt::Result {
                f.write_str("raw bytes or a hex-encoded string")
            }
            // Binary formats (e.g. CBOR) hand over bytes directly.
            fn visit_bytes<E: de::Error>(self, v: &[u8]) -> Result<Foo, E> {
                Ok(Foo(v.to_vec()))
            }
            // Textual formats (e.g. JSON) fall back to hex decoding.
            fn visit_str<E: de::Error>(self, v: &str) -> Result<Foo, E> {
                HEXLOWER.decode(v.as_bytes()).map(Foo).map_err(E::custom)
            }
        }
        de.deserialize_any(FooVisitor)
    }
}
lucab commented 7 years ago

I have some doubts related to this. There are currently some fields which contain binary content that at some point may be serialized/deserialized to a text container, e.g. the value field of a Signature struct into a JSON string. The TUF spec doesn't seem to mandate any encoding for such a string field, but it uses hexlower in all of its examples. Your library recently switched to base64, meaning that it doesn't interoperate with other implementations using hexlower and that even the example above is obsolete (NB: I'm not debating the base64 choice).
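
For concreteness, the same four signature bytes (0xde 0xad 0xbe 0xef) rendered both ways in such a value field:

hexlower: { "value": "deadbeef" }
base64:   { "value": "3q2+7w==" }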

So my doubts are:

  1. should the TUF spec mandate which encoding to use for binary fields in textual formats?
  2. should the TUF spec be augmented to specify a complex object with type+content so that multiple formats can be consumed?

heartsucker commented 7 years ago

should the TUF spec mandate which encoding to use for binary fields in textual formats?

This again comes back to the idea that TUF is moving away from rigid definitions for everything. Even as it stood at the time Notary was implemented, the spec didn't say that an implementation must support ed25519 keys or even sha256, so I could have written a TUF lib that perfectly followed the spec but couldn't talk to Notary, which also perfectly followed the spec.

I chose b64 because hex is ~50% more data to send per hash/keyid/sig.
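
To put numbers on that: hex spends two characters per byte while base64 spends four per three bytes, so a 32-byte SHA-256 digest becomes 64 hex characters vs. 44 base64 characters (~45% more). A quick check with the data-encoding crate:

use data_encoding::{BASE64, HEXLOWER};

fn main() {
    let digest = [0u8; 32]; // stand-in for a SHA-256 output
    assert_eq!(HEXLOWER.encode(&digest).len(), 64); // 32 * 2
    assert_eq!(BASE64.encode(&digest).len(), 44);   // ceil(32 / 3) * 4
}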

There are some obvious reasons for TUF to not specify things rigidly.

  1. Current HSMs don't always support ed25519 keys
  2. Embedded devices might not support RSA keys (memory restrictions)
  3. Embedded devices might not support JSON (memory restrictions)

I think trying to make a fully general-purpose lib that can handle all cases is too complex. This lib is intended to be used as both the client and server lib with no guarantees about interop with other implementations. It might behoove me to document this fact better in both the readme and rustdocs. I do make some opinionated decisions that deviate from the non-security-critical parts of TUF's design (key encoding, for example).

If this is seen as an issue with the TUF spec, you should file something upstream. Or say @trishankkarthik's name 3 times in front of a mirror and see if he appears behind you.

should the TUF spec be augmented to specify a complex object with type+content so that multiple formats can be consumed?

I'm not sure what you mean by this. Should the JSON have both hexlower and b64 as different fields? If so, I'm going to say no, since there is already a huge amount of metadata being transferred with TUF. I'm actually going to strip the default metadata generators to use only user-defined hash types and not "all" of them, since that would further reduce the amount of data transferred. See #106.

heartsucker commented 7 years ago

Also, as a note, this is even more annoying because it makes writing metadata more complex. How would I take the bytes of a hash and encode them as both a string and a byte array? Each DataInterchange would need its own definition of Serialize per type. This would be... painful, perhaps.
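
One way to sketch around this, assuming serde's is_human_readable hint rather than anything settled in this thread, is a single Serialize impl that picks the encoding per format:

use data_encoding::HEXLOWER;
use serde::{Serialize, Serializer};

struct HashValue(Vec<u8>);

impl Serialize for HashValue {
    fn serialize<S: Serializer>(&self, s: S) -> Result<S::Ok, S::Error> {
        if s.is_human_readable() {
            // Textual formats like JSON get a hex string.
            s.serialize_str(&HEXLOWER.encode(&self.0))
        } else {
            // Binary formats take the raw bytes.
            s.serialize_bytes(&self.0)
        }
    }
}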

lucab commented 7 years ago

Thanks for the answer; you touch on many points that I omitted to avoid going out of scope, and those insights are appreciated. Getting to the specific topics:

It might behoove me to document this fact better in both the readme and rustdocs.

Yes, it would be nice to point out in the top-level crate doc that there is deliberately no interoperability with anything else, as the spec permits. It would also be nice to collect all your opinionated tweaks and their rationale in a markdown file here.

I'm not sure what you mean by this? JSON should it have both hexlower and b64 as different fields?

No. Just for reference, this was suggesting an internally tagged enum (in serde terms):

{
  "type": "hexlower",
  "sig": "$hexblob"
}
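
In serde terms, that could look like the following sketch; the SigValue name and variants are illustrative, not a proposed API:

use serde::{Deserialize, Serialize};

// The "type" field selects the variant; each variant carries the
// encoded signature in its "sig" field.
#[derive(Serialize, Deserialize)]
#[serde(tag = "type")]
enum SigValue {
    #[serde(rename = "hexlower")]
    HexLower { sig: String },
    #[serde(rename = "base64")]
    Base64 { sig: String },
}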

If this is seen as an issue with the TUF spec, you should file something upstream. Or say @trishankkarthik's name 3 times in front of a mirror and see if he appears behind you.

I think I will refrain from that. Both you and @trishankkarthik are upstream and well aware of the encoding details. The fact that in the whole of the current section 4 there is not a single MUST related to file format/encoding/typing/field-naming, and that content is specified by "examples", suggests I should stop here.

heartsucker commented 7 years ago

Yes, it would be nice to point out in the top-level crate doc that there is deliberately no interoperability with anything else, as the spec permits. It would also be nice to collect all your opinionated tweaks and their rationale in a markdown file here.

I'll make a note in #109 to ensure I cover all the data formats and how everything works and not just the code you need to make the lib work.

No. Just for reference, this was suggesting an internally tagged enum (in serde terms):

This feels like adding cruft/bloat to a new lib that doesn't yet need to support backwards compatibility.

... there is not a single MUST related to file format/encoding/typing/field-naming ...

I actually see this as a deficiency of the spec, because at some point those were all mandatory (or so I believe). I tried to increase the precision of the language, clean up the spec a bit, and turn it into an RFC here: https://github.com/theupdateframework/tuf/pull/455 (note: that PR is dated by now).

lucab commented 7 years ago

I'll make a note in #109 to ensure I cover all the data formats and how everything works and not just the code you need to make the lib work.

Thanks, that would have helped in understanding the differences/caveats before digging into the code :)

This feels like adding cruft/bloat to a new lib that doesn't yet need to support backwards compatibility.

I agree, as long as there is no need to interop and the chosen encoding is documented somewhere. I'm not suggesting any of those further, just clarifying the meaning.

I actually see this as a deficiency of the spec

As above, I'm not planning to drag this discussion there, so feel free to report it upstream to the spec if you deem it a deficiency there (or tell me explicitly if I should do so instead, but in general I'm not as involved in TUF processes as you are). For reference, I previously saw your stale RFC PR and would love to see the spec evolving in that direction.

heartsucker commented 7 years ago

Thanks, that would have helped in understanding the differences/caveats before digging into the code :)

Honestly, I didn't actually expect people to start using this so much yet. :)

or tell me explicitly if I should do so instead

Nah, it's fine. I actually support the flexibility and lack of interop for the reasons mentioned.

I previously saw your stale RFC PR and would love to see the spec evolving in that direction.

If you like the RFC and would like to see the spec RFC'd, you could let the homies over that-a-way know that you prefer the RFC format over the .txt format, in hopes that we could catch it up to the current spec and move it in that direction.

tarcieri commented 7 years ago

I'll just note that the question of "how to encode bytes in JSON" (i.e. hex vs base64) is one of the things I was trying to address in TJSON:

https://www.tjson.org/

It allows both, with mandatory type tags selecting the encoding and base64url as the default.
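
As I read the TJSON spec, the tag rides on the member name rather than on a sibling field, so the same bytes could appear as either of these ("d16" meaning hex, bare "d" meaning unpadded base64url):

{ "sig:d16": "deadbeef" }
{ "sig:d": "3q2-7w" }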