jamesmunns / postcard-rpc

An RPC layer for postcard based protocols
Apache License 2.0
93 stars 22 forks source link

Introduce new v2 hash #46

Closed jamesmunns closed 1 month ago

jamesmunns commented 2 months ago

This introduces a new, NOT BACKWARDS COMPATIBLE, version two of the schema hashing algorithm.

This makes two changes:

I plan to experiment with this in this PR, and if merged, would be added as a non-default option for the 0.7.x release train. In the future (e.g. 0.8.x), hashv2 would become either the only supported algorithm, or would become the default with an opt-in feature for the legacy v1 hashing.

One thing I plan to look at is if I can relax any + 'static/"Owned" bounds on various traits or methods. The hope would be to allow non-'static sending/serialization, but still require 'static bounds for receiving/deserialization. This would allow a no-std device to send &[u8], while allowing the std host to receive as a Vec<u8>.

I need to write more in depth about this, as it takes the schema system from "strict nominal + strict structural typing" to "strict structural typing + partial nominal typing", where "partial nominal typing" is defined that the names of struct fields and enum variants are still considered, but the names of types themselves are NOT considered.

Closes #23

jamesmunns commented 2 months ago

I may want to make this slightly less flexible, for example, this would allow types like struct Wrapper1(u8); and struct Wrapper2(u8); to also produce the same hash, which I am unsure if is desirable or not.

zeenix commented 2 months ago
  • We no longer hash the name of a type, as proposed by @zeenix. This allows for potential "type punning" when two types are represented on the wire in the same way. For example, heapless::String and std::string::String should produce the same schema hash, or &[u8] and Vec<u8>.

Awesome!

One thing I plan to look at is if I can relax any + 'static/"Owned" bounds on various traits or methods.

That would be also super nice.

The hope would be to allow non-'static sending/serialization, but still require 'static bounds for receiving/deserialization.

Hmm.. if you were to have an intermediate raw binary type that you read the bytes into (reading the full message bytes first, is something an RPC should do anyway, IMHO) and then allowing to deserialize from that, you can allow deserialization to reference types as well (this would be particular great to strings). Something like this.

This would allow a no-std device to send &[u8], while allowing the std host to receive as a Vec<u8>.

Definitely! However, I'd imagine most users do both on both sides. Still very useful if the sender drops the 'static bound already. :+1:

jamesmunns commented 2 months ago

Hmm.. if you were to have an intermediate raw binary type that you read the bytes into (reading the full message bytes first, is something an RPC should do anyway, IMHO) and then allowing to deserialize from that, you can allow deserialization to reference types as well (this would be particular great to strings)

Both postcard and postcard-rpc both do this already. Serde doesn't have "pausable" deserialization (for example, async methods), so it is necessary to already have all data or be blocking. Postcard does allow deserialization with borrowed values (strings and byte slices), but some of postcard-rpc's interfaces have a more strict 'static bound.

I'll have to remember why I put 'static bounds on many of these items, in the past I've found that allowing borrowed deser in cases where part of the process is managed by the "stack" (e.g. postcard-rpc's dispatch machinery) ends up being challenging to compose. If there are specific places you think it can be relaxed, I'm open to issues to discuss.

jamesmunns commented 2 months ago

I may want to make this slightly less flexible, for example, this would allow types like struct Wrapper1(u8); and struct Wrapper2(u8); to also produce the same hash, which I am unsure if is desirable or not.

Thinking on this, I'm maybe slightly more okay with this.

If these types are a top level structure, they still will be used with a path, if someone swaps Temperature(f32) with Humidity(f32), and the path is sensors/temperature, then I don't think I can help them. Similarly if these types are a field in a struct, the name of the field is still hashed (and hashing is sensitive to ordering, e.g. hash(a) + hash(b) != hash(b) + hash(a)), which avoids this case:

struct Example {
    temp: Temperature(f32),
    humidity: Humidity(f32),
}

// field reordering! *does* still change the schema in with v2 hashing

struct Example {
    humidity: Humidity(f32),
    temp: Temperature(f32),
}

Open to more "this is bad" or "this is okay" examples.