tafia / quick-protobuf

A rust implementation of protobuf parser
MIT License
446 stars 82 forks source link

Proposed API change: move to getters and setters #235

Open snproj opened 1 year ago

snproj commented 1 year ago

Proposal

I'd like to hear if there are any opinions on moving to a generated getter/setter API for fields in messages, enums etc. rather than simply making fields public and accessing them directly.

Why?

Current implementation
> Currently, our generated structs simply expose all fields as public, and the user simply reads from and assigns to them directly. > > There is also no concept of field presence (i.e. which fields have been explicitly set, and which have not)

I'm trying to implement a more compliant Proto2/Proto3 distinction (among other things, attempting to fix #219 and introduce field presence). These are concepts that (in my relatively inexperienced view) live mainly in how the user interacts with the field value, and thus might be implemented most intuitively/tidily through an abstraction layer there.

I'm not saying it's impossible to maintain the current interface while adding these concepts (I'm trying that out right now to see how it goes), but I think doing so is messier, and if we want to make this kind of API change, the earlier the better.

On a more minor point, I also feel that if I were a user, it would make things clearer. For example, if I have a Proto3 field msg.x with a default value of 0, setting that field to 0 should mean that it is not serialized. set_x(0) would make me more aware of these possible side effects than just x = 0.

Also, if at some point we add codegen to generate other common protobuf methods (such as some kind of clear_field(), hazzers, etc.), getters and setters would fit right into this kind of paradigm.

lexxvir commented 1 year ago

Hi @snproj !

I heavily use quick-protobuf in the embedded (no_std) project and like the approach with public fields in the generated structs. That approach is significantly less noisy than getters/setters one.

I compared Google's C++ generated code vs quick-protobuf.

Though I use only the Proto2. For Proto3 (when all fields are optional) the setters/getters approach maybe more convenient.

nerdrew commented 1 year ago

I find working with the struct fields very nice. The objects are just structs. I don't need to worry about a new API when passing the structs around. An added benefit is that I add serde::Deserialize to most of them, and It Just Works. I don't need to care that the struct came from a proto or JSON on the wire. My app treats them the same.

I hope we can come up with a way to make this work with the existing struct fields without resorting to setters.

snproj commented 1 year ago

Thanks for the feedback! One concern I have can be shown in the following example:

syntax = "proto2";
message A {
    optional int32 a = 1 [default=3];
}

This generates some Rust struct with field pub a: i32, initialized to a value of 3 under our current system.

With field presence, we should have some way of checking (usually through a generated hazzer method has_a()) whether a has been explicitly set or not, even if the explicitly-set value is the default value.

This affects serialization too; if a is explicitly set to 3, it should be serialized, even though the number the user reads from the field is still the same as the default.

This is summarized in the following chain of events:

  1. We initialize the struct, say with A::default().
  2. We should be able to read a and get the default value 3, even though it's not been set;
  3. we should be able to find out that a has not been explicitly set;
  4. if we serialize now without setting a, a should not appear on the wire.
  5. We should be able to set a to 3 explicitly; let us do so now.
  6. If we read a now, we should still get the same value 3;
  7. we should be able to find out that a has been explicitly set;
  8. now if we serialize, we should see a being serialized to 3 on the wire.

Currently, I can't think of a good way to enable a user to do this without the use of getters and setters.

(Also, 3 is just a random number, obviously; the point is just that it's a default value.)

Let me know if I've got some concepts wrong above / if you have any ideas! There's a good chance I'm just missing something obvious.

nerdrew commented 1 year ago

Hypothetical generated code using structs:

In the protobuf runtime crate:

enum OptionalField<T> {
  Set(T), // T is the set value
  Unset(T), // T is the default value
}

Generated struct

struct A {
  pub a: OptionalField(i32),
}

We could implement a good portion of the Option enum API for protobuf::OptionalField to make it feel natural.

We could also add a serde feature to the crate that adds serde annotations to OptionalField to make the JSON, etc serialization look "normal" too.

snproj commented 1 year ago

That sounds like a good idea! I'm currently experimenting with implementing this or something similar for all singular fields (as SettableField), since it isn't only optional fields that have this set/unset dynamic.

It might not be ideal to allow the user to manipulate the Unset variant directly though, since that would essentially be changing his own default value on the fly. I'm wondering how to make the Unset variant immutable w.r.t its contents.

What would really be nice is if this were currently possible:

pub enum SettableField<T, const N: T> {
    Set(T),
    Unset(N), // should hold the default value
}

but alas it seems pretty far away.

Even if we make SettableField a struct instead of an enum (so we can make Unset into some private field), the user can simply instantiate a new SettableField instance with a different unset value and assign it to the generated struct field. We know that the user must have access to the constructor for SettableField because the generated structs must have too (in order to function in the first place).

It does seem, that the only way to have an immutable default value for each field known to the code while still allowing the user to mutate the struct is to hardcode it with codegen somehow; as of now I can't seem to think of a way though. Do you think we should just trust the user not to touch the Unset variant, reconsider the enum approach, or is there something I'm missing haha

thomaseizinger commented 1 year ago

It might be a bit hacky but you can make a newtype that implements Deref and DerefMut. Within the DerefMut implementation, you can set a dirty flag and essentially track, whether that value has at any point be dereferenced for a mutable reference which I think should mean that the user set it to something.

snproj commented 1 year ago

Just an update, over the course of thinking/testing I've found some problems that arise with these suggestions:

Set/Unset

DerefMut

It's an interesting idea! However I feel that using deref_mut for this side-effect would be quite a transparent thing that might trip up users, or need them to watch out for deref coercion

In the worst case, there might be functions that don't really need to have things mutable that do anyway. Especially if that function requests the inner type as a parameter (not the newtype itself), it will be automatically set. A colleague of mine made this for example: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=11b86a16ab9bcf3f6318020d98a1da61

snproj commented 1 year ago

Another Proposal

We can actually pretty comfortably live without getters and setters; I think the most crucial thing they would do (that is hard to replace) is to ensure that reading a variable will yield the correct default if that variable is not set. So...

I propose that we just put the user in charge of that 😁 it's not hard; we will provide an easy way to do so that should always work. But the user will need to remember to do this.

Keep in mind that the following example is the most complicated case (Proto2 optional). Proto3 optionals are simpler, and other types are just plain types and don't use Option.

Something like:

syntax = "proto2";
message Thing {
    optional int32 a = 1 [default = 3];
}

Will produce:

struct Thing {
    pub a: Option<i32>,
}

impl Thing {
    const DEFAULT_A: i32 = 3;
}

And using this in a Rust program:

fn main() {
    let thing = Thing::default();
    assert_eq!(thing.a, None); // optional field initialized as None

    do_something_with_a(thing.a.unwrap_or(Thing::DEFAULT_A)); // when we want to read from field a, we need to remember to supply the default as alternative
}

Individual Cases

Proto2 required
Represented by plain type. Custom defaults will only influence the initial value of the field when the struct is initialized; it will not be "remembered" anywhere else in the API. - Hazzing: Not possible - Getting: Read directly, it's a plain variable - Setting: Set directly, it's a plain variable - Clearing: No such option Technically, Google's own implementation *does* provide ways for hazzing and clearing. However, my assumption is that this is only meant to be used when a struct is just initialized, to give an initial value before the user starts assigning values, not during serialization (since these fields are REQUIRED in sound messages after all).
Proto2 optional
Represented by `Option`. The `Option` represents the actual data present; if it's `None`, it means nothing has been set. It's the user's responsibility to read variables in a way that will return the correct defaults if the value is unset (see "Getting"). Custom defaults will generate a predictably-named associated constant in the `impl` block representing the custom default. - Hazzing: `is_some()` - Getting: `unwrap_or(default_name)`, where `default_name` is the predictably-named constant mentioned above - Setting: Set to `Some(new_val)` - Clearing: Set to `None`
Proto3 default
Represented by plain type. Custom defaults not allowed in Proto3. - Hazzing: Not possible - Getting: Read directly, it's a plain variable - Setting: Set directly, it's a plain variable - Clearing: No such option
Proto3 optional
Represented by `Option`. The `Option` represents the actual data present; if it's `None`, it means nothing has been set. It's the user's responsibility to read variables in a way that will return the correct defaults if the value is unset (see "Getting"). Custom defaults not allowed in Proto3. Notably, the Proto3 defaults align with the Rust ones. Therefore, instead of using `unwrap_or(something)`, we can use `unwrap_or_default()`. - Hazzing: `is_some()` - Getting: `unwrap_or_default()` - Setting: Set to `Some(new_val)` - Clearing: Set to `None`

Isn't this almost getters and setters?

Yes, it would be very easy to transform this into a getter and setter API; this approach basically sidesteps it at the last moment to provide direct access to types instead.

If people think it's a good idea, maybe a build flag or magic comment might be possible to additionally autogenerate them for convenience.

nerdrew commented 1 year ago

I like this last proposal. I also think adding a flag or magic comment to generate getters that will return default values is also a good idea.

thomaseizinger commented 1 year ago

I am also in favor of the last proposal.

In addition to the constant, can we also generate a getter a_or_default that is implemented as thing.a.unwrap_or(Thing::DEFAULT_A)? There is a good chance people will need that so generating it for them makes sense. That doesn't necessarily conflict with public fields IMO.

Fishrock123 commented 1 year ago

I think the getters is the wrong approach. I came to this library for performance and simplicity.

I think if you need this more advanced api stuff using one of the other libraries should be fine.

Since we are talking about things which are implicit to how protobuf is supposed to work, I think those can just be represented directly without extra fuss.

Which is to say, i think optional should just be Option in proto3. If there is a protobuf default and we are constructing a protobuf object, None should become the default. If we are reading a protobuf object and the data source did not have a property and there is a default it should just be the default. That's what defaults literally are as translated to rust and this seems to be the library that sticks the most to "just translate it literally to rust" which is a useful library to have.

I do not think that there has to be a way to tell "was this data the default or was it set in the data" in this library.