tokio-rs / prost

PROST! a Protocol Buffers implementation for the Rust Language
Apache License 2.0
3.77k stars 489 forks source link

Generate nullable types without `Option`: Opt in required fields #908

Open alexheretic opened 1 year ago

alexheretic commented 1 year ago

In many cases nested proto contracts have many nullable types that are actually always logically invalid if null.

In these cases it can be quite painful to use the generated types directly. Similarly painful having to write wrapper types and conversions from generated types.

More nesting = more pain.

Example

syntax = "proto3";

message Foo { ... }
message Bar {
    string id = 1;
    // Always required in a valid message processed by the end server
    Foo foo = 2;
}
// generated
#[derive(Clone, PartialEq, ::prost::Message)]
pub struct Bar {
    #[prost(string, tag = "1")]
    pub id: ::prost::alloc::string::String,
    #[prost(message, optional, tag = "2")]
    pub foo: ::core::option::Option<Foo>,
}

foo is nullable in the contract so the generated code correctly wraps it in an Option.

What I'd like instead

In the example imagine a common case where I consider any message missing foo to be invalid. I'd prefer to avoid:

I'd like to have a way to configure code gen to have no option, prost instead produces an error on decoding if the protobuf bytes are missing this field.

Perhaps it could work by opting in, in a similar way to extern_path, e.g. fail_decode_if_missing(proto_path) (or better name)

fn foo() {
    let mut prost_build = prost_build::Config::new();
    // configure `Bar` to fail decoding if `Bar::foo` is missing
    // meaning we can generate `Bar::foo` as `Foo` instead of an option
    prost_build.fail_decode_if_missing(".Bar.foo");
    prost_build
        .compile_protos(&["proto/foo.proto"], &["proto"])
        .unwrap();
}
// generated
#[derive(Clone, PartialEq, ::prost::Message)]
pub struct Bar {
    #[prost(string, tag = "1")]
    pub id: ::prost::alloc::string::String,
    #[prost(message, not-optional, tag = "2")]
    pub foo: Foo,
}

Derived code would require foo is present, failing to decode otherwise.

E.g. the following test would succeed.

#[test]
fn foo_is_required() {
    use prost::Message;
    let msg = Bar::decode(&[][..]);
    assert!(msg.is_err(), "expected fail, got {msg:?}");
}
alexheretic commented 12 months ago

As an experiment I tried making use of the existing "required" code path for this.

I replaced the generated code with non-optional code using "required" via extern_path

// lib.rs
include!(concat!(env!("OUT_DIR"), "/_.rs"));

pub use nonnull::*;

mod nonnull {
    use super::Foo;

    #[derive(Clone, PartialEq, prost::Message)]
    pub struct Bar {
        #[prost(string, tag = "1")]
        pub id: String,
        /// Always required in a valid message processed by the end server
        #[prost(message, required, tag = "2")] // replaced "optional" attr with "required"
        pub foo: Foo,
    }
}
// build.rs
fn main() {
    prost_build::Config::default()
        .extern_path(".Bar", "crate::nonnull::Bar")
        .compile_protos(&["proto/foo.proto"], &["proto"])
        .unwrap();
}

However, it seems there is no actual validation happening at decode Foo::default() is decoded when absent.

thread 'decode' panicked at 'expected fail, got Ok(Bar { id: "", foo: Foo { a: "" } })'

A couple of outputs from this

zmrocze commented 11 months ago

I think it's impossible to stay consistent with decoding of "simple" values (like ints), due to protobuf3 semantic.
I'm just today confronted with weird choices of protobuf3, so correct me if I'm wrong, but I think that ints are not wrapped in Option only because prost returns the default int (0?) on absence of a field. It's protobuf3's demand , not a prost shortcoming, that information about what data was missing and what data was assigned a default value disappears at deserialization.

And it's the same about structs, right? That is when client rust code sends None, the message is indistinguishable from messages where the field is absent.

alexheretic commented 11 months ago

Yes that's correct. In proto3 primitives (string/ints) default if absent, everything else is nullable. So prost indeed has the correct general approach by wrapping with Option.

However, this frequently results in nested types that are very unergonomic to use in actual applications. To workaround that there can be tons of boilerplate wrapper types with conversions that produce errors for logically required fields. This is also undesirable because of the amount of extra types and conversions that need to be written.

Contrast with json->serde types. It is trivial to require a field in the serde type, even though json itself, of course, does not have a concept of "required". This makes it often straightforward to have a single type parsed from the input payload and validated without wrapping it and without having to deal with lots of nested Options.

So at an app level I'd like to be able to handle protobuf messages in a similar way, enforce my logical requirements with the minimum of type/wrapper complexity.

I think proto3 removing "required" from the protocol was probably correct. But when the message gets to an app we'll always need to validate it and will always have logically required fields. Having to wrangle nested Options is a hard edge for rust, unlike other langs where perhaps a null check can happen after decoding and job done.

soooch commented 9 months ago

We've taken to maintaining a second struct for each Message we decode with prost. Typically the only difference is that it removes most Options. We decode with prost, then immediately parse into our "internal" representation. We would be able to get rid of a lot of boilerplate if prost could just do this as proposed above.

lucasmerlin commented 8 months ago

This was also bugging me, so I tried patching prost-build to remove the options and add the required attribute, like it was shown by @alexheretic, so I could see if my application would still work with these changes and how much code I'd have to update to remove any now obsolete options.

It actually weren't many changes to get it working as expected and after removing all the now obsolete options from my application (and patching pbjson) it is still working!

So I think adding this as a feature or a config to prost build should be relatively trivially possible.

I did run into a problem with recursive messages though, where constructing these now caused a stackoverflow. This could probably be avoided by just having the nested struct in an option if it's recursive (and there is already a check for that because recursive structs are automatically boxed, so that should be pretty easy).

I've pushed my changes here if someone wants to try this for themselves: https://github.com/lucasmerlin/prost/tree/required_message_fields

Disclaimer: I don't know much about protobuf / prost internals and what I've done here might fail in horrible ways, so I don't recommend using it in it's current state.

If the prost maintainers are open for such a change I'd be happy to update my code to wrap this in a feature / config and open a PR.