stepancheg / rust-protobuf

Rust implementation of Google protocol buffers
MIT License
2.81k stars 381 forks source link

Decode unknown enum values #233

Closed stepancheg closed 5 years ago

stepancheg commented 7 years ago

Currently rust-protobuf results in error when decoding unknown enum values.

Protobuf 3 requires that message should preserve unknown enum values, not simply drop them.

Proposal

struct<E : ProtobufEnum> ProtobufEnumOrUknown<E> { ... }

will be introduced in the protobuf library.

Generated fields will be of that type:

color: ProtobufEnumOrUnknown<Color>; // for protobuf3 signular enums
color: Option<ProtobufEnumOrUnknown<Color>>; // for protobuf2 singular enums
colors: Vec<ProtobufEnumOrUnknown<Color>>; // for repeated enums

Getters and setters will work with enums as previously (e. g. Color).

sashahilton00 commented 6 years ago

Is there an easy solution/workaround to avoid this behaviour? Even just ignore the unknown enums? We just experienced this error recently in https://github.com/librespot-org/librespot where the enum added was not important but everyone was required to update anyway.

stepancheg commented 6 years ago

As for temporary workaround I'd suggest patching your .proto files before code generation by simply removing unused enum fields (or by replacing them with i32).

I'm currently working on proper fix.

stepancheg commented 6 years ago

Current patch set: https://github.com/stepancheg/rust-protobuf/commits/unknown-enum-value

Comments, suggestions are welcome! Especially since this change will be somewhat backward incompatible.

plietar commented 6 years ago

In protobuf2 an unknown enum value should be ignored (has_foo() returns false) and added to the list of unknown_fields.

From https://developers.google.com/protocol-buffers/docs/proto#updating

In the current Java and C++ implementations, when unrecognized enum values are stripped out, they are stored along with other unknown fields. Note that this can result in strange behavior if this data is serialized and then reparsed by a client that recognizes these values. In the case of optional fields, even if a new value was written after the original message was deserialized, the old value will be still read by clients that recognize it. In the case of repeated fields, the old values will appear after any recognized and newly-added values, which means that order will not be preserved.

I personally find this horrible, and ProtobufEnumOrUnknown sounds like a much better solution, but matching the official implementation's behaviour should be considered too.

stepancheg commented 6 years ago

@plietar thank you very much for pointing to this valuable piece of spec.

I'm not sure what would be better.

Seems like syntax=proto3 ProtobufEnumOrUnknown is a clear winner.

The question what to do with syntax=proto2 on unknown values. There are options:

So we need to choose between options (3) and (4).

Anyway, ProtobufEnumOrUnknown should go to version 1.5 of rust-protobuf, and for 1.4 branch I think I should implement storing unknown values in unknown fields for both proto2 and proto3.

stepancheg commented 6 years ago

I've created a patch which implement storing unknown values in unknown fields: #276

This is source-compatible change, but behavior is changed. I want to merge it into 1.4 branch and create a new version.

This is somewhat dangerous change, so I'd appreciate code review.

stepancheg commented 6 years ago

Merged that diff and released as 1.5.0.

Keeping the issue open to implement ProtobufEnumOrUnknown.

hicqu commented 6 years ago

Hi, Any progress for implement ProtobufEnumOrUnknown? We need the feature to avoid check unknown_fields every time.

hicqu commented 6 years ago

Could ProtobufEnumOrUnknown be used for both proto 3 and proto 2? personally I think it's better than store it in unknown fields, even if it's not compatible with offical implement.

stepancheg commented 6 years ago

@hicqu could you describe your use case? Why do you check for unknown enum values in unknown fields? I mean, unset enum and unknown enum are usually the same for the most applications, and storing enums in unknown fields is usually needed only to a) not failing during parsing b) preserving during decoding/encoding.

I'm asking to better understand what people need to make a proper decision.

stepancheg commented 6 years ago

@hicqu I also think that storing enums in ProtobufEnumOrUnknown is OK, and compatibility with official implementation is not a big issue. My largest concern is ergonomics: users will need to explicit unwrap it instead of using it directly.

hicqu commented 6 years ago

My use case is about compatibility. Suppose we are rolling update a distribution system, in which nodes are sending messages with old enum enum E { A = 0, B = 1 }. After rolling update the enum changes to enum E { A = 0, B = 1, C = 2}, so if nodes after the rolling update send messages with E::C to other nodes, they will get E::A instead and store 2 into unknown fields.

It won't be a problem if E::A means Invalid. However many Enums we used have a valid default value. So our code must change from

if enum_from_message == EnumType::DefaultValue {
    // do special logic.
}

to

if enum_from_message == EnumType::DefaultValue &&
    message.get_unknown_fields().get(proto_enum_field_id).is_none() {
    // do special logic.
}

It's hard to check all places have used Enums. So ProtobufEnumOrUnknown is very helpful for us.

BusyJay commented 6 years ago

How about represent the enum as i32? And generated code looks like following:

#[repr(i32)]
pub enum EnumA {
    Value0 = 0,
    Value1 = 1,
    Value2 = 2,
}

struct B {
    a: i32,
}

impl B {
    pub fn set_a(&mut self, a: EnumA) {
        self.a = a as i32;
    }

   pub fn set_a_value(&mut self, a: i32) {
        self.a = a;
   }

    pub fn get_a(&self) -> EnumA {
        match self.a {
            0 => EnumA::Value0,
            1 => EnumA::Value1,
            2 => EnumA::Value2,
            _ => EnumA::Value0,
        }
    }

    pub fn get_a_value(&self) -> i32 {
        self.a
    }
}

This way we can keep similar behavior as other languages' implementations and also follow proto3's open enum semantics while keeping the interface clean and neat.

stepancheg commented 5 years ago

Protobuf in master now generates enum fields as ProtobufEnumOrUnknown<E>.

Example

It's not late to revert this change.