google / protobuf.dart

Runtime library for Dart protobufs
https://pub.dev/packages/protobuf
BSD 3-Clause "New" or "Revised" License
533 stars 184 forks source link

Null message fields should not be instantiated #309

Open darioielardi opened 5 years ago

darioielardi commented 5 years ago

Given, for example, the following proto definitions:

message Address {
  string street = 1;
}

message User {
  Address address = 1;
}

the current behavior, when decoding an incoming User message, is to initialize the Address generated class even if the field is unset, filling it with empty ( zero ) values, which should not happen.

This is crucial for a variety of situations, for example using value wrappers, which are Protobuf Well-Known Types and represents the Google solutions to distinct a scalar field zero value from its default value. This is actually not possible with the current dart implementation, because the wrapper generated class is instantiated anyway and the inner value becomes a zero value.

Also, I think this is what was referred in #245.

CupNoodleFork commented 4 years ago

same issue

sigurdm commented 4 years ago

You can use the hasFoo getters to know if a field is set or not.

I agree that the pre-initialized default values are kind of strange. I do not know the historic origins of those. I'm just afraid it is too late to change this behavior without breaking too much code.

yanivshaked commented 4 years ago

Hi, any update on this issue? This may cause huge amount of memory usage - which is not used in any case

sigurdm commented 4 years ago

To clarify a bit. The fields are not initialized if they are not present.

Instead you get a default value (created lazily) when you get an unset field.

yanivshaked commented 4 years ago

Thank you for your quick response!

Using the information you provided, I have changed :

  dynamic _$getND(int index) {
    var value = _values[index];
    if (value != null) return value;
    return _getDefault(_nonExtensionInfoByIndex(index));
  }

to:

  dynamic _$getND(int index) {
    var value = _values[index];
    return value;
  }

This solves the above issue. Instead of generating a ghost object on get, you return null.

Do you see any downsides for the above suggested change?

yanivshaked commented 4 years ago

@sigurdm Appreciate your option on the above suggestion

osa1 commented 2 years ago

I think the right thing to do here is to use has... methods before reading the fields. In the original example, you should be using hasAddress before calling the address getter.

The proto3 spec says this about the default values of message fields:

For message fields, the field is not set. Its exact value is language-dependent. See the generated code guide for details.

So I think it's not necessarily a bug in the Dart implementation that a message field is initialized with some default when it's missing in an encoding as long as you can check if it was provided in the message. Here's a test using the proto definition in the bug report:

void main() {
  // Empty message
  {
    User user = User.fromBuffer(<int>[]);
    print(user.hasAddress()); // false
  }

  // address: {}
  {
    User user = User.fromBuffer(<int>[0x0a, 0x00]);
    print(user.hasAddress()); // true
  }
}

I will have to think more about @yanivshaked's suggestion, but I think it won't work. The problem is _$getND is used in files generated for proto2 message as well, but the suggested change only makes sense in proto3 message fields. So it will break code generated for proto2 messages.

In any case, we probably won't be able to change the current getters for message fields as it will break a lot of code. We could however provide an alternative that returns null for unset message fields in proto3 messages. (I will have to check what the ideal behavior here should be in proto2)

rakudrama commented 2 years ago

I think returning null plays very nicely with Dart's many facilities for working with nullable types (?., ??, promotion on testing etc), but this does require rethinking of the whole API. It is not something do be done without a comprehensive design that is subject to review. Fiddling with some methods to get an effect is not the way forward.

Regarding _$getND, it is an internal implementation detail. There is no reason that proto2 and proto3 need to use the same method, or that stored field values are accessed via an index, or that null internally indicates a missing value. These are all implementation details that can change while providing the same API.

osa1 commented 2 years ago

I just noticed this part in the proto3 spec default values section, right after the part I quoted in my previous comment:

Note that for scalar message fields, once a message is parsed there's no way of telling whether a field was explicitly set to the default value (for example whether a boolean was set to false) or just not set at all: you should bear this in mind when defining your message types

It seems like the spec assumes that existence of a field is indicated in the value of the field in the deserialized message and there are no hasX methods like the ones we provide in protobuf.dart. With hasX methods it's possible to tell whether a field was explicitly set or not, even when the field type is a scalar.

So when the spec says "For message fields, the field is not set. Its exact value is language-dependent" it doesn't mean (though they're not explicit about this) that we can use an arbitrary value for the message as the default, rather it means it should be indicated as null (e.g. in Java, Dart, ...), or Option/Maybe in languages with sum types and no null (e.g. Rust, OCaml, Haskell, ...), or nil in languages like Scheme, Lua, etc. the exact value is language dependent, but it's not a valid message value.

I still think what we have in protobuf.dart library should be good enough and this issue should not block anyone (just use has... method for your field), but if my reading of the spec, as described above, is correct, then returning null for missing fields with a message type should be more consistent with protobuf implementations in other languages and is more in line with what the spec says.

Maybe something to consider for the next major version bump.

osa1 commented 2 years ago

Just came across this design document about field presence in proto3 which is saying the same thing I wrote in my previous comment above:

When serializing, fields with no presence are not serialized if they contain their default value.

  • ...
  • For messages, the default is the language-specific null value.
osa1 commented 2 years ago

628 is another manifestation of this issue.

Update: b/217596113 reports the same issue.

osa1 commented 2 years ago

I checked how protobuf implementation in other langauges do this.

Summary:

Kotlin is added to both because it provides two different getter: one that returns empty message, one that returns null.

C++

Returns a read-only empty message.

https://developers.google.com/protocol-buffers/docs/reference/cpp-generated#embeddedmessage

const Bar& foo() const: Returns the current value of the field. If the field is not set, returns a Bar with none of its fields set (possibly Bar::default_instance()).

Go

Returns nil for unset message fields, for both required and optional.

https://developers.google.com/protocol-buffers/docs/reference/go-generated

// The generated code is the same result if required instead of optional ... Message fields can be set to nil, which means that the field is unset, effectively clearing the field. This is not equivalent to setting the value to an "empty" instance of the message struct.

Java

Returns an empty message.

https://developers.google.com/protocol-buffers/docs/reference/java-generated

If the field is not set, getFoo() will return a Foo instance with none of its fields set (possibly the instance returned by Foo.getDefaultInstance()).

Kotlin

Returns empty message, but also provides an accessor that returns null if the field is not present.

https://developers.google.com/protocol-buffers/docs/reference/kotlin-generated

If the field is not set, returns the default value ... Note that there is no special handling of submessages ... We also provide an extra field accessor FooOrNull for all optional message fields that returns the value of the field if it is present and null otherwise.

Python

As far as I can see it's not documented.

However there's something interesting in Python implementation: if a message field is not set, you can still set a field of the field (i.e. msg.nestedMsg.field = 1 when nestedMsg is not set), and it automatically sets the message field:

https://developers.google.com/protocol-buffers/docs/reference/python-generated#embedded_message

Message types work slightly differently. You cannot assign a value to an embedded message field. Instead, assigning a value to any field within the child message implies setting the message field in the parent

I considered a similar semantics for Dart implementation, but I think it's a bad idea. Consider:

Msg myMsg = Msg.create();
NestedMsg nestedMsg = myMsg.nestedMsg;
...
myMsg.nestedMsg.field = 123;

With the semantics described in Python API (quoted above), after the last line above, will nestedMsg.field return 123, or 0? It depends:

It's not difficult to see how much headache this kind of semantics will cause if unset message fields are not None.

Even when unset message fields are None it seems like a bit too much magic to me.

Ruby

Returns nil.

https://developers.google.com/protocol-buffers/docs/reference/ruby-generated#embedded_message

For submessages, unset fields will return nil, so you can always tell if the message was explicitly set or not

C

Unclear. It says you can assign null to clear a message field, but doesn't say what happens if you read such a field.

https://developers.google.com/protocol-buffers/docs/reference/csharp-generated#singular

Message fields can be set to null values, which is effectively clearing the field. This is not equivalent to setting the value to an "empty" instance of the message type

Objective-C

Returns empty message

https://developers.google.com/protocol-buffers/docs/reference/objective-c-generated#singular3

The default "empty" value for a message is an instance of the default message. To clear a message value it should be set to nil. Accessing a cleared message will return an instance of the default message and the hasFoo method will return false.

PHP

Returns nil

https://developers.google.com/protocol-buffers/docs/reference/php-generated#embedded_message

A field with a message type defaults to nil, and is not automatically created when the field is accessed

JohnGalt1717 commented 2 years ago

C# they're refusing to implement nullable types because they say that it would cause languages that don't have nullable types to have different behavior.

optional should be nullable. They should be one to one correspondance. This is breaking languages with nullable types and hobbling their compile time safety of protobuf derived classes and null is absolutely a valid value for all fields because it literally means "no entry" which is entirely valid and optional defines if that is a possible case.

Everyone working on protobuf needs to get together and agree: optional = nullable and get all implementations in line and consistent ASAP.

talksik commented 1 year ago

agreed, this is weird. switched from json to protobufs and now dealing with this problem.

you could also add a method called GetUser() or GetX() just like protobuf does for golang. This should return null or object/primitive