stephenh / ts-proto

An idiomatic protobuf generator for TypeScript
Apache License 2.0
2.12k stars 345 forks source link

Nullable + Well-Known Types for Partial/Unset-Style APIs #554

Closed NAlexandrov closed 2 years ago

NAlexandrov commented 2 years ago

Is it possible to change type generation and change behavior for Well-Known Types from

stringValue?: string | undefined;

to

stringValue?: string | undefined | null;

for proto

optional google.protobuf.StringValue string_value = 1;

In our case (for update requests) values:

stephenh commented 2 years ago

@NAlexandrov well, ts-proto's mission statement is, for better or worse, "anything is possible if you add another flag!" :-)

But, more seriously, yeah, that seems pretty reasonable: so null would be a StringValue on the wire but w/o the string_value set to any value? Today I think we just assume that should be undefined.

I mentioned the flag, but dunno, I almost wonder if we should just always do this, b/c it both seems worth differentiating just all the time, and also for simplicity to avoid yet-another-flag complexity in the codebase.

@boukeversteegh wdyt? Could we do this just all the time/without a flag?

boukeversteegh commented 2 years ago

Hm, I understand the use-case, and I think it's a fairly common pattern in API design. However, protobuf does not make this type of pattern easy.

The reason being that, if a is a StringValue field, then any non-empty string will always come out either as "" or null:

typescript → json or protobuf encoded → server decoded (e.g. c#)
{} {} {a: null}
{a: null} {} {a: null}
{a: undefined} {} {a: null}
Playing with the raw encoding of StringValue also doesn't help:
{a: {}} {a: ""}
{a: {value: null}} {a: ""}
{a: {value: undefined}} {a: ""}

So your best option with protobuf would be to set the following convention:

But this way, you again lose distinction between "" and unset. This is because:

I think it's not possible with Protobuf.

An alternative approach could be to use fieldMasks. Meaning, to pass along which fields you want to update clear, and check it server-side.


message User {
   google.protobuf.StringValue name;
   google.protobuf.FieldMask clearFields;
}

One more alternative is to simply pass all fields with their existing values, but I understand this may not be a viable option if the objects are large.

NAlexandrov commented 2 years ago

Thanks for the detailed answer. For some reason, I thought that StringValue is used to pass nulls, but it doesn't work that way. Now we use the next alternative to distinguish between an empty string and a null value:

// NullableString
message NullableString {
  // is_null is flag that defines a null value
  bool is_null = 1;

  // value
  string value = 2;
}

message SomeObject {
  NullableString some_nullable_property = 1;
}
boukeversteegh commented 2 years ago

Thanks for the detailed answer. For some reason, I thought that StringValue is used to pass nulls, but it doesn't work that way.

You're welcome. Actually, your understanding is correct, you can see in the table above that omitting the value will result in null, server side. But you need an additional state, which is ignore.

If you can give up distinguishing the empty string from unset, meaning that you will basically treat empty strings as missing values, then you could encode as follows.

Now we use the next alternative to distinguish between an empty string and a null value:

// NullableString
message NullableString {
  // is_null is flag that defines a null value
  bool is_null = 1;

  // value
  string value = 2;
}

message SomeObject {
  NullableString some_nullable_property = 1;
}

If the 3 states are very important, something like this could work, but you'd be giving up on smooth conversation to native types, as every string will now be an object. It will affect every access and assignment of the property in both the client code and the server code. It may very well be the right solution for your project, but in general i would be careful with this approach because of its big impact.

stephenh commented 2 years ago

@boukeversteegh am I missing something, or in the first line of your example, instead of a (to make it more concrete), if we're sending:

message Author {
  google.protobuf.StringValue firstName = 1;
}

And in TS we do:

const author: Author = { firstName: undefined }

Wouldn't that mean "there is no StringValue sub-message"?

So instead of the decoded/C# side seeing Author.firstName.value === null, wouldn't they see Author.firstName === null, and hence that would ignore?

I.e. the tri-state is, from the C# side:

Which, I dunno, naively seems like we could present on the TS side by changing Author { firstName: string | undefined to Author { firstName: string | null | undefined and then treat author.firstName = null as "send StringValue(value: null)" (unset). I believe we already treat author.firstName = undefined as "don't send StringValue at all" (ignore).

I've not actually done this in protobuf before, so not 100% sure.

boukeversteegh commented 2 years ago

I.e. the tri-state is, from the C# side:

* Author.firstName = null --> ignore
* Author.firstName.value = null (or "" default value) --> unset the field
* Author.firstName.value = "bob" --> set the field

Yes, conceptually it makes sense what you're saying, but there are 2 reasons that make this impossible:

  1. C# will convert the StringValue class instance into a basic C# string, as it is a 'well-known type' that is supposed to be mapped to native types. That means there is no "value" property that can be accessed.
  2. The value property within StringValue is a protobuf string, as can be seen from the protobuf definition:
    message StringValue {
      string value = 1;
    }

    It being a non-message type, means that it can never contain 'Null'. Protobuf requires that missing fields will be assigned their default value, which for string is "" (empty string).

So, the second state Author.firstName.value = null is not possible. You can make 3 states, if you want, as what you say:

By using the "" to mean "clear the field" (set to null in database for example), you can no longer explicitly set a value to an empty string. In many cases this is unimportant, but for someone already using StringValues to be able to encode NULL, it could be.

Coming back to typescript, these three states are already available:

There are only these 3 states that Protobuf can represent, so adding a fourth type, like 'null' can not be encoded any differently in protobuf or json.

If we would somehow try to preserve a 4th state, during the JSON/protobuf conversion process, that would mean adding new semantics to the protobuf json/wire formats. Other libraries wouldn't handle it the way expected.

For more information, see:

https://developers.google.com/protocol-buffers/docs/proto3?hl=en#json

If a value is missing in the JSON-encoded data or if its value is null, it will be interpreted as the appropriate [default value](https://developers.google.com/protocol-buffers/docs/proto3?hl=en#default) when parsed into a protocol buffer. If a field has the default value in the protocol buffer, it will be omitted in the JSON-encoded data by default to save space. An implementation may provide options to emit fields with default values in the JSON-encoded output.

https://developers.google.com/protocol-buffers/docs/proto3?hl=en#default

When a message is parsed, if the encoded message does not contain a particular singular element, the corresponding field in the parsed object is set to the default value for that field. These defaults are type-specific:

  • For strings, the default value is the empty string
  • ...

Note that for scalar message fields, once a message is parsed ❗ there's no way of telling whether a field was explicitly set to the default value (for example whether a boolean was set to false) or just not set at all: you should bear this in mind when defining your message types. For example, don't have a boolean that switches on some behavior when set to false if you don't want that behavior to also happen by default. ❗ Also note that if a scalar message field is set to its default, the value will not be serialized on the wire.

stephenh commented 2 years ago

So, the second state Author.firstName.value = null is not possible

Right, I included a "or default value of empty string" but should have not written that as .value = null in the first place, as yeah I meant like "conceptually empty" i.e. yeah empty string.

C# will convert the StringValue class instance into a basic C# string

Huh. Like a nullable string or just a string? If it's just a string / non-nullable string, that seems like a mistake b/c afaiu the whole point of the StringValue & friends types was to model "the Author message does not have firstName set, not even an empty string".

...but I suppose "included or not included" is still just a binary state, and not a tri-state of "included or not included or unset".

you can no longer explicitly set a value to an empty string

Personally I would be fine with that, i.e. being unable to set a string to "", but obviously that doesn't work at all for numbers types, i.e. surely would need to be able to both set a number to 0 (default value) as well as unset it.

Okay, yeah, I get it now... thanks @boukeversteegh !

@NAlexandrov are you fine if we close this issue? I don't think there is much we can do here. Thanks!

NAlexandrov commented 2 years ago

@stephenh I close this issue. Thank you!

stephenh commented 2 years ago

@NAlexandrov Cool, thanks!

Fwiw, if I was building a sufficiently large system / ecosystem of TypeScript applications (like "a lot", to justify the ROI) that needed the NullableString / NullableNumber / etc. approach, I would probably build bespoke support for that into ts-proto, such that TS clients could get the ergonomics of firstName: null | undefined | "asdf" for unset / ignore / set.

At least, for me, that's why I started ts-proto in the first place, was to have a "shove as many ergonomics (even if they are bespoke-to-us) into the tool/codegen as possible".

Granted, at this point, you might have to do for your own NullableString types within a fork of ts-proto...

Tangentially, I'm really surprised that with all of the GCP-driven adoption of protobuf, that there are not google.protobuf.NullableString shared types yet; if there were, then I think building them directly into ts-proto would make a lot of sense.

boukeversteegh commented 2 years ago

Huh. Like a nullable string or just a string? If it's just a string / non-nullable string, that seems like a mistake b/c afaiu the whole

Yeah, it's a nullable string, but all strings are nullable in C#, so as a consumer of protobuf in C#, you won't know whether the message may contain null or not, unless you look at the protobuf source files. Actually had a pretty bad bug because of this 😅.

boukeversteegh commented 2 years ago

Tangentially, I'm really surprised that with all of the GCP-driven adoption of protobuf, that there are not google.protobuf.NullableString shared types yet; if there were, then I think building them directly into ts-proto would make a lot of sense.

But there are though, that's StringValue. They're nullable. I think we're talking here about detection of field presence, for example with the proto3 optional keyword, see https://github.com/protocolbuffers/protobuf/blob/main/docs/field_presence.md . If it's possible to detect the absence of a field, as opposed to the field not being set, or set to the default, you could ignore it during updates. Maybe this is the way to go?

stephenh commented 2 years ago

But there are though, that's StringValue. They're nullable.

:-D I thought we were leaning towards "StringValue is a before-proto3-added-back-optional way of denoting presence" but that they don't support encoding null as a specific "unset". Like I can't send an amount: IntValue(is_null: true) to denote both "yes the amount field is present" and "I'm asking for it to be unset / set to null".

My assumption/understanding is that we'd just need a net-new type to handle the addition "set vs. unset" on top/instead of either StringValue or the added-back-optional "present or not present".