google / emboss

Emboss is a tool for generating code that reads and writes binary data structures.
Apache License 2.0
70 stars 21 forks source link

Support default values #21

Open fsareshwala opened 2 years ago

fsareshwala commented 2 years ago

Reading an unset field through the Emboss view API results in an assertion error on non-debug builds. This is because we are reading uninitialized memory. It would be great if Emboss could support default values for certain fields.

Use Case Consider packets which have opcodes in their headers. These opcode values are unique to the packet/message being constructed and never change. They need to be defined in the packet but having the developer fill them out every time is redundant and error prone. Instead, it would be better to define the value in the .emb file and have Emboss take care of it for all uses.

Potential Implementation Specify a [default: 0] attribute in the .emb files. Fields are initialized to their defaults via an initialize() method on views.

BenjaminLawson commented 1 year ago

Should Ok() also fail if the value of the field is not the default/fixed value? In the case of commands with op codes, the view should be invalid if it is not the specified value.

reventlov commented 1 year ago

Ok() falls under the (existing) feature, [requires].

However, ideally, [default_value] (or whatever it gets called) would be inferred from annotations like [requires: this == 10], without a human having to explicitly add it.

fsareshwala commented 1 year ago

Just to make sure, I think we can infer the default value from the requires annotation if it's there and only a scalar value. AFAIK, the requires annotation allows for ranges as well and we can't infer much there. We also need to be able to set a default value which can be overridden to something else if necessary. Just want to make sure the implementation captures all these use cases.

reventlov commented 1 year ago

That is correct. The exact rules I am thinking are:

  1. If the field is a structure (struct or bits type), its default value is the compound of the default values of its fields.
  2. Otherwise, if there is a [default_value: n] annotation on the field, the default value is n.
  3. Otherwise, if there is a [requires] annotation of the form [requires: this == n] or [requires: n == this], where n is a compile-time constant (including an expression that evaluates to a compile-time constant, like 2 * Type.some_constant), then the default value is n.
  4. Otherwise, if the field is a boolean and there is a [requires] annotation of the form [requires: this], the default value is true.
  5. Otherwise, use the default default value for the field's type, which should be 0 for numeric types and false for booleans.

I am open to other inference forms, but I cannot think of any that would be useful and seem likely in real code.

BenjaminLawson commented 1 year ago

Android's PDL uses the following syntax:

  _fixed_ = 0x12 : 8, // fixed field 0x12 that takes 8 bits

Which translates to this in Emboss:

0 [+1] UInt opcode
  [fixed: 0x12]

This could implicitly add a requires attribute.

In C++, we could either initialize all of the fixed values in a view or individual fields:

auto view = MakeFooView();
view.Initialize();
view.opcode().Write(); // No parameter, as the fixed value is already known
view.opcode().WriteFixed(); // alternative
view.opcode().Initialize(); // alternative
view.opcode().UncheckedWrite(0x13); // bypass fixed value & requires (e.g. for testing)

We may want view.opcode().Write() in addition to view.Initialize(), and we could implement Write() first, which would meet our project's requirements.

fsareshwala commented 1 year ago

Agree with all that has been said above already. However, could we use default instead of fixed? This aligns more closely with protobuf (a more commonly used IDL) and also implies that the value can be changed. Using fixed sort of implies that that value is all it can be which is already covered by the requires moniker.

Personally, I like having the API look like this:

auto view = MakeFooView();

// initializes the view to use all fixed values (this allows us to continue to make a view over
// input data via the constructor without overwriting it with default values. Initialize explicitly
// implies a write.
view.Initialize();

view.opcode().Write(0x13); // Overwrite default value, requires checking still performed
view.opcode().UncheckedWrite(0x13); // Overwrite default value, requires checking not performed (e.g. for testing)
reventlov commented 1 year ago

I would prefer something like [initialize_to: ...]. To me, [default: ...] implies that the default value will be read if there is 'no' value at that location, where [initialize_to: ...] is a bit more explicit that it will be used by Initialize(). (Though still maybe confusing in that someone might expect automatic intialization when a view is created on a 'fresh' buffer.)

fsareshwala commented 1 year ago

I might be confused here so please correct me if I'm wrong.

In our experience, reading a value only occurs on data we obtain elsewhere and that we create a view on top of. In that context, how do we know if there is 'no' value at the location or if that value is what it is supposed to be? There is no Emboss specific wire format to indicate whether the field is uninitialized or not. As such, it doesn't make sense to me what it means to have a default value read out of the view. I think default only applies to writes.

reventlov commented 1 year ago

Emboss has two notions of an "unavailable" field: the underlying buffer being too short to hold the field, and an if clause that evaluates to false. I can pretty easily imagine a user, especially a newer user of Emboss, assuming that [default] would apply in either of those cases.

I can also pretty easily imagine a user assuming that Emboss does know when its backing storage is uninitialized, especially if they aren't an experienced C or C++ programmer.

Also:

jasongraffius commented 1 month ago

I'm personally in favor of [initializer: ...] for setting a default initialization value, especially for the first two reasons @reventlov points out above.

Additionally, if we also want a separate tag that implies both [initializer: ...] and [requires: this == ...] then: