unicode-org / message-format-wg

Developing a standard for localizable message strings

[FEEDBACK] Possible simplification of the data model #786

Open tomasr8 opened 4 months ago

tomasr8 commented 4 months ago

I am currently working on implementing the mf2 spec in Python, and I'm trying to understand why FunctionExpression and UnsupportedExpression are two separate entities in the data model. Could we simply combine them, as is already done for LiteralExpression and VariableExpression, whose annotation may be either a FunctionAnnotation or an UnsupportedAnnotation?

This is what I mean:

interface FunctionExpression {
  type: "expression";
  arg?: never;
- annotation: FunctionAnnotation;
+ annotation: FunctionAnnotation | UnsupportedAnnotation;
  attributes: Attribute[];
}

- interface UnsupportedExpression {
-   type: "expression";
-   arg?: never;
-   annotation: UnsupportedAnnotation;
-   attributes: Attribute[];
- }

This should not introduce any ambiguity because you can always tell which expression you're working with based on the annotation type.
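The sketch below shows in TypeScript why the merged interface stays unambiguous. The annotation's `type` field acts as a discriminant, so a `switch` on it narrows `expr.annotation` to exactly one annotation interface. The annotation interfaces are trimmed stand-ins: the spec's `FunctionAnnotation` also carries options, and the discriminant values are assumed to match the spec's data model. `describeAnnotation` is a hypothetical helper for illustration.

```typescript
// Trimmed stand-ins for the data model types (options omitted).
interface FunctionAnnotation {
  type: "function";
  name: string;
}
interface UnsupportedAnnotation {
  type: "unsupported-annotation";
  source: string;
}
interface Attribute {
  name: string;
}

// The merged interface proposed above.
interface FunctionExpression {
  type: "expression";
  arg?: never;
  annotation: FunctionAnnotation | UnsupportedAnnotation;
  attributes: Attribute[];
}

// Hypothetical helper: switching on the discriminant narrows the annotation,
// so each branch can only access the fields of one annotation interface.
function describeAnnotation(expr: FunctionExpression): string {
  const annotation = expr.annotation;
  switch (annotation.type) {
    case "function":
      return `function :${annotation.name}`;
    case "unsupported-annotation":
      return `unsupported ${annotation.source}`;
  }
}
```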

alerque commented 4 months ago

No. Implementing this in type-safe languages would be a nightmare if the same type had to hold both valid, usable data and invalid or partially invalid data. That makes the outer type nearly useless: you have to write a lot of code just to figure out what you have and handle the different cases, instead of having the right type up front.

tomasr8 commented 4 months ago

Agreed, but as I understand it, that is already the case with VariableExpression and LiteralExpression: you need to inspect the annotation to know whether you're dealing with an unsupported expression. Only with FunctionExpression can you tell immediately from the type itself.

alerque commented 4 months ago

Good catch. In which case I'd argue those should be reworked to match this one, not the other way around. :wink:

tomasr8 commented 4 months ago

> Good catch. In which case I'd argue those should be reworked to match this one, not the other way around. 😉

Indeed, that would be better :) In that case, I'd propose something like this:

interface LiteralExpression {
  type: "expression";
  arg: Literal;
- annotation?: FunctionAnnotation | UnsupportedAnnotation;
+ annotation?: FunctionAnnotation;
  attributes: Attribute[];
}

interface VariableExpression {
  type: "expression";
  arg: VariableRef;
- annotation?: FunctionAnnotation | UnsupportedAnnotation;
+ annotation?: FunctionAnnotation;
  attributes: Attribute[];
}

interface FunctionExpression {
  type: "expression";
  arg?: never;
  annotation: FunctionAnnotation;
  attributes: Attribute[];
}

interface UnsupportedExpression {
  type: "expression";
- arg?: never;
+ arg?: Literal | VariableRef;
  annotation: UnsupportedAnnotation;
  attributes: Attribute[];
}
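Under this proposal, an `UnsupportedAnnotation` can only appear in `UnsupportedExpression`. Classifying an expression therefore reduces to checking the annotation first and the argument second. The sketch below uses simplified stand-in types (options and attribute values are omitted), and `kindOf` and `Kind` are hypothetical names.

```typescript
// Simplified stand-ins for the data model's leaf types.
interface Literal { type: "literal"; value: string }
interface VariableRef { type: "variable"; name: string }
interface FunctionAnnotation { type: "function"; name: string }
interface UnsupportedAnnotation { type: "unsupported-annotation"; source: string }
interface Attribute { name: string }

// The four expression shapes as proposed in the comment above.
interface LiteralExpression { type: "expression"; arg: Literal; annotation?: FunctionAnnotation; attributes: Attribute[] }
interface VariableExpression { type: "expression"; arg: VariableRef; annotation?: FunctionAnnotation; attributes: Attribute[] }
interface FunctionExpression { type: "expression"; arg?: never; annotation: FunctionAnnotation; attributes: Attribute[] }
interface UnsupportedExpression { type: "expression"; arg?: Literal | VariableRef; annotation: UnsupportedAnnotation; attributes: Attribute[] }

type Expression = LiteralExpression | VariableExpression | FunctionExpression | UnsupportedExpression;
type Kind = "literal" | "variable" | "function" | "unsupported";

function kindOf(expr: Expression): Kind {
  // Only UnsupportedExpression can carry an unsupported annotation.
  if (expr.annotation?.type === "unsupported-annotation") return "unsupported";
  // Any remaining annotation is a FunctionAnnotation, so the arg decides.
  if (expr.arg === undefined) return "function";
  return expr.arg.type === "literal" ? "literal" : "variable";
}
```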

eemeli commented 4 months ago

> I am currently working on implementing the mf2 spec in Python and I'm trying to understand why FunctionExpression and UnsupportedExpression are two separate entities in the data model. Could we simply combine them similar to LiteralExpression and VariableExpression?

Sure, we could, but that wouldn't really change anything. Note that these are TypeScript interface definitions, so a value matching the current definition would also match your proposed alternative.
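TypeScript compares interfaces structurally. The sketch below illustrates this by passing a value typed with the current `UnsupportedExpression` interface to a function that expects the merged interface, and it compiles without any conversion or cast. The types are simplified, and `MergedFunctionExpression` and `annotationKind` are hypothetical names.

```typescript
interface FunctionAnnotation { type: "function"; name: string }
interface UnsupportedAnnotation { type: "unsupported-annotation"; source: string }
interface Attribute { name: string }

// The current definition (abridged).
interface UnsupportedExpression {
  type: "expression";
  arg?: never;
  annotation: UnsupportedAnnotation;
  attributes: Attribute[];
}

// The merged definition from the original proposal.
interface MergedFunctionExpression {
  type: "expression";
  arg?: never;
  annotation: FunctionAnnotation | UnsupportedAnnotation;
  attributes: Attribute[];
}

function annotationKind(expr: MergedFunctionExpression): string {
  return expr.annotation.type;
}

// Every field of UnsupportedExpression is compatible with the merged shape,
// so this call type-checks as-is.
const current: UnsupportedExpression = {
  type: "expression",
  annotation: { type: "unsupported-annotation", source: "^private" },
  attributes: [],
};
const kind = annotationKind(current);
```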

There are two main reasons why Expression is split up the way it is:

  1. We need a VariableExpression definition, because it's used in InputDeclaration.
  2. We want the data model to express the requirement that at least one of arg or annotation is present.
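Both reasons can be sketched in TypeScript with trimmed types. The unsupported variants are left out for brevity, and `inputVariable` is a hypothetical helper. Because `InputDeclaration` requires a `VariableExpression`, its `arg` is statically known to be present. The `@ts-expect-error` line marks where the compiler rejects an expression that has neither `arg` nor `annotation`.

```typescript
interface Literal { type: "literal"; value: string }
interface VariableRef { type: "variable"; name: string }
interface FunctionAnnotation { type: "function"; name: string }
interface Attribute { name: string }

interface LiteralExpression { type: "expression"; arg: Literal; annotation?: FunctionAnnotation; attributes: Attribute[] }
interface VariableExpression { type: "expression"; arg: VariableRef; annotation?: FunctionAnnotation; attributes: Attribute[] }
interface FunctionExpression { type: "expression"; arg?: never; annotation: FunctionAnnotation; attributes: Attribute[] }
type Expression = LiteralExpression | VariableExpression | FunctionExpression;

// Reason 1: an input declaration's value must be a VariableExpression.
interface InputDeclaration { type: "input"; name: string; value: VariableExpression }

function inputVariable(decl: InputDeclaration): string {
  // No undefined check needed: VariableExpression.arg is required.
  return decl.value.arg.name;
}

// Reason 2: every union member requires arg or annotation, so this fails.
// @ts-expect-error
const empty: Expression = { type: "expression", attributes: [] };
```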

So when implementing your internal data model, you may want to see if you can drop those requirements, which allows for a single, simpler Expression:

interface Expression {
  type: "expression";
  arg?: Literal | VariableRef;
  annotation?: FunctionAnnotation | UnsupportedAnnotation;
  attributes: Attribute[];
}

To use that, you'll need to separately verify that either arg or annotation is present, and you'll need to collapse the declarations into a single definition:

interface Declaration {
  name: string;
  value: Expression;
}
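Both steps can be sketched with simplified stand-in types. `validateExpression` and `isInputDeclaration` are hypothetical helpers, not necessarily how moz.l10n handles this. The second helper rests on an assumption: once input and local declarations share one shape, `.input {$x}` can still be recognized because its argument refers to the declared name, and a `.local` declaration is not allowed to refer to its own variable.

```typescript
interface Literal { type: "literal"; value: string }
interface VariableRef { type: "variable"; name: string }
interface FunctionAnnotation { type: "function"; name: string }
interface UnsupportedAnnotation { type: "unsupported-annotation"; source: string }
interface Attribute { name: string }

// The single, simplified Expression and Declaration from the comment above.
interface Expression {
  type: "expression";
  arg?: Literal | VariableRef;
  annotation?: FunctionAnnotation | UnsupportedAnnotation;
  attributes: Attribute[];
}
interface Declaration {
  name: string;
  value: Expression;
}

// The type no longer rules out an empty expression, so check at runtime.
function validateExpression(expr: Expression): void {
  if (expr.arg === undefined && expr.annotation === undefined) {
    throw new Error("expression needs an argument or an annotation");
  }
}

// Assumption: an input declaration is one whose value's arg is a
// variable reference to the declared name.
function isInputDeclaration(decl: Declaration): boolean {
  const arg = decl.value.arg;
  return arg?.type === "variable" && arg.name === decl.name;
}
```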

This is all possible because the TS representation of the data model is not really intended to support interchange between systems; that's what the JSON Schema and DTD definitions are for.

For an example Python data model that applies the above simplifications, see message.py in the moz.l10n package that I'm currently working on.