unicode-org / message-format-wg

Developing a standard for localizable message strings
Other
229 stars 33 forks source link

We need a list of possible error codes #706

Closed mihnita closed 2 months ago

mihnita commented 7 months ago

I thought we have one, but apparently not. I grepped for errors in the test/ folder (grep -i -r error ./test/) and what these are the values we have there: "bad-input", "bad-option", "unresolved-var", "not-formattable"

./test/README.md mentions "Syntax Error" and "Data Model Error".

So I looked for those values everywhere else (from the root of the repo):

grep -r -i 'bad\S*input' .
grep -r -i 'bad\S*option' .
grep -r -i 'unresolved\S*var' .
grep -r -i 'not\S*formattable' .

The only instances are the ones in the test files.

Should we create error codes (to return in the implementation) from the headings in ./spec/errors.md ?

For example ### Duplicate Declaration => duplicate-declaration, ## Formatting Errors => formatting-error?

aphillips commented 7 months ago

Isn't this more of an implementation detail?

The types of errors are somewhat general and we don't say exactly what they entail. Partly this is because different programming languages have widely varying needs. You may be right that having a nice succinct token is easier to deal with than our prose-y names. But equally an implementation might assign numbers or assign names IN_VARIOUS-distinct_and_locally_Idiosyncratic_Forms 😄


I think that your implementation experience will be valuable here. Are the categories we have reasonable when implementing? When there are multiple error types that could be thrown for a given error, is what the spec says what you find most usable? Is there an error you encounter that we don't have (as I found a couple weeks ago)? I have questions about error handling that implementation experience should inform.

For example, resolve selectors says to emit a Selection Error if the resolved value of a selector is not "selectable". One reason that a selector would not be selectable is that it is missing its annotation, but Missing Selector Annotation is (correctly, I think?) a data model error, not a selection error.

macchiati commented 7 months ago

An example of that would be

.match{$count :formattingOnlyFunction}

I'm not sure whether it also subsumes

.match{$count :number} Fred {{... some message }}

On Wed, Mar 6, 2024 at 8:58 AM Addison Phillips @.***> wrote:

Isn't this more of an implementation detail?

The types of errors are somewhat general and we don't say exactly what they entail. Partly this is because different programming languages have widely varying needs. You may be right that having a nice succinct token is easier to deal with than our prose-y names. But equally an implementation might assign numbers or assign names IN_VARIOUS-distinct_and_locally_Idiosyncratic_Forms 😄

I think that your implementation experience will be valuable here. Are the categories we have reasonable when implementing? When there are multiple error types that could be thrown for a given error, is what the spec says what you find most usable? Is there an error you encounter that we don't have (as I found a couple weeks ago)? I have questions https://github.com/unicode-org/message-format-wg/pull/704#issuecomment-1981118481 about error handling that implementation experience should inform.

For example, resolve selectors https://github.com/unicode-org/message-format-wg/blob/main/spec/formatting.md#resolve-selectors says to emit a Selection Error if the resolved value of a selector is not "selectable". One reason that a selector would not be selectable is that it is missing its annotation, but Missing Selector Annotation is (correctly, I think?) a data model error, not a selection error.

— Reply to this email directly, view it on GitHub https://github.com/unicode-org/message-format-wg/issues/706#issuecomment-1981353643, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACJLEMENR7MGEXH5GGETFFLYW5DKTAVCNFSM6AAAAABEJLQWEWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOBRGM2TGNRUGM . You are receiving this because you are subscribed to this thread.Message ID: @.***>

aphillips commented 7 months ago

@macchiati

The first one would be a Selection Error. A concrete example in LDML45 would be .match {$count :datetime} because :datetime is not a valid selector.

The second one is a Selection Error also, because :number says it is in registry.md. (Note that a different function might just ignore an unsupported value or rank it lower). It's not clear whether we should have sub-species of Selection Error yet. None of these is a Data Model Error because those are evaluated without the need for the registry.

Here are the DM errors:

.match {$count}
* {{... is a Missing Selector Annotation error}}

.match{$count :number}
* * {{... is a Variant Key Mismatch error (should only have one key)}}

.match{$count :number}
1 {{... is fine as a variant, but this message produces a Missing Fallback Variant because it has no *}}

.match {$count :number minimumFractionDigits=2 minimumFractionDigits=3}
* {{... produces a Duplicate Option Name error, but may execute if the impl wants to continue}}

.input {$var :number}
.local $var = {$foo :number}
{{... produces a Duplicate Declaration error.}}

I think (but the spec is silent about it) that an implementation could produce different error types that group as "Selection Errors". The key thing about a Selection Error is that it means that a pattern cannot be selected. This means that we emit the fallback pattern (typically the logo, aka {�}).

macchiati commented 7 months ago

I think the lines between some of these categories will vary depending on the implementation. Some implementations may want to precompile messages, and thereby handle all the possible errors that can be detected without access to a particular set of input parameters, while other implementations may do everything at runtime. The current classification of errors as being Data Model Errors is really based on having a certain implementation model in mind (and not using BNF syntax like [ wfc: ... ] from EBNF for well-formedness constraints that can't be expressed easily with ABNF.)

On Wed, Mar 6, 2024 at 12:01 PM Addison Phillips @.***> wrote:

@macchiati https://github.com/macchiati

The first one would be a Selection Error. A concrete example in LDML45 would be .match {$count :datetime} because :datetime is not a valid selector.

The second one is a Selection Error also, because :number says it is in registry.md. (Note that a different function might just ignore an unsupported value or rank it lower). It's not clear whether we should have sub-species of Selection Error yet. None of these is a Data Model Error because those are evaluated without the need for the registry.

Here are the DM errors:

.match {$count}

  • {{... is a Missing Selector Annotation error}}

.match{$count :number}

    • {{... is a Variant Key Mismatch error (should only have one key)}}

.match{$count :number} 1 {{... is fine as a variant, but this message produces a Missing Fallback Variant because it has no *}}

.match {$count :number minimumFractionDigits=2 minimumFractionDigits=3}

  • {{... produces a Duplicate Option Name error, but may execute if the impl wants to continue}}

.input {$var :number} .local $var = {$foo :number} {{... produces a Duplicate Declaration error.}}

I think (but the spec is silent about it) that an implementation could produce different error types that group as "Selection Errors". The key thing about a Selection Error is that it means that a pattern cannot be selected. This means that we emit the fallback pattern (typically the logo, aka {�}).

— Reply to this email directly, view it on GitHub https://github.com/unicode-org/message-format-wg/issues/706#issuecomment-1981686600, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACJLEMC44FTNAS6A55BNBZ3YW5Y3NAVCNFSM6AAAAABEJLQWEWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOBRGY4DMNRQGA . You are receiving this because you were mentioned.Message ID: @.***>

eemeli commented 7 months ago

Isn't this more of an implementation detail?

This has been discussed in a few of our recent meetings, and I agree with @mihnita that common error codes would be useful to implementers and users. The ones currently in the test suite come from the JS implementation, where e.g. the :number implementation emits bad-input here, and bad-option here.

Each of those is a Resolution Error, but they're not currently explicitly included in errors.md. OTOH unresolved-var corresponds to Unresolved Variable, and not-formattable is a generic Formatting Error.

I think (but the spec is silent about it) that an implementation could produce different error types that group as "Selection Errors". The key thing about a Selection Error is that it means that a pattern cannot be selected. This means that we emit the fallback pattern (typically the logo, aka {�}).

Yes, an implementation or a custom function could emit other Selection Errors, but no, that would not result in formatting to a {�}. Instead, the algorithm selects the catch-all pattern.

This persisting mis-reading of our pattern selection logic suggests to me that we might want to reconsider first-choice selection during the tech preview. Its algorithmic spec language would be significantly simpler and easier to understand.

catamorphism commented 7 months ago

Isn't this more of an implementation detail?

I don't think so -- the tests are part of the spec and the expected error names are part of the tests.

macchiati commented 7 months ago

This persisting mis-reading of our pattern selection logic suggests to me that we might want to reconsider firstchoice selection during the tech preview. Its algorithmic spec language would be significantly simpler and easier to understand.

I think the statement of the algorithm can be simplified considerably (and make it clear which operations functions need to support in order to do the matching). I posted in https://github.com/unicode-org/message-format-wg/issues/715

On Wed, Mar 6, 2024 at 9:27 PM Eemeli Aro @.***> wrote:

Isn't this more of an implementation detail?

This has been discussed in a few of our recent meetings, and I agree with @mihnita https://github.com/mihnita that common error codes would be useful to implementers and users. The ones currently in the test suite come from the JS implementation, where e.g. the :number implementation emits bad-input here https://github.com/messageformat/messageformat/blob/d188fc449508cb2267f45a90361e15bb296a3ce2/packages/mf2-messageformat/src/functions/number.ts#L84, and bad-option here https://github.com/messageformat/messageformat/blob/d188fc449508cb2267f45a90361e15bb296a3ce2/packages/mf2-messageformat/src/functions/number.ts#L113 .

Each of those is a Resolution Error, but they're not currently explicitly included in errors.md. OTOH unresolved-var corresponds to Unresolved Variable https://github.com/unicode-org/message-format-wg/blob/main/spec/errors.md#unresolved-variable, and not-formattable is a generic Formatting Error https://github.com/unicode-org/message-format-wg/blob/main/spec/errors.md#formatting-errors .

I think (but the spec is silent about it) that an implementation could produce different error types that group as "Selection Errors". The key thing about a Selection Error is that it means that a pattern cannot be selected. This means that we emit the fallback pattern (typically the logo, aka {�}).

Yes, an implementation or a custom function could emit other Selection Errors, but no, that would not result in formatting to a {�}. Instead, the algorithm selects the catch-all pattern.

This persisting mis-reading of our pattern selection logic suggests to me that we might want to reconsider first-choice selection during the tech preview. Its algorithmic spec language would be significantly simpler and easier to understand.

— Reply to this email directly, view it on GitHub https://github.com/unicode-org/message-format-wg/issues/706#issuecomment-1982382905, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACJLEMD4GUQM65OYH6OJBH3YW73DPAVCNFSM6AAAAABEJLQWEWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOBSGM4DEOJQGU . You are receiving this because you were mentioned.Message ID: @.***>

mihnita commented 7 months ago

we might want to reconsider firstchoice selection during the tech preview.

I am completely against making any such change only days before we have to submit the changes to ICU.

And this is something we discussed, more than once, we voted on it. Changing it to first match is also incompatible with MF1.


That has also nothing to do with the error codes, discussed here.

Some of the errors are obvious, have nothing to do with the selection algorithm, or the specific selector functions:

Three selectors, one key:

.match {$foo} {$baz} {$bar}
   red {{ ... }
   * {{ ... }}

Double key cases:

.match {$foo}
   red {{ ... }
   red {{ ... }
   * {{ ... }}

The exact errors might be implementation details. And some implementations might report the same errors, but at different stages (parse vs formatting)

But if we say they are implementation details, then we should not have them in the spec (/spec/errors.md)

And should not have them in the json tests suites.

Make the whole document FYI only. Without ever saying "MUST emit" in that doc :-)

macchiati commented 7 months ago

Just to be clear, my comments were for post v45 discussion

On Sat, Mar 9, 2024, 09:42 Mihai Nita @.***> wrote:

we might want to reconsider firstchoice selection during the tech preview.

I am completely against making any such change only days before we have to submit the changes to ICU.

And this is something we discussed, more than once, we voted on it. Changing it to first match is also incompatible with MF1.

That has also nothing to do with the error codes, discussed here.

Some of the errors are obvious, have nothing to do with the selection algorithm, or the specific selector functions:

Three selectors, one key:

.match {$foo} {$baz} {$bar} red {{ ... }

  • {{ ... }}

Double key cases:

.match {$foo} red {{ ... } red {{ ... }

  • {{ ... }}

The exact errors might be implementation details. And some implementations might report the same errors, but at different stages (parse vs formatting)

But if we say they are implementation details, then we should not have them in the spec (/spec/errors.md https://github.com/unicode-org/message-format-wg/blob/main/spec/errors.md )

And should not have them in the json tests suites.

Make the whole document FYI only. Without ever saying "MUST emit" in that doc :-)

— Reply to this email directly, view it on GitHub https://github.com/unicode-org/message-format-wg/issues/706#issuecomment-1986927662, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACJLEMCHJ6WIIRI2VL7HDP3YXNCX5AVCNFSM6AAAAABEJLQWEWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOBWHEZDONRWGI . You are receiving this because you were mentioned.Message ID: @.***>

catamorphism commented 2 months ago

The test schema contains a list of error codes, so this issue can be closed, right?

eemeli commented 2 months ago

During the 2024-07-22 call we identified a need to document in test/README.md the relationship between the error codes we use in tests, and the errors enumerated in spec/errors.md. The PR for that ought to close this issue. I don't think anyone specifically volunteered to add that markdown table.

catamorphism commented 2 months ago

I don't think anyone specifically volunteered to add that markdown table.

Submitted https://github.com/unicode-org/message-format-wg/pull/850 to add the table :)