unicode-org / message-format-wg

Developing a standard for localizable message strings

Discussion issue for #830: balloting of error handling #831

Closed aphillips closed 1 month ago

aphillips commented 1 month ago

Discussion thread for error handling

This issue provides a discussion space for questions or comments on the balloting of 'error handling' currently (2024-07-16 through 2024-07-21) taking place in issue #830.

Useful references:

Some terminology:

Formatting attempt means a call to a message format implementation for a given message with a set of arguments intended for formatting.

Signal an error is a deliberately vague, generic, neutral way of referring to how an implementation registers that an error has occurred during a formatting attempt with the caller. Common signaling mechanisms include throwing exceptions, returning a value that indicates an error, setting an error flag on the formatter object, and many more.

Provide a fallback representation means that there is some way for the caller to obtain a version of the message that is partially formatted according to the rules already provided in Formatting and notably, but not exclusively, here and here

MUST and SHOULD have their normal RFC2119 / BCP14 meaning.
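
As a rough illustration of the terminology above, here is a minimal sketch of three of the signaling mechanisms named in the definition of "signal an error": throwing an exception, returning a value that indicates an error, and setting an error flag on the formatter object. It assumes a toy formatter that only understands `{$name}` placeholders. Every name in it (`render`, `format_or_raise`, `format_with_status`, `Formatter`, `MessageFormatError`, the `unresolved-variable` label) is hypothetical and not taken from the spec or from any implementation.

```python
import re

# Hypothetical names throughout; none come from the spec or a real library.
_VARIABLE = re.compile(r"\{\$(\w+)\}")


class MessageFormatError(Exception):
    """Illustrative error type raised by format_or_raise."""


def render(pattern, args):
    """Toy formatting attempt: returns (text, errors).

    A variable with no matching argument stays in the output as {$name},
    so the returned text doubles as a fallback representation.
    """
    errors = []

    def substitute(match):
        name = match.group(1)
        if name in args:
            return str(args[name])
        errors.append(f"unresolved-variable: ${name}")
        return match.group(0)

    return _VARIABLE.sub(substitute, pattern), errors


def format_or_raise(pattern, args):
    """Mechanism 1: signal by throwing an exception."""
    text, errors = render(pattern, args)
    if errors:
        raise MessageFormatError("; ".join(errors))
    return text


def format_with_status(pattern, args):
    """Mechanism 2: signal through the return value."""
    return render(pattern, args)


class Formatter:
    """Mechanism 3: signal by setting a flag on the formatter object."""

    def __init__(self):
        self.last_errors = []

    def format(self, pattern, args):
        text, self.last_errors = render(pattern, args)
        return text
```

All three wrappers report the same underlying error condition; only the way the caller learns about it differs, which is the point of keeping the term deliberately generic.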

aphillips commented 1 month ago

@macchiati commented:


I can't really answer unless the question is a bit more clear.

  1. "for signaling errors" - If this were "Must provide a mechanism for detecting errors" I would pick a higher number. That is, it could be satisfied by throwing an exception, or by having an additional return parameter, or by providing a separate function to query whether there was an error.

  2. I think the question might depend on the type of errors. (This division doesn't align with the typology in the spec, because it is behavior-based.)

    1. no matter what the input parameters are, eg syntax errors like {$abc $def}
    2. call-site mismatch errors, eg format(myDateMessage, date="Einstein"), or a missing input parameter
    3. others

Definitely for (1) I don't think there has to be a fallback message result

aphillips commented 1 month ago

@macchiati

"for signaling errors" - If this were "Must provide a mechanism for detecting errors" I would pick a higher number. That is, it could be satisfied by throwing an exception, or by having an additional return parameter, or by providing a separate function to query whether there was an error.

This is precisely the meaning of "signal an error". See above. That is, we cannot (because of diversity in languages and frameworks) say exactly how errors are signaled to users.

Definitely for (1) I don't think there has to be a fallback message result

There can be a fallback message result for syntax and data model errors, but it will not be a very useful message, since the user's intention generally cannot be intuited from a broken message. The fallback string (unless overridden by an implementation-specific fallback, which is permitted by the spec) for syntax and data model errors is what we euphemistically call "the logo", i.e. the string "{�}".

One of the key questions in this balloting is whether we require that implementations provide access to a fallback representation in all cases or whether it is optional.

macchiati commented 1 month ago

Thanks for the background.

macchiati commented 1 month ago

Another clarification.

"must signal errors" encompasses all and only those errors listed in the spec, right?

macchiati commented 1 month ago

What I really think is that the policy should be:

  1. Must provide a mechanism for detecting any errors specifically listed in the spec.
  2. May also support detecting errors not listed in the spec.
  3. In cases of an error:
    1. Should provide fallback string result, where it can convey useful fallback for the end-user
    2. May provide fallback string result, where it cannot.

Examples:

  • "{�}" conveys no useful information to users.
  • "3 cm" conveys useful information to users, even though it should have been expressed as "3 Zentimeter".

sffc commented 1 month ago

I don't understand the difference between "(1) MUST signal errors and MUST provide fallback" and "(2) MUST signal errors or MUST provide fallback".

I think implementations must do both, but not necessarily at the same time. For example, an implementation with two functions formatWithFallback and formatOrThrowException should be conformant, but an implementation with one but not the other perhaps should not be conformant.

I do not think an implementation should be required to have a function that does both at the same time in order to be conformant. For example, a function that throws an exception that has a fallbackMessage field would be doing both at the same time. This is fine, but not required for conformance.

Does that mean I should vote for option 1 or option 2?

eemeli commented 1 month ago

@sffc Option 2 would mean that an implementation would need to provide at least one of the methods formatWithFallback and formatOrThrowException, but would not be required to provide both.

Option 1 allows for a compliant implementation to provide formatWithFallback and formatOrThrowException separately, or as a single method as in the current Intl.MessageFormat JS proposal.
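
To make the distinction concrete, here is a minimal sketch of an implementation that offers the two capabilities as separate methods. It assumes a toy message syntax with only `{$name}` placeholders; the class name `ToyMessageFormat` and the error type `FormatError` are hypothetical, and the method names are Python spellings of the formatWithFallback and formatOrThrowException names used in this thread. It is not modeled on the Intl.MessageFormat proposal or on any real API.

```python
import re


class FormatError(Exception):
    """Hypothetical error type for this sketch."""


class ToyMessageFormat:
    """Toy formatter exposing fallback and error signaling separately."""

    _VARIABLE = re.compile(r"\{\$(\w+)\}")

    def __init__(self, pattern):
        self.pattern = pattern

    def _resolve(self, args):
        # Returns the (possibly partially) formatted text and the list
        # of variable names that had no matching argument.
        missing = []

        def substitute(match):
            name = match.group(1)
            if name in args:
                return str(args[name])
            missing.append(name)
            return match.group(0)  # keep {$name} as the fallback

        return self._VARIABLE.sub(substitute, self.pattern), missing

    def format_with_fallback(self, args):
        """Never raises; unresolved parts appear as {$name}."""
        text, _missing = self._resolve(args)
        return text

    def format_or_throw_exception(self, args):
        """Raises on any error; returns text only on full success."""
        text, missing = self._resolve(args)
        if missing:
            raise FormatError("unresolved variables: " + ", ".join(missing))
        return text
```

Under the reading given above, an implementation shipping only one of these two methods would satisfy Option 2, while Option 1 would need both capabilities to be reachable, whether as two methods like these or through a single call.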

sffc commented 1 month ago

Examples: "{�}" conveys no useful information to users

To jam off this a bit: in ICU4X 2.0 DateTimeFormatter, we have the following error handling:

Pattern: 'It is:' E MMM d y G 'at' h:mm:ssSSS a zzzz

| Error Type | Output String |
|---|---|
| (success) | It is: Monday November 20 2023 CE at 11:35:03.000 a.m. Greenwich Mean Time |
| Missing Data | It is: mon M11 20 2023 ce at 11:35:03.000 AM +0000 |
| Missing Input Fields | It is: {E} {M} {d} {y} {G} at {h}:{m}:{s}{S} {a} {GMT+?} |

"Missing Input Fields" attempts to say what type of thing was expected in a particular placeholder position, which is more useful than completely omitting it.
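
The placeholder idea can be sketched in a few lines. This is only an illustration of the general technique, assuming a pattern that has already been split into literal and field tokens; it is not ICU4X code and does not reflect the ICU4X API. The function name `format_fields` and the token layout are hypothetical.

```python
def format_fields(tokens, values):
    """Render tokens, substituting {symbol} for any missing input field.

    tokens: list of ("literal", text) or ("field", symbol) pairs.
    values: mapping from field symbol to the value to print.
    Returns (text, missing_symbols).
    """
    parts = []
    missing = []
    for kind, payload in tokens:
        if kind == "literal":
            parts.append(payload)
        elif payload in values:
            parts.append(str(values[payload]))
        else:
            # Say what kind of thing was expected in this position
            # instead of omitting it.
            missing.append(payload)
            parts.append("{" + payload + "}")
    return "".join(parts), missing


TOKENS = [
    ("literal", "It is: "),
    ("field", "E"),
    ("literal", " "),
    ("field", "d"),
]
```

With `TOKENS`, supplying only the weekday leaves the day of month visible as `{d}` in the output, and the second return value tells the caller which fields were missing.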

eemeli commented 1 month ago

According to our current formatting spec, the only way to get to "{�}" is if the source message contains a syntax or data model error. In all other cases, we end up with at least some pattern to format. If, as proposed in #603, we drop the * pattern requirement, that would open up another path to "{�}".

macchiati commented 1 month ago

I think we need to be careful in talking about who it is useful for.

Looking at Shane's,

{E} {M} {d} {y} {G} at {h}:{m}:{s}{S} {a} {GMT+?}

  1. It might be useful for the developer (in debugging). Although for that I would argue that an even more useful message would be something like "Missing $datetime parameter". But for debugging, a good error message is even more valuable, with internal details.
  2. It is not at all useful for the end user; you really wouldn't want that message to show up in production software, or show error messages with internal details.

aphillips commented 1 month ago

@eemeli suggested:

@sffc Option 2 would mean that an implementation would need to provide at least one of the methods formatWithFallback and formatOrThrowException, but would not be required to provide both.

This is correct as far as it goes. It would also be compliant to have a format method that threw for some errors and returned a fallback string for others (this is the example @mihnita gave in the call, the word "or" in the name is perhaps a logical OR). Whether that's a good idea or not is a separate question. It would also be valid with option 2 to do both at the same time:

int result = format(message, argMap, target);
if (result == NO_ERROR) {
   // happy path
   print(target);
} else {
   // target contains the fallback string
}

sffc commented 1 month ago

I think we need to be careful in talking about who it is useful for.

That's a good point that clarifies things. There are cases where an error is more helpful (usually a programmer error), and there might be cases where a fallback string is more helpful, but I don't feel super confident in specifically enumerating those cases.

aphillips commented 1 month ago

I think this thread might be focused on the wrong thing.

The question here is really "what does MF2 normatively require for an implementation to call itself 'conformant'?" or maybe "what can we normatively require?"

Our spec carefully enumerates the error conditions and provides tests for them (in a way that implementers can "hook up" to their own implementation, indeed are required to "hook up" more-or-less by hand). But we don't, necessarily, require that you create a specific special error state/value/class for each one. It might be perfectly valid for a Java implementation to just throw RuntimeException with the message "Stuff happened" for every error type and still be conformant with "MUST" for signaling errors. Would that suck? Yes. But that's on the implementer.

Trying to require specific error behavior (including fallbacking) is tricky because we want to allow the signal to be shaped however the implementer feels is most natural for their users, including in environments where MF2 is wrapped by resource or string management APIs and including existing APIs, which are already called by existing code that cannot be changed.

eemeli commented 1 month ago

My take on the overall scope of what we're considering here is that the spec should define what happens when messages are formatted, and that this should include error cases. In fact, this is one of our deliverables:

A specification for resolving messages at runtime, including runtime errors.

If we leave error handling completely out of the spec, then I think we'd need to revisit this deliverable. And I at least would much rather not need to do so.

I'm also glad that we're taking error handling and fallback behaviour seriously, as my experience indicates that message formatting/localization fails in general more often than many other parts of UX code, as it includes additional steps due to the localization of said messages. Failures in message formatting are far more often not discovered until production, as automated tests very rarely test all localizations. Therefore, it's important for the MF2 spec to ensure that users can always get at least some representation of a message via fallbacking, so that a message formatting failure can be considered only a partial failure, rather than a complete failure, of the UI.

Well-defined fallbacking (which we currently have) also ensures that any two MF2 implementations will produce the same output for the same inputs, effectively a requirement for hydration and other techniques allowing a server and client to cooperate in building a UI.

aphillips commented 1 month ago

Well-defined fallbacking (which we currently have) also ensures that any two MF2 implementations will produce the same output for the same inputs, effectively a requirement for hydration and other techniques allowing a server and client to cooperate in building a UI.

This might be true for fallbacking, but it is emphatically not the case for non-erroring messages. Differences in runtime environment, formatting function implementation, and locale data mean that the same source message with the same inputs can produce different (but recognizably correct) outputs.

A specification for resolving messages at runtime, including runtime errors.

If we leave error handling completely out of the spec, then I think we'd need to revisit this deliverable. And I at least would much rather not need to do so.

I think you might be reading too much into the deliverable goal? We absolutely do identify the error conditions that arise in the resolution of a message at runtime. A "bad operand" error is a bad operand error. In some cases these are implementation-defined (such as type mismatches), but in most cases they are defined by our spec. So it is fair to say that any two MF2 implementations will produce the same error state. What has been suggested in this discussion over several weeks is that we don't say how that state is communicated.

Revisiting this:

effectively a requirement for hydration and other techniques allowing a server and client to cooperate in building a UI.

I agree that this is somewhat desirable, although, as noted by @sffc and @macchiati and others, the fallback message has limited utility. There's not that much utility variation between these fallbacks in a hydrated message:

You have {$count} attempts remaining on {$date}
You have {�} attempts remaining on {�}
{�}

The end user is still shaking their head because there is an error preventing usability.

macchiati commented 1 month ago

I think basically, we are identifying a set of error conditions (eg syntax), giving them IDs, and saying (with the first 3 cases) that the implementation has to recognize those conditions and be able to communicate them to the caller in some way (exception, different function call, etc).

Now, we are not specifying (and should not specify) the nature of that communication, nor the format of the error message that results, nor that implementations have to communicate those precise IDs. That really depends heavily on the capabilities and idioms of the programming language and library.

As to fallback message results, I'm rethinking my vote after considering Eemeli's thoughts. It is pretty low effort to return "{�}", so that is not much of an imposition on implementations. It does make it slightly less natural for some environments where the natural idiom would be (in pseudocode) to return null if there is any error:

if (result == null) {
   errorInfo = formatGetError(myMess, parameters);
}
// not that I'm recommending that idiom

But it isn't huge, because it could easily change to

if (result.equals("{�}")) {
   errorInfo = formatGetError(myMess, parameters);
}

So given that, I'm ok with Must(error) and Must(fallback).

stasm commented 1 month ago

According to our current formatting spec, the only way to get to "{�}" is if the source message contains a syntax or data model error. In all other cases, we end up with at least some pattern to format.

Are implementations expected to allow users to format messages that contain syntax or data model errors? Or should there be 2 separate steps in the API: parsing and formatting? In which case syntax and data model errors can be detected early, before the user attempts to format the broken message.

I’d like to challenge the current spec where it reads:

For example, a message with a Syntax Error and no fallback string defined in the formatting context would format to a string as {�}.

If we drop the above and require the API to be two-step, we could then map the two logical alternatives in (2) MUST signal errors -or- MUST provide fallback to these two steps: parsing and formatting.

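That said, my preference would be similar to @macchiati’s https://github.com/unicode-org/message-format-wg/issues/831#issuecomment-2233941004:

  • During parsing, MUST signal errors.
  • During formatting, MUST signal errors and MUST provide fallback.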

eemeli commented 1 month ago

Are implementations expected to allow users to format messages that contain syntax or data model errors? Or should there be 2 separate steps in the API: parsing and formatting? In which case syntax and data model errors can be detected early, before the user attempts to format the broken message.

#816 is a draft PR based on the consensus we reached during the 24 June call, adopting what the ballot presents as Option (1). It could be taken as a representation of that option, should we reaffirm our earlier choice of it here.

With that approach, fallback formatting would not be required for messages with syntax or data model errors. The intent is to allow for (but not require) a two-step approach as you describe, so that the earlier parsing step could emit an error, rather than requiring it to produce a formatted result for a broken message.

This would still allow a single-step implementation that always returned a formatted result.

That said, my preference would be similar to @macchiati’s #831 (comment):

  • During parsing, MUST signal errors.
  • During formatting, MUST signal errors and MUST provide fallback.

This would be fulfilled by Option (1). Note that it requires fallback only for "a message that produces a formatting or selection error".
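
A two-step shape along these lines could look roughly like the sketch below: parsing only signals errors, while formatting both signals errors and returns a fallback. It assumes a toy syntax in which the only valid use of braces is a `{$name}` placeholder, which is far simpler than the real MF2 grammar. The names `parse`, `format_message`, `ParsedMessage` and `MessageSyntaxError` are hypothetical.

```python
import re
from dataclasses import dataclass

_VARIABLE = re.compile(r"\{\$(\w+)\}")


class MessageSyntaxError(Exception):
    """Hypothetical error signaled by the parsing step."""


@dataclass(frozen=True)
class ParsedMessage:
    pattern: str


def parse(source):
    """Step 1: signal syntax errors; no fallback is produced here."""
    # Toy well-formedness rule: once valid {$name} placeholders are
    # removed, no braces may remain.
    leftover = _VARIABLE.sub("", source)
    if "{" in leftover or "}" in leftover:
        raise MessageSyntaxError(f"unexpected brace in {source!r}")
    return ParsedMessage(source)


def format_message(message, args):
    """Step 2: return (text, errors); text is a fallback when errors exist."""
    errors = []

    def substitute(match):
        name = match.group(1)
        if name in args:
            return str(args[name])
        errors.append(f"unresolved-variable: ${name}")
        return match.group(0)

    return _VARIABLE.sub(substitute, message.pattern), errors
```

In this shape a broken message such as `{$abc $def}` never reaches the formatting step, so the formatter never has to produce "{�}". A single-step API that always returns a formatted result could still be layered on top by catching the parse error and returning a fallback string of its choosing.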

aphillips commented 1 month ago

@stasm

I agree with you, except that there can be static APIs that "do it all in one go". Such an API, if it provided a fallback, would need to use the logo (or some other string). Parsing can be separated from formatting and, indeed, the specification separates these operations. But they don't have to be separate.

Again, I think my concern is that, as an implementer I'm going to make responsible decisions for my users. Elsewhere (in calls and in the various issues linked above) I pushed hard on "MUST signal errors". But I got to thinking: what is our concern in creating this requirement? What specific benefits are we trying to ensure as a standard with "MUST"? We need to clearly define what the bar is for "conformance" and not make it too onerous.

One benefit I see is that, for various conditions specified in our prose, implementations need to be consistent about "being in an error condition"--not succeeding where other implementations were told to fail. So, we can require that, for example, if you pass the operand |horse| to the built-in :number function, you should be in the bad-operand error state--however you define and signal that state. This is always an error. There are some SHOULD or MAY errors in the spec as well.

So, my tendency would be to go to parts of the text that describe error conditions and not say "signal an X error" but rather say "this is an X error and the fallback is Z". In the section on errors we then say "Hey, implementer, signal errors however seems best to you. You are not conformant if parsing/formatting succeeds for any defined error condition. If you implement fallback formatting, you need to emit whatever is defined as the fallback."
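
One way to picture that conformance bar is a check that only asks whether the implementation ended up in the defined error state and, if it emitted a fallback, whether the fallback matches. The sketch below assumes each implementation is adapted to return `(text, errors)`, that the error label is spelled `bad-operand`, and that the fallback for the literal operand `|horse|` is `{|horse|}`. The functions `toy_number`, `lenient_number` and `is_conformant` are hypothetical stand-ins, not the real :number function or any real test harness.

```python
def toy_number(operand):
    """Toy stand-in for :number: returns (text, errors).

    A non-numeric operand yields the "bad-operand" label and an assumed
    fallback of the operand in {|...|} form.
    """
    try:
        value = float(operand)
    except (TypeError, ValueError):
        return "{|" + str(operand) + "|}", ["bad-operand"]
    return format(value, "g"), []


def lenient_number(operand):
    """A deliberately non-conformant variant: succeeds on bad input."""
    return str(operand), []


def is_conformant(implementation, cases):
    """True if every defined error condition is reported and any
    fallback that was produced is the expected one."""
    for operand, expected_error, expected_fallback in cases:
        text, errors = implementation(operand)
        if expected_error not in errors:
            return False  # succeeded where it was required to fail
        if text is not None and text != expected_fallback:
            return False  # produced a fallback, but the wrong one
    return True


CASES = [("horse", "bad-operand", "{|horse|}")]
```

The check says nothing about how the error reaches the caller; an implementation that throws would simply need a small adapter that turns the exception into the `(text, errors)` pair, with `text` set to `None` if it offers no fallback.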

Others have said that they want to provide non-erroring formatting functions (to serve up the fallback). But a non-erroring fallback function might not have to be used in an erroring context only. If it's a public API you can just do all your formatting through it, right? If that's true, is that "MUST || MUST" (you have to do one or the other, and MAY do both but never neither)?