unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org
Other
1.37k stars 174 forks source link

Consider exact policy around DateTimeError in datetime formatting #4336

Open sffc opened 11 months ago

sffc commented 11 months ago

Currently there are 3 places where errors can be returned in datetime formatting:

  1. Constructor: DateTimeFormatter::try_new
    • Example: locale missing
  2. Format function: DateTimeFormatter::format
    • Example: mismatched calendar
  3. Write function: FormattedDateTime::write_to
    • Example: data for pattern not loaded

Note that case 3 is a core::fmt::Error whereas cases 1 and 2 are DateTimeError. Any errors that occur during write_to are logged with log::warn! and then flattened to a core::fmt::Error.

Currently case 3 does not happen in normal public API usage of the icu::datetime crate. However, it is now possible to reach this case in DateTimePatternInterpolator (#3347).

If we moved this error to phase 2, we may end up slowing down the path for all users since we need to walk the whole pattern in order to figure out if the data is missing.

We should decide on this before shipping DateTimePatternInterpolator.

sffc commented 10 months ago

Want to discuss with:

sffc commented 9 months ago

It should be noted that the ::format function has an error only in DateTimeFormatter, not in TypedDateTimeFormatter, and the only error it returns is a mismatched calendar. So, any additional pre-processing we do would result in changing the signature of the ::format functions on TypedDateTimeFormatter, which we could do in 2.0. However, it might be nice to avoid this.

I'd like to proceed with the following solution:

  1. Change the experimental DateTimeNames::with_pattern to two functions:
    • DateTimeNames::try_with_pattern, which checks that the names are loaded correctly for the pattern
    • DateTimeNames::with_pattern_unchecked, which has the current behavior (the function is unchecked but not unsafe)
    • And keep the existing load_for_pattern and include_for_pattern the way they are.
  2. If a DateTimeError occurs during writing, which shall be possible only if an "unchecked" function was called without its invariants upheld, do the following:
    • debug_assert!(false) with the error
    • Log the error using the internal logging utility (it currently does this)
    • Only then fall back to core::fmt::Error, which only happens without debug assertions, and which is silent only if logging is disabled
  3. Document that the only error in DateTimeFormatter::format is a mismatched calendar error. We could even change the error type to be specific for that case, but I don't se a good reason to do that.

OK?

Manishearth commented 9 months ago

sgtm

robertbastian commented 9 months ago

Using core::fmt::Error means code like formatted_date_time.to_string() panics, so I don't think we should ever use it. We should GIGO, in debug mode we debug-assert, but in release mode we return Ok and write something like <missing xy-data>.

(edit: this is actually a requirement of the trait)

Document that the only error in DateTimeFormatter::format is a mismatched calendar error. We could even change the error type to be specific for that case, but I don't se a good reason to do that.

This is why I don't like the error enums. We have so many methods that can only error with a subset of variants, but the client always has to check all of them. Documentation is not a strong enough guarantee imho.

sffc commented 9 months ago

Using core::fmt::Error means code like formatted_date_time.to_string() panics, so I don't think we should ever use it. We should GIGO, in debug mode we debug-assert, but in release mode we return Ok and write something like <missing xy-data>.

Ok how about writing the string "(icu4x error)"? I don't want to carry around the Display impl for the Error type in release mode, and you get the whole error with debug assertions enabled.

Document that the only error in DateTimeFormatter::format is a mismatched calendar error. We could even change the error type to be specific for that case, but I don't se a good reason to do that.

This is why I don't like the error enums. We have so many methods that can only error with a subset of variants, but the client always has to check all of them. Documentation is not a strong enough guarantee imho.

I mostly agree. Not proposing to change anything right now.

robertbastian commented 9 months ago

We should use a different garbage string for each code location. (icu4x error) is not actionable.

sffc commented 9 months ago

The formatters share the same code path.

Neo: https://github.com/unicode-org/icu4x/blob/978ed24845299cad5a58a928ef85b7b2e8928ed8/components/datetime/src/raw/neo.rs#L411

Neo Pattern: https://github.com/unicode-org/icu4x/blob/978ed24845299cad5a58a928ef85b7b2e8928ed8/components/datetime/src/format/neo.rs#L1178C6-L1178C6

Old: https://github.com/unicode-org/icu4x/blob/978ed24845299cad5a58a928ef85b7b2e8928ed8/components/datetime/src/format/datetime.rs#L78

The best we could do without an enum would be something like (error in icu::datetime::TypedDateTimeFormatter)

robertbastian commented 9 months ago

The GIGO doesn't have to happen at the top level that you linked. write_pattern should be infallible, and then write_field would do garbage out instead of returning errors: https://github.com/unicode-org/icu4x/blob/main/components/datetime/src/format/datetime.rs#L239. This lets us use different messages for different fields.

sffc commented 9 months ago

Hmm, replumbing the formating code is a fairly big change

My main concern with the Display impl is code size. We could avoid this by using a different error struct internally that carries a static str with the additional context

sffc commented 9 months ago

Or are you proposing that we produce outputs such as "(icu4x error: weekday names not loaded), 6 (icu4x error: month names not loaded) 2024" (instead of "Saturday, 6 January 2024")

robertbastian commented 9 months ago

Yes, that

zbraniecki commented 9 months ago

IMO "{?}, 6 {?} 2024" is more user friendly than "(icu4x error: weekday names not loaded), 6 (icu4x error: month names not loaded) 2024"

robertbastian commented 9 months ago

What are your criteria for "user friendly"? {?} certainly looks nicer, but this is an error, it shouldn't look nice. We also want to communicate which names are not loaded; a bunch of {?} are going to be harder to fix, making it less user friendly.

zbraniecki commented 9 months ago

I think the user persona in your mind is a developer, and in my mind, end user of a product that utilizes ICU4X. Both personas are important and they should receive output adapted to their needs.

but this is an error, it shouldn't look nice.

The developer seeing an erroneous output should be able to act on it, I agree. Your sample serves that well.

The end user of a web browser cannot act on the error, and should see broken output in context of their UI. I believe my sample serves that better. The end user may also get a text error in their debug console, or some telemetry may send error data to developer for debugging, but the UI impact should be low.

robertbastian commented 9 months ago

The use case for this message is DateTimeNames::with_pattern_unchecked. The unchecked part clearly communicates to the developer that they have to be careful with this and verify the output before any user ever sees it. This is why I'm primarily thinking about communicating to the developer instead of an end-user.

sffc commented 9 months ago

I realized that this is basically the same problem space as "GMT+?" (#2245)

Along those lines, we could make up pithy error strings for each field that are both user friendly and searchable. For the month field:

Wdyt?

Manishearth commented 9 months ago

Discussed in meeting to a partial conclusion

Ideas: "(M01) 6, 2024" "{M01} 6, 2024" "[M01] 6, 2024" " 6, 2024" "|M01| 6, 2024" "💥M01💥 6, 2024"

Agreed (amongst @zbraniecki, @robertbastian, @manishearth):

- No scary strings in release/prod mode.
- Fine to have scary string in debug mode
- Shane can pick whatever brackets he wants

@sffc is this okay with you (and if so, please remove the discuss label)

sffc commented 9 months ago

I'd prefer to not include the month number because the error-state code path should be as minimal as possible; formatting "M01" is more complicated than "M" (even if only a little bit more complicated) and I want to start from the position of being as stripped-down as possible.

I am happy with 💥M or 〚M〛 or ⦃M⦄ or «M» or {M}

@zbraniecki How strongly would you like the month number to be rendered?

sffc commented 8 months ago

Discussion with @zbraniecki:

robertbastian commented 8 months ago

For numeric fields if the FixedDecimalFormatter is not loaded, load the FixedDecimalFormatter in no-data mode, but still indicate the error using brackets

Why not use Writeable for u8?

Also I just realised

  • Fine to have scary string in debug mode

We're debug_assert!ing, so debug mode will panic anyway.

sffc commented 8 months ago

For numeric fields if the FixedDecimalFormatter is not loaded, load the FixedDecimalFormatter in no-data mode, but still indicate the error using brackets

Why not use Writeable for u8?

  1. This is internal code and we can change it at any time; it does not impact the observable behavior
  2. Fewer branches / code paths; we just change the Option<FixedDecimalFormatter> to FixedDecimalFormatter that defaults to one that performs the desired error-like output
  3. Although small, impl Writeable for u8 has some code size, and it is not an impl that is necessarily going to be present in the binary (https://github.com/unicode-org/icu4x/blob/main/utils/writeable/src/impls.rs)
  4. Why not? Less code, fewer if statements, better maintainability.
robertbastian commented 4 months ago

The remaining work is semver breaking, so moving this to 2.0.

sffc commented 4 weeks ago

Neo uses the new error types. @robertbastian to investigate if this is done.