unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org
Other
1.39k stars 178 forks source link

Consider adding an AnyCalendarFactory #5739

Open sffc opened 4 weeks ago

sffc commented 4 weeks ago

Currently AnyCalendar::try_new_unstable loads the data every time, making it seem inefficient since calendars are often (but not always) short-lived objects.

Should we consider adding AnyCalendarFactory?

pub struct AnyCalendarFactory {
    payloads: (
        DataPayload<JapaneseErasV1Marker>,
        DataPayload<ChineseCacheV1Marker>,
        // ...
    )
}

impl AnyCalendarFactory {
    pub fn try_new_unstable(...) { ... }

    pub fn calendar_from_kind(&self, kind: AnyCalendarKind) -> AnyCalendar /* or Ref<AnyCalendar> ??? */ { ... }
}

This type could be used internally in the IxdtfParser (#5736).

@Manishearth @robertbastian

Manishearth commented 4 weeks ago

Are they short-lived? The intent of the whole AsCalendar infrastructure was to make it easy to have long lived calendars.

sffc commented 4 weeks ago

There are lots of cases when you might want a short-lived Date<Ref<AnyCalendar>> object. The question is how you get the Ref<AnyCalendar>. Currently you can only get a AnyCalendar from its own constructors, which means you need to get a new instance every time. But IxdtfParser allows returning in arbitrary calendars chosen at runtime.

Actually, perhaps a more efficient approach here would be a type that pairs Date<Iso> with AnyCalendarKind and which implements the IsInCalendar trait. It can also have functions that take a data provider and return Date<AnyCalendar>.

In other words:

pub struct SecretCalendarDate {
    iso_date: Date<Iso>,
    kind: AnyCalendarKind,
}

// With this impl, SecretCalendarDate can be passed directly to DateTimeFormatter
impl IsInCalendar for SecretCalendarDate { ... }

impl SecretCalendarDate {
    pub fn to_any_date_unstable(&self, provider: ...) -> Result<Date<AnyCalendar>, DataError> { ... }
    pub fn to_iso_date(&self) -> Date<Iso> { ... }
}

impl IxdtfParser {
    pub fn parse_date(&self) -> Result<SecretCalendarDate, ParseError> { ... }
}
Manishearth commented 4 weeks ago

To me this seems like more an IxdtfParser issue. I don't think we need a factory type here.

I think it returning (Date<Iso>, AnyCalendarKind) makes sense, and if we want to name that type and give it conversions that would be fine too.

sffc commented 4 weeks ago

It occurs to me that my claim about IsInCalendar doesn't really hold up because the type won't be able to emit the correct YearInfo and stuff without the calendar data. So you would still need to pass it into ConvertAndFormat. However, it would still be nice to be able to enforce that the IXDTF calendar matches the DateTimeFormatter calendar without having to load the AnyCalendar data extra times.

sffc commented 4 weeks ago

So the main functional difference (beyond the type system) of DateTimeFormatter and FixedCalendarDateTimeFormatter is that the former contains a calendar field and the latter does not contain a calendar field. This means that DateTimeFormatter is able to convert dates, but FixedCalendarDTF is not.

Ideally the output of the IXDTF parser can be passed into DateTimeFormatter and succeed if its AnyCalendarKind matches. An external way to do this would be to make ConvertCalendar be fallible:

https://unicode-org.github.io/icu4x/rustdoc/icu/datetime/scaffold/trait.ConvertCalendar.html

It could return Infallible on types where the conversion is infallible. Unfortunately it means that convert_and_format needs to start returning a Result.

We could instead make a new trait TryConvertCalendar and a corresponding format function.

Manishearth commented 4 weeks ago

Not a huge fan of adding another format function, but I get it. I think it's fine to require an explicit conversion

sffc commented 4 weeks ago

It's not about the explicit conversion; it's about the general problem of having data that is tagged with a calendar be able to use a DateTimeFormatter without having to load the calendar object an extra time (since it already lives inside of the DateTimeFormatter) but also still enforce that the calendar in the formatter matches the calendar annotation.

So there are basically 3 situations for formatting with DateTimeFormatter:

  1. Date Calendar not known (Date<Iso>). Convert with DateTimeFormatter's calendar.
  2. Date Calendar known, but data not loaded (this thread). Fail if calendar does not match DateTimeFormatter's calendar, and also perform conversion.
  3. Date Calendar known and data is loaded. Fail if calendar does not match DateTimeFormatter's calendar. No conversion is needed.
robertbastian commented 3 weeks ago

Agree with the approach, but don't like the name "tagged".

Manishearth commented 3 weeks ago

Partial discussion:

sffc commented 3 weeks ago

Conclusions:

// icu_calendar
impl<A: AsCalendar> Date<A> {
    pub fn try_new_from_codes(
        era: Option<Era>,
        year: i32,
        month_code: MonthCode,
        day: u8,
        calendar: A,
    ) -> Result<Self, DateError>

    pub fn try_from_str(
        s: &str,
        calendar: A,
    ) -> Result<Self, ParseError>
}

enum ParseError {
    ....
    MismatchedCalendar(AnyCalendarKind /* or id string */),
}

impl DateTimeFormatter {
    pub fn get_calendar(&self) -> Ref<AnyCalendar>
}

// Call site
let date = Date::try_from_str("...", dtf.get_calendar())?;
let fdt = dtf.strict_format(date)?;

// Possible future
let fdt = dtf.format_ixdtf("...")?;

LGTM: @sffc @Manishearth @robertbastian

Time zone parser:

impl IxdtfParser {
    pub fn try_location_only_iso_from_str(
        &self,
        ixdtf_str: &str,
    ) -> Result<CustomZonedDateTime<Iso, TimeZoneInfo<models::AtTime>>, ParseError> { ... }

    pub fn try_location_only_from_str<A: AsCalendar>(
        &self,
        ixdtf_str: &str,
        calendar: A,
    ) -> Result<CustomZonedDateTime<A, TimeZoneInfo<models::AtTime>>, ParseError> { ... }
}

LGTM: @sffc @robertbastian @Manishearth