Create an Ideal Components Bag / Skeleton for DateTimeFormat

gregtatum commented 3 years ago

This is a meta issue to track implementing the "ideal components bag" as laid out in the DateTimeFormat Deep Dive 2021-10-01 design document. Originally there was some discussion to have this replace the current components bag, but it is to be implemented alongside the existing components bag. A better name can be bikeshed if needed.

The following need to be completed.

[ ] #1318
[ ] #1319
[ ] #1321
[ ] #1325
[ ] #1322
[ ] #1323
[ ] #1324

sffc commented 2 years ago

@gregtatum will provide mentorship.

zbraniecki commented 2 years ago

I'm spreading the word about this issue looking for candidates.

More details:

Description: Currently, DateTimeFormat has two ways to select the right format, and both are imperfect. We believe we have a balanced, novel solution that, once implemented, will become the foundational way to use DateTimeFormat.

Scope: We believe the initial implementation should take one person 2-3 months, hopefully in time for ICU4X 1.0.

Mentorship: This project is well staffed on the mentorship side, with @gregtatum from Mozilla, @sffc from Google, and @zbraniecki from Amazon willing to invest time to mentor the engineer who picks it up.

How to start: If you are interested in the project, comment in this issue or join unicode-org.slack.com #icu4x and we'll get you on-ramped.

ozghimire commented 2 years ago

I'm interested to work on this issue.

zbraniecki commented 2 years ago

@gregtatum are you still open to mentor?

pdogr commented 2 years ago

If this issue is still open, I'm definitely interested to work on this.

gregtatum commented 2 years ago

@ozghimire Great! How would you prefer to get started? There is a document linked above outlining the strategy which should discuss how to get things going. I would suggest starting with #1318. I will fill in more details on that issue.

@pdogr I think ozghimire is taking the first step on this to move it forward, and it's hard to parallelize this initial step, but there will probably be work to help out on around the issues. You could take another DateTimeFormat issue to get onboarded. I'm sure there will be opportunities to help in the short term. #1581 would be a good bug to onboard with if you wanted to take it.

randomicon00 commented 2 years ago

Hello @gregtatum, are you still looking for contributors?

bdarcus commented 1 year ago

Not sure if you're looking for feedback, but if there's a way you could improve the user-friendliness of the config, that would be really helpful. I'm a Rust newbie, and was pretty confused about how to use this feature. If I do something like this, I get errors about a long list of missing fields:

    let const DAYMONTH = components::Bag {
        year: Numeric,
        month: Short,
    }

Here's the equivalent in JS:

const date = new Date(Date.UTC(2012, 11, 20, 3, 0, 0));

const options1 = {
  month: "long",
  day: "numeric",
};

const options2 = {
  month: "short",
  day: "numeric",
};

const options3 = {
  month: "numeric",
  day: "numeric",
};

console.log("Option 1:")
console.log(date.toLocaleString("de-DE", options1));
console.log(date.toLocaleString("en-US", options1));
console.log(date.toLocaleString("es-PE", options1));
console.log("")
console.log("Option 2:")
console.log(date.toLocaleString("de-DE", options2));
console.log(date.toLocaleString("en-US", options2));
console.log(date.toLocaleString("es-PE", options2));
console.log("")
console.log("Option 3:")
console.log(date.toLocaleString("de-DE", options3));
console.log(date.toLocaleString("en-US", options3));
console.log(date.toLocaleString("es-PE", options3));

... which generates:

> "Option 1:"
> "19. Dezember"
> "December 19"
> "19 de diciembre"
> ""
> "Option 2:"
> "19. Dez."
> "Dec 19"
> "19 dic."
> ""
> "Option 3:"
> "19.12."
> "12/19"
> "19/12"

sffc commented 1 year ago

@bdarcus Until https://github.com/rust-lang/rust/issues/70564 lands, you need to create a mutable components bag and then set your fields on it.

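// Start from the defaults, then set only the fields you need.
// Each field of the 1.x components bag is an Option of a per-field enum.
let mut components_bag = components::Bag::default();
components_bag.year = Some(components::Year::TwoDigit);
components_bag.month = Some(components::Month::Short);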

Manishearth commented 1 year ago

@sffc, @eggrobin, and I discussed @eggrobin's WIP skeleton work at https://github.com/unicode-org/icu4x/compare/main...eggrobin:icu4x:%CF%83%CE%BA%CE%B5%CE%BB%CE%B5%CF%84%CE%AC (https://github.com/unicode-org/icu4x/commit/716672be5bb7b6d3adf136d14fdeaf40c196b982 + previous commits)

The general plan moving forward is that skeleta will be represented using @eggrobin's design which essentially boils down to an enum for (day, time, date, datetime) + a length, plus additional timezone stuff. This makes for 12 possible time components, 9 day components, 8 non-day date components (so 17 total date components), and 12 × 17 combined DateTime components, with three lengths for each.
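As a quick check of the counts above, here is the arithmetic written out. The constant and function names are illustrative only and are not part of ICU4X; the final figure is an upper bound that assumes every length is stored for every skeleton.

```rust
// Counts quoted in the comment above; all names here are hypothetical.
const TIME_COMPONENTS: usize = 12;
const DAY_COMPONENTS: usize = 9;
const NON_DAY_DATE_COMPONENTS: usize = 8;
const LENGTHS: usize = 3; // long, medium, short

// 9 day components + 8 non-day date components.
const DATE_COMPONENTS: usize = DAY_COMPONENTS + NON_DAY_DATE_COMPONENTS;

// Every time component paired with every date component.
const DATETIME_COMBINATIONS: usize = TIME_COMPONENTS * DATE_COMPONENTS;

/// Upper bound on (skeleton, length) slots if every length were stored.
fn max_pattern_slots() -> usize {
    (TIME_COMPONENTS + DATE_COMPONENTS + DATETIME_COMBINATIONS) * LENGTHS
}
```

In practice the stored count should be much smaller, since the plan is to glue date and time patterns together rather than store each combination.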

There is a fallback algorithm that CLDR uses, which is implemented in ICU4X as get_best_available_format_pattern. We move this to datagen and perform the simpler subset of the fallback algorithm that falls back between lengths. In other words, we always generate data for each of the 12/17 skeleta and use the fallback algorithm to find suitable replacement data when not present, but we do not necessarily generate data for each of the lengths.

For the data model, we can store Date/Time/DateTime as separate keys, with the first two having a data model of:

/// For Date/Time only, not datetime
struct PackedSkeletonData<'data> {
   pub indices: ZeroVec<'data, SkeletonDataIndex>, // len = 12 for time, 17 for date
   pub patterns: VarZeroVec<'data, PatternPluralsULE>,
}

// conceptually:
// {
//   has_long: bool,
//   has_medium: bool,
//   has_short: bool,
//   index: u16, // index of first pattern (long if present, else medium, else short)
// }
#[derive(ULE)]
struct SkeletonDataIndex(u16);

struct DateTimeSkeletons<'data> {
   // will typically be small, there are only a couple special cases like E B h m
   map: ZeroMap<'data, Skeleton, PatternPluralsULE>, 
}

For date or time lookup, based on the skeleton we index into the indices array and perform fallback on the available lengths in the metadata. The data is stored contiguously as [long?, medium?, short?] so we can calculate its index by offsetting from the base index, and then fetching.
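A minimal sketch of that lookup, using plain `Vec`s in place of `ZeroVec`/`VarZeroVec`. The bit positions, the `Length` enum, and the fallback order (requested length first, then the nearest other lengths) are assumptions made for illustration; the comment above only fixes the `[long?, medium?, short?]` storage order.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Length { Long, Medium, Short }

/// Hypothetical packing: top 3 bits are presence flags, low 13 bits are the base index.
#[derive(Clone, Copy, Debug)]
struct SkeletonDataIndex(u16);

impl SkeletonDataIndex {
    const HAS_LONG: u16 = 1 << 15;
    const HAS_MEDIUM: u16 = 1 << 14;
    const HAS_SHORT: u16 = 1 << 13;
    const INDEX_MASK: u16 = (1 << 13) - 1;

    fn new(has_long: bool, has_medium: bool, has_short: bool, first_pattern: u16) -> Self {
        assert!(first_pattern <= Self::INDEX_MASK, "index must fit in 13 bits");
        let mut bits = first_pattern;
        if has_long { bits |= Self::HAS_LONG; }
        if has_medium { bits |= Self::HAS_MEDIUM; }
        if has_short { bits |= Self::HAS_SHORT; }
        Self(bits)
    }

    fn has(self, length: Length) -> bool {
        let flag = match length {
            Length::Long => Self::HAS_LONG,
            Length::Medium => Self::HAS_MEDIUM,
            Length::Short => Self::HAS_SHORT,
        };
        (self.0 & flag) != 0
    }

    /// Index into the pattern list, offsetting past the lengths stored before this one.
    fn pattern_index(self, length: Length) -> Option<usize> {
        if !self.has(length) {
            return None;
        }
        let base = usize::from(self.0 & Self::INDEX_MASK);
        let skipped = match length {
            Length::Long => 0,
            Length::Medium => usize::from(self.has(Length::Long)),
            Length::Short => {
                usize::from(self.has(Length::Long)) + usize::from(self.has(Length::Medium))
            }
        };
        Some(base + skipped)
    }
}

struct PackedSkeletonData {
    indices: Vec<SkeletonDataIndex>,
    patterns: Vec<&'static str>,
}

impl PackedSkeletonData {
    /// Returns the requested length if present, else falls back (assumed order).
    fn get(&self, skeleton: usize, requested: Length) -> Option<&'static str> {
        let entry = *self.indices.get(skeleton)?;
        let order = match requested {
            Length::Long => [Length::Long, Length::Medium, Length::Short],
            Length::Medium => [Length::Medium, Length::Short, Length::Long],
            Length::Short => [Length::Short, Length::Medium, Length::Long],
        };
        let index = order.iter().find_map(|&length| entry.pattern_index(length))?;
        self.patterns.get(index).copied()
    }
}

/// Made-up sample: skeleton 0 stores long + short, skeleton 1 stores medium only.
fn sample_data() -> PackedSkeletonData {
    PackedSkeletonData {
        indices: vec![
            SkeletonDataIndex::new(true, false, true, 0),
            SkeletonDataIndex::new(false, true, false, 2),
        ],
        patterns: vec!["MMMM d", "M/d", "HH:mm"],
    }
}
```

With the sample data, asking skeleton 0 for a medium pattern falls back to its short pattern, because no medium pattern was stored for it.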

For datetime lookup, we first index into the DateTimeSkeletons map, and if not present, we then go fetch the individual date and time data and glue them together using the glue from the datetime lengths data.
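The datetime path could look roughly like the sketch below. It swaps the `ZeroMap` for a `BTreeMap` and uses string keys and string patterns; the struct, the sample patterns, and the idea of keying overrides by the concatenated skeleton string are all assumptions for illustration. The glue follows the CLDR convention where `{1}` is the date and `{0}` is the time.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for the three data keys described above.
struct DateTimeData {
    /// Combined skeleta whose pattern cannot be produced by gluing.
    overrides: BTreeMap<&'static str, &'static str>,
    date_patterns: BTreeMap<&'static str, &'static str>,
    time_patterns: BTreeMap<&'static str, &'static str>,
    /// CLDR-style glue pattern: `{1}` is the date, `{0}` is the time.
    glue: &'static str,
}

impl DateTimeData {
    fn lookup(&self, date_skeleton: &str, time_skeleton: &str) -> Option<String> {
        // Step 1: check the (small) map of special-cased combined skeleta.
        let combined = format!("{}{}", date_skeleton, time_skeleton);
        if let Some(pattern) = self.overrides.get(combined.as_str()) {
            return Some(pattern.to_string());
        }
        // Step 2: fetch date and time separately and glue them together.
        // Naive textual substitution; good enough for a sketch because the
        // sample patterns never contain a literal "{0}".
        let date = self.date_patterns.get(date_skeleton)?;
        let time = self.time_patterns.get(time_skeleton)?;
        Some(self.glue.replace("{1}", date).replace("{0}", time))
    }
}

/// Made-up English-like sample data.
fn sample_data() -> DateTimeData {
    DateTimeData {
        overrides: BTreeMap::from([("EBhm", "E h:mm B")]),
        date_patterns: BTreeMap::from([("E", "ccc"), ("yMd", "M/d/y")]),
        time_patterns: BTreeMap::from([("Bhm", "h:mm B"), ("hm", "h:mm a")]),
        glue: "{1}, {0}",
    }
}
```

Here `E` plus `Bhm` hits the override map, while `yMd` plus `hm` takes the glue path.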

Manishearth commented 1 year ago

When we fix this we should also fix https://github.com/unicode-org/icu4x/issues/3762

sffc commented 1 year ago

@sffc - How can we change the encoding to flatten PatternPlurals into this index lookup?
@zbraniecki
```
[12][K][V][K2][V2]
```
K - Plural/Declension/Etc
V - 0, 1, 2, 3 - Plural Form
@sffc - Right now we have 16 bits, of which 3 are for length. What if we used an additional 5 to encode the plural variants?

Key:
[all have full]
- has_long
- has_medium
- has_short
[all have other]
- has_zero
- has_one
- has_two
- has_few
- has_many

Or another model:

[all have full]
- has_long
- has_medium
- has_short
- has_six_plurals

Or make a model that stores different sets of plurals in only 2 bits.

@sffc and @Manishearth to work on this after finishing neo symbols.
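To make the first model concrete, here is one way the 16 bits could be split: 3 length flags, 5 plural flags (with `other` always present), and the remaining 16 - 3 - 5 = 8 bits for the index. The bit positions and the type name are assumptions, not a settled layout.

```rust
/// Hypothetical index with 3 length flags, 5 plural flags, and an 8-bit base index.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PluralAwareIndex(u16);

impl PluralAwareIndex {
    // Length flags (the existing 3 bits).
    const HAS_LONG: u16 = 1 << 15;
    const HAS_MEDIUM: u16 = 1 << 14;
    const HAS_SHORT: u16 = 1 << 13;
    // Plural flags (the proposed 5 extra bits); `other` needs no flag.
    const HAS_ZERO: u16 = 1 << 12;
    const HAS_ONE: u16 = 1 << 11;
    const HAS_TWO: u16 = 1 << 10;
    const HAS_FEW: u16 = 1 << 9;
    const HAS_MANY: u16 = 1 << 8;

    const INDEX_BITS: u32 = 16 - 3 - 5;
    const INDEX_MASK: u16 = (1 << Self::INDEX_BITS) - 1;

    const LENGTH_MASK: u16 = Self::HAS_LONG | Self::HAS_MEDIUM | Self::HAS_SHORT;
    const PLURAL_MASK: u16 =
        Self::HAS_ZERO | Self::HAS_ONE | Self::HAS_TWO | Self::HAS_FEW | Self::HAS_MANY;

    /// Number of lengths stored for this skeleton.
    fn length_count(self) -> u32 {
        (self.0 & Self::LENGTH_MASK).count_ones()
    }

    /// Number of plural variants, counting the always-present `other`.
    fn plural_variant_count(self) -> u32 {
        1 + (self.0 & Self::PLURAL_MASK).count_ones()
    }

    /// Index of the first pattern.
    fn base_index(self) -> usize {
        usize::from(self.0 & Self::INDEX_MASK)
    }
}
```

One consequence of this split is that the base index can only address 256 positions, which may be part of why the 1-bit and 2-bit alternatives are on the table.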

sffc commented 8 months ago

https://github.com/unicode-org/icu4x/issues/1317#issuecomment-1623963015 is a design for how to store skeletons in the data file, but it doesn't directly address the question of knowing ahead of time which names and name lengths to include.

With semantic skeleta, are any of the following invariants true (across all locales and calendars)?

If the skeleton does not have Weekday, then the pattern does not have Weekday.
If the skeleton has a short Weekday, then the pattern has a short Weekday.
If the skeleton has a long Weekday, then the pattern has a long Weekday.
If the skeleton does not have Month, then the pattern does not have Month.
If the skeleton has a numeric Month, then the pattern has a numeric Month. (trick question! I know this one happens to be false in the Hebrew calendar)
If the skeleton has a short spellout Month, then the pattern has a short spellout Month
If the skeleton has a long spellout Month, then the pattern has a long spellout Month

And similar for Era, Day Period, and Time Zone.

Depending on which of these invariants work out, we should be able to have static analysis of a skeleton to produce an auto-sliced data bundle.

sffc commented 8 months ago

My understanding from @eggrobin on the above questions is:

Including Day-of-Month could imply including Weekday, because they are both ways of representing specific dates
Including Month does not imply including Weekday, because a weekday represents a specific date, not a month
Can't make any guarantees about the width of the fields

sffc commented 4 months ago

Notes from this topic in the ICU4X-TC meeting on 2024-07-11:

https://docs.google.com/presentation/d/1qXxBv4DVnqfBSpGt9ikVQLk9M0LX65O9lWvDo0pH9SU/edit#slide=id.p

@zbraniecki - Is this a way to get weekday display names?
@sffc - LDML defines LLLL, and NeoFormatter will get that. It also defines standalone weekday (maybe c?).
@zbraniecki - User story: I want to collect the names of all 7 weekdays in the Gregorian calendar. How do I do this? In DateTimeFormatter I need to find a date that is a Monday, then add 1 day at a time and query, right?
@mihnita - There are many sigils for Timezone in CLDR (7 different ones?). How do you cover them?
@sffc - Time zones are a can of worms we can discuss on a different day. I have a solution, I think.
@macchiati - It's valid to request a specifically numeric date
@sffc - That's what Short says: A short date; typically numeric, as in “1/1/2000”
@macchiati - That seems reasonable if we strengthen the docs a bit
@robert - what can go wrong in the fallible formatter?
@sffc - Not sure, it may fail
@robert - we should know what may fail. And we may want to add writeable infallible formatter that does debug assertions
@sffc - I'm comfortable with that.
@zbraniecki - You could generate the markers out of CLDR at build time if you wanted, right?
@sffc - Yes
@zbraniecki - What performance metrics have you been optimizing for?
@sffc - Memory, I looked at performance and it has been steady, but my focus is on memory
@sffc - For a single skeleton I got the memory from 6KB stack size to 560 bytes stack size, a 90% win
@zbraniecki - How does it compare to ICU4C?
@sffc - Even the old ICU4X 1.0 beats ICU4C hands down. This is just another 10x improvement.
@sffc - Do we have consensus that this is the direction we want to go with ICU4X 2.0 datetime formatting?
@zbraniecki - I'm okay with it even given the limitations; we should document them.
@mihnita - Some of these decisions are UX decisions.
@younies - I like the design, but if we extend it for currency and units... it's not clear how it extends. I like how it would work for a subset of units. Maybe for common things like duration.
@sffc - To address @mihnita's concern, we have a path to add new entries to the enum, if we feel they are legitimate use cases. Custom patterns is an escape hatch until CLDR approves new semantic skeleta.
@younies - When we are shipping a customized data for the user, is that a security issue?
@sffc - Data is keyed on CLDR version and code, not on user.
@younies - Sounds good. I approve this.
@zbraniecki - I am also comfortable with this. Thank you for your work on it. I'm quite happy with the memory savings.
@mihnita - I agree that classical skeleta let you do a lot of bad things. I'm just not convinced semantic skeleta are sufficiently general. And custom pattern is bad i18n.
@sffc - Do we have agreement on this: "ICU4X implements semantic skeleta in Rust as presented today, pending CLDR approval of the semantic skeleta proposal."
@mihnita - The idea of having predefined skeleta seems good. It seems better if, in addition, ICU4X allowed developers an escape hatch for classical skeleta.
@sffc - To start, I would rather push people strongly to semantic skeleta, so that we can hear clearly where they don't fit their use case. So I would like to remove classical skeleta from the API in 2.0. We can re-evaluate when we have clear user needs.
@mihnita - How do you think this integrates with MessageFormat 2.0? It only needs date/time full/long/medium/short.
@sffc - Those are supported in my ICU4X proposal.

Statement seeking consensus:

The ICU4X-TC approves the overall design of the ICU4X Rust code implementing semantic skeleta.
If CLDR-TC approves semantic skeleta for LDML 46, the ICU4X-TC approves replacing the existing date time formatting classes with the new semantic skeleta formatting classes in the ICU4X 2.0 release.

sffc commented 3 months ago

Working on the combined date/time pattern overrides.

All classical skeleta across all locales and calendars in CLDR (as a comma-separated list):

Bh,Bhm,Bhms,d,E,EBhm,EBhms,Ed,Ehm,Ehm-alt-ascii,EHm,Ehms,Ehms-alt-ascii,EHms,Gy,GyMd,GyMMM,GyMMMd,GyMMMEd,h,h-alt-ascii,H,hm,hm-alt-ascii,Hm,hms,hms-alt-ascii,Hms,hmsv,hmsv-alt-ascii,Hmsv,hmv,hmv-alt-ascii,Hmv,M,Md,MEd,MMM,MMMd,MMMEd,MMMMd,MMMMW-count-one,MMMMW-count-other,ms,y,yM,yMd,yMEd,yMMM,yMMMd,yMMMEd,yMMMM,yQQQ,yQQQQ,yw-count-one,yw-count-other,MMMMEd,MMdd,MMMMW-count-zero,MMMMW-count-two,MMMMW-count-few,MMMMW-count-many,yMM,yw-count-zero,yw-count-two,yw-count-few,yw-count-many,GyMMMM,GyMMMMd,GyMMMMEd,MMMM,MMMMdd,yMMMMd,yMMMMEd,MMd,hmsvvvv,Hmsvvvv,hmvvvv,Hmvvvv,yQ,yMMdd,GyMMMEEEEd,MMMEEEEd,MMMMEEEEd,yMMMEEEEd,yMMMMEEEEd,Md-alt-variant,MEd-alt-variant,MMdd-alt-variant,yMd-alt-variant,yMEd-alt-variant,MMMdd,mmss,HHmmZ,yMMMMccccd,EEEEd,MEEEEd,yMEEEEd,HHmmss,Mdd,Hmm,HHmm,GyM,yyyy,yyyyM,yyyyMd,yyyyMEd,yyyyMMM,yyyyMMMd,yyyyMMMEd,yyyyMMMM,yyyyQQQ,yyyyQQQQ,yyyyMMMMd,yyyyMMMMEd,yyyyMM,yyyyMMMEEEEd,yyyyMMMMEEEEd,yyyyMd-alt-variant,yyyyMEd-alt-variant,yyyyMMMMccccd,yyyyMMdd,yyyyMEEEEd,GyMEEEEd,UM,UMd,UMMM,UMMMd,UMd-alt-variant,HmZ

Of those, the ones that span date/time/zone are:

EBhm EBhms Ehm EHm Ehms EHms hmsv Hmsv hmv Hmv hmsvvvv Hmsvvvv hmvvvv Hmvvvv HHmmZ HmZ

In order to keep a fixed set of auxiliary keys known at compile time, I will proceed to hard-code and generate those sets. I will store them only when they differ from the glue-based pattern. I will also open an issue to add a test that will inform us if any more such skeleta get added in the future.
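The "spans date/time/zone" filter can be approximated by classifying each field letter of a classical skeleton. The sketch below is an assumption about how one might write such a check (for example, in the test mentioned above); the function names are hypothetical and the letter table covers only the common LDML field symbols.

```rust
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct Categories {
    date: bool,
    time: bool,
    zone: bool,
}

/// Classifies the field letters of a classical skeleton such as "yMMMEd" or "Ehm-alt-ascii".
fn categorize(skeleton: &str) -> Categories {
    // Drop suffixes such as "-alt-ascii" or "-count-one" before looking at letters.
    let fields = skeleton.split('-').next().unwrap_or("");
    let mut categories = Categories::default();
    for ch in fields.chars() {
        match ch {
            // Era, year, cyclic year, quarter, month, week, day, weekday.
            'G' | 'y' | 'U' | 'Q' | 'M' | 'L' | 'w' | 'W' | 'd' | 'E' | 'c' => {
                categories.date = true
            }
            // Day period, hour, minute, second.
            'a' | 'B' | 'h' | 'H' | 'm' | 's' => categories.time = true,
            // Time zone symbols.
            'v' | 'z' | 'Z' | 'O' | 'V' | 'X' | 'x' => categories.zone = true,
            _ => {}
        }
    }
    categories
}

/// True if the skeleton mixes at least two of date, time, and zone.
fn spans_multiple(skeleton: &str) -> bool {
    let c = categorize(skeleton);
    u8::from(c.date) + u8::from(c.time) + u8::from(c.zone) > 1
}
```

Note that this check also flags the `-alt-ascii` variants of the spanning skeleta (for example `Ehm-alt-ascii`), which the list above folds into their base skeleton.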

sffc commented 2 months ago

I'm going to delete the benches for the old code soon, but here is a snapshot of how they currently compare, running on the same data but with different APIs:

datetime/zoned_datetime_overview
                        time:   [32.527 µs 32.563 µs 32.616 µs]
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

datetime/neoneo/datetime_zoned_datetime_overview
                        time:   [32.503 µs 32.565 µs 32.635 µs]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

datetime/datetime_components
                        time:   [365.04 µs 365.44 µs 365.89 µs]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

datetime/neoneo/datetime_components
                        time:   [322.48 µs 323.54 µs 325.27 µs]
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

datetime/datetime_lengths
                        time:   [36.500 µs 36.572 µs 36.669 µs]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

datetime/neoneo/datetime_lengths
                        time:   [38.260 µs 38.286 µs 38.312 µs]

In words, performance on lengths and lengths_with_zones is basically identical, and performance on components is improved from 365 µs to 323 µs (a 12% improvement).
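For anyone who wants to redo the comparison from the Criterion midpoints above, the relative change is just (new - old) / old. The helper name is made up for this note.

```rust
/// Relative change from `old_us` to `new_us`; negative means the new code is faster.
fn relative_change(old_us: f64, new_us: f64) -> f64 {
    (new_us - old_us) / old_us
}

/// Midpoint estimates (in microseconds) copied from the benchmark output above.
const COMPONENTS_OLD_US: f64 = 365.44;
const COMPONENTS_NEW_US: f64 = 323.54;
```

Applied to the components benchmark midpoints, this comes out between 11% and 12% faster.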

The primary metrics that my work has been focused on are memory use and binary/data size, so it's nice to see that we could achieve those while also getting a small improvement on benchmark performance as well.

sffc commented 2 months ago

Some quick binary size analysis after migrating tui to neo datetime:

Compilation: cargo +nightly-2024-07-23 build -Z build-std=std,panic_abort -Z build-std-features=panic_immediate_abort --target x86_64-unknown-linux-gnu --profile=release-opt-size --examples --workspace --features serde --exclude icu_provider_export --exclude icu_capi

Binary size breakdown of tui (including data):

	1.x DateTime	Neo DateTime
Total Binary Size	4,661,520	1,598,032
Code Size	104,401	92,683
Data Size (.rodata)	2,933,616	1,199,840
Data Size (.data.rel.ro)	1,030,024	146,344

How I calculated this: I opened both binaries in IDA Pro and measured the length of the segments.

What is .rodata vs .data.rel.ro: I gather that .rodata are the pure read-only binary data such as static strings (and I see some human-readable strings in that section), whereas .data.rel.ro are the stack frames of static variables, i.e. the baked data structs.

Why the numbers don't add up: the ELF contains some sections which are neither pure code nor pure data. The biggest section not included in the numbers above is used for Relocation.
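The gap between the total and the listed sections can be computed directly from the table. The struct and constant names below are illustrative only; the numbers are copied from the table above.

```rust
/// Section sizes in bytes, as listed in the table above.
struct SizeBreakdown {
    total: u64,
    code: u64,
    rodata: u64,
    data_rel_ro: u64,
}

impl SizeBreakdown {
    /// Bytes in the binary not covered by the three listed sections.
    fn unaccounted(&self) -> u64 {
        self.total - (self.code + self.rodata + self.data_rel_ro)
    }
}

const OLD_1X: SizeBreakdown = SizeBreakdown {
    total: 4_661_520,
    code: 104_401,
    rodata: 2_933_616,
    data_rel_ro: 1_030_024,
};

const NEO: SizeBreakdown = SizeBreakdown {
    total: 1_598_032,
    code: 92_683,
    rodata: 1_199_840,
    data_rel_ro: 146_344,
};

/// Percentage by which `new` is smaller than `old`.
fn reduction_percent(old: u64, new: u64) -> f64 {
    100.0 * (old - new) as f64 / old as f64
}
```

By this arithmetic, 593,479 bytes of the 1.x binary and 159,165 bytes of the Neo binary fall outside the three listed sections, and the total binary shrinks by roughly 66%.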

The reduced data is because previously the binary was including data it didn't need for formatting, such as weekday names and certain types of time zone names, and because the new pattern layout is more efficient. I'm also pleased to see the reduced code size, which is probably due to less branching overall.

sffc commented 2 months ago

Discussion with @robertbastian (mostly obsolete given #5538):

@sffc I posted in Slack my plans here. I was planning to finish the Rust API migration (which includes year marker merging, renaming, and changing the signature of the NeoOptions to what we agreed), and then migrate FFI, and then finally delete the old code.
@robertbastian I think we should prioritize deleting the old code. When I was working on time zones, it wasn't clear to me what was new and what was old. It's a lot of maintenance cost.
@sffc I'm okay with a short-term solution where I delete the old FFI, comment-out the FFI tests, and then I can delete the old Rust API. Then when I add the new FFI, I add back the tests.
@robertbastian Maybe make the FFI methods panic instead of deleting, so that we keep the API shape? And then you can still compile the C/C++ tests; you just don't run them.

Suggestion for 2.0 FFI:

enum DateFieldSets { }
enum CalendarPeriodFieldSets { }
enum TimeFieldSets { }
enum ZoneFieldSets { }

struct CompositeFieldSet {
    date: Option<DateFieldSets>,
    // ...
}

struct DateTimeOptionsV1 {
    length: Option<Length>,
    year_style: Option<YearStyle>,
    fractional_second_digits: Option<FractionalSecondDigits>,
    // ...
}

// ALL NEW!
opaque DateTimeFormatter {
    pub fn create(&Locale, CompositeFieldSet, DateTimeOptionsV1) -> Result<>
}

// ALL NEW!
opaque GregorianDateTimeFormatter {
    pub fn create(&Locale, CompositeFieldSet, DateTimeOptionsV1) -> Result<>
}

opaque DateFormatter { ... } // icu::datetime::Formatter<YMD>

opaque GregorianDateFormatter { ... } // icu::datetime::Formatter<YMD>

opaque TimeFormatter { ... } // icu::datetime::Formatter<TimeComponents>

Types removed:

ZonedDateTimeFormatter
GregorianZonedDateTimeFormatter

Note: I want to add more field-set-specific types, but @Manishearth suggests doing this after we have better namespacing post-2.0, and I agree.

sffc commented 1 month ago

How should I organize the code in the repo? I know this is banal, but I need to do something. What we currently have is a bit of a mess, and we need to move things anyway as part of the big rename, so I may as well move them into good places.

Currently, inside components/datetime/src:

Path	Description	Visibility
fields/*.rs	Field and its related types	Public
format/datetime.rs	Core formatting logic	Private
format/neo.rs	DateTimeNames, DateTimePatternFormatter	Private with public re-exported types
options/*.rs	1.x formatter options	Datagen-only except for the HourCycle type, which we could replace with the one from icu_preferences
pattern/*.rs	Internal pattern types and logic	doc(hidden)
provider/calendar/*.rs	1.x data structs	Mix of public and datagen
provider/neo/*.rs	2.x data structs	Public
provider/packed_pattern.rs	Packed pattern data struct, recently landed	Public
provider/time_zones.rs	Time zone data structs	Public
raw/neo.rs	DateTimeZonePatternSelectionData and its related types	Private
skeleton/*.rs	Classical skeleton code	Datagen
calendar.rs	CldrCalendar and related traits and trait impls	Private with public re-exported types
error.rs	MismatchedCalendarError	Public
external_loaders.rs	Fixed decimal formatter and calendar data loader helpers	Private
helpers.rs	size_test macro	Private
input.rs	ExtractedInput	Private
lib.rs	Nothing defined, only re-exports	Public
neo_marker.rs	Declarations and definitions of 2.x traits; field set markers	Public
neo_pattern.rs	DateTimePattern	Public
neo_serde.rs	Serde impls for 2.x things	Private
neo_skeleton.rs	Enums and structs for semantic skeleta	Public
neo.rs	Main 2.x formatter types	Public
time_zone.rs	Time zone formatting	Private
tz_registry.rs	Time zone format registry: mapping between semantic time zones, resolved time zones, and field-based time zones	Private

All re-exports are from the root unless otherwise specified.

What I think I want to move:

Destination File Name	Stuff to move inside	Visibility
names.rs	DateTimeNames	Private with public re-exported types
dt_pattern.rs	DateTimePattern, DateTimePatternFormatter	Private with public re-exported types
raw.rs	What is currently raw/neo.rs	Private
error.rs	All error enums throughout the crate	Public/private as needed
fieldset.rs	Field set markers	Public
scaffolding/*.rs	2.x formatting traits. Also move CldrCalendar and friends in here	Public, but nothing in this module should show up in normal usage of the API
skeleton_impl/*.rs	Rename of skeleton/*.rs	Datagen
skeleton.rs	Enums and structs for semantic skeleta	Public
formatter.rs	Main 2.x formatter types	Private with public re-exported types

Note: I want everything to have exactly 1 place where it is exported.

Thoughts/approval? @Manishearth @robertbastian

Manishearth commented 1 month ago

This seems fine! I think we should be organized but it's flexible and we don't have to get it perfect right now. Something vaguely sensible is enough for me!

sffc commented 1 month ago

Type naming discussion:

Points brought up:

Formatter feels too generic and we have many formatters
FieldSetFormatter and SkeletonFormatter are too inside-baseball
DateTimeFormatter is fine as long as we never add a set of aliases like DateTimeFormatter, TimeFormatter, DateFormatter, etc for runtime fieldsets

Discussion:

@sffc Runtime fieldsets are a power user API anyway. Most people should be using compile-time fieldsets like YearMonthFormatter. I'm okay committing to no convenient aliases for the runtime ones.
@manishearth and @robertbastian agreed

Conclusion:

DateTimeFormatter and FixedCalendarDateTimeFormatter, with an optional TBD after 2.0 GregorianDateTimeFormatter alias
Have aliases for YearMonthDayFormatter etc, potentially post 2.0, names need bikeshed.

Agreed: @sffc @manishearth @robertbastian

sffc commented 3 weeks ago

Review with Zibi:

@zbraniecki: Make the field set fields private? I think we should end up with a macro like:

fieldset!([year, month, day])::medium() => YMD::medium()

DateTimePattern verbiage:

Original: Most clients should use DateTimeFormatter instead of directly formatting with patterns.

[DateTimePattern] forgoes most internationalization functionality of the datetime crate. It assumes that the pattern is already localized for the customer's locale. Most clients should use [DateTimeFormatter] instead of directly formatting with patterns.

Type exports:

icu::datetime::pattern::DateTimeNames
icu::datetime::pattern::DateTimePattern
icu::datetime::pattern::DateTimePatternFormatter

On the filesystem:

names.rs?
pattern/mod.rs?
Maybe icu::datetime::private::pattern::Pattern?
Maybe icu::datetime::private::pattern::ReferencePattern?

Errors:

@zbraniecki: Slight preference for exporting errors adjacent to the types they are used in

Manishearth commented 3 weeks ago

fieldset!([year, month, day])::medium() => YMD::medium()

Let's not use macros in type position, they don't work in every possible type position and this gets annoying quickly. I think it's fine to provide such a macro but having a fallback is good.
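A rough sketch of what "macro plus fallback" could look like, using stand-in types rather than the real ICU4X ones. The `fieldset!` arms, the `Length` enum, and the struct fields are all hypothetical; the point is only that the macro expands to a named type that users can also write directly.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Length { Long, Medium, Short }

/// Stand-in for a compile-time field set; not the real ICU4X type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct YMD { length: Length }

impl YMD {
    fn medium() -> Self { Self { length: Length::Medium } }
}

/// Second stand-in so the macro has more than one arm.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct YM { length: Length }

impl YM {
    fn medium() -> Self { Self { length: Length::Medium } }
}

/// Hypothetical macro mapping a list of field names to the matching type.
macro_rules! fieldset {
    ([year, month, day]) => { YMD };
    ([year, month]) => { YM };
}

// The macro is usable in this type position (a type alias)...
type YearMonthDay = fieldset!([year, month, day]);

// ...and the named type stays available as the fallback wherever it is not.
fn medium_ymd() -> YearMonthDay {
    YearMonthDay::medium()
}
```

Since the alias and `YMD` are the same type, code written against either one interoperates.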

sffc commented 3 weeks ago

Notes from brief discussion with @Manishearth:

The user-facing field set related types can be put into 4 buckets

Compile-time field sets such as struct YMD { options }
Runtime field sets such as enum DateFieldSet { YMD, ... }
Runtime skeletons such as struct DateSkeleton { field_set: DateFieldSet, options }
The options themselves, such as Alignment or FractionalSecondDigits

Where should these all go?

My original idea (Option 1):

icu::datetime::fieldset::YMD
icu::datetime::skeleton::DateFieldSet
icu::datetime::skeleton::DateSkeleton
icu::datetime::skeleton::Alignment

One that @Manishearth suggested (Option 2):

icu::datetime::fieldset::YMD
icu::datetime::fieldset::runtime::DateFieldSet
icu::datetime::skeleton::DateSkeleton
icu::datetime::options::Alignment

Here's another one that might be good (Option 3):

icu::datetime::fieldset::YMD
icu::datetime::fieldset::DateFieldSet
- NOTE: The runtime fieldsets are enums, so they will show up in a separate section of the docs page than the compile time fieldsets, which are structs. https://unicode-org.github.io/icu4x/rustdoc/icu_datetime/fieldset/index.html
icu::datetime::options::DateFieldSetWithOptions
icu::datetime::options::Alignment

I'm not sure about options::DateFieldSetWithOptions. I could still put it at skeleton::DateSkeleton. The name skeleton sounds important, but in the ICU4X world it's something most people generally shouldn't need to use.

It also occurs to me that skeleton and scaffold are kind-of similar words, but they mean different things.

Manishearth commented 3 weeks ago

To add some points, personally I think the ideal situation is that the fieldsets, runtime fieldsets, skeleta are all in their own modules, containing nothing but those types, combiner types (Combo), and potentially other modules. Exactly how that is achieved can be done in multiple ways, with fieldset and fieldset::runtime or fieldset and fieldset_runtime, and with skeletons being a submodule of fieldsets or options or something. No strong opinion there.

My vision is that each of these (except for options) can have strong documentation about the usage of these things that the rest of the crate can link to.

Manishearth commented 3 weeks ago

@Manishearth I think the design is clean. We basically have compile time fieldsets, runtime fieldsets, and skeletons that combine fieldsets and options.
@hsivonen I don't like relying on struct/enum to distinguish types. It feels too much like insider information such that most people will find it difficult to know and understand.
@sffc I don't like having types that users need to use being more than one module away from the root. Why don't these fieldset types live in the same module? They both represent field sets.
@Manishearth In general I think the UX of opening a module with multiple "types of things" is confusing: you open a module with five of one thing and five of another, and it's unclear which things you need to look at to fully understand the module. It's fine if a module has five types called FooFieldset; it's clear that "I should just look at TimeFieldset, and once I understand that I will understand DateFieldset". However, if a module has DateFieldset, TimeFieldset, ..., and DateSkeleton, TimeSkeleton, ..., then it becomes unclear how to slice things: should you look at one of the Date* types and then one of the Time* types (and so on), or should you look at one of the *Skeleton types and one of the *Fieldset types?
@robertbastian You can put them in the same module if they were called RuntimeDateFieldset, etc.
@sffc I see two logical ways to organize: keep field sets together, or keep compiletime/runtime together. The neo_skeleton module should come down to 2 types: skeleton structs, and the enums to represent the fieldsets.
@sffc The Fieldset enums can be seen as being an implementation detail of the skeleton structs.
@Manishearth We can reexport the symbols from the crate.
@sffc I don't like reexporting. It creates multiple ways of doing the same thing, and it's not necessary in our case. It also requires suppression workarounds in the FFI code.

@sffc's current thinking:

datetime::fieldset::YMD
datetime::fieldset_dynamic::DateFieldSet
datetime::fieldset_dynamic::DateSkeleton
datetime::options::Alignment

@Manishearth But then we have two dimensions in the same module:

(Date, Time, DateTime, ZonedDateTime)(Skeleton, Fieldset)

@sffc We could reduce it to one type, like this:

// mod fieldset_dynamic
enum DateFieldSet {
    YMD(fieldset::YMD),
    MD(fieldset::MD),
}

@Manishearth But currently it's nice to write:

let skeleton = NeoDateSkeleton {YearMonthDay, YearStyle::whatever};

and this gets a bit more complicated on construction? depends on what the user patterns would be like.

@sffc in many cases you'll end up ... currently datetime fixture code needs this, which parses out each individual thing
@sffc ... why don't we consider adding a Builder type
@manishearth Let's file an issue and perhaps link to it in the docs
@sffc I'm not happy with the deeply nested modules, but I don't have an alternative

Proposal:

datetime::fieldset::YMD
datetime::fieldset::dynamic::DateFieldSet
- enum containing YMD(fieldset::YMD), etc
datetime::options::Alignment
Future: datetime::fieldset::dynamic::builder::DateFieldSetBuilder

LGTM: @sffc @Manishearth
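The module layout in the proposal might be sketched as follows. These are stand-in definitions, not the real ICU4X types: the `Length` option, the `From` impls, and the `length()` accessor are assumptions added so the example has something to exercise.

```rust
mod options {
    /// Stand-in option type; the real `options` module holds things like Alignment.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Length { Long, Medium, Short }
}

mod fieldset {
    use super::options::Length;

    /// Compile-time field set: the fields are in the type, the options in the value.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct YMD { pub length: Length }

    impl YMD {
        pub fn medium() -> Self { Self { length: Length::Medium } }
    }

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct MD { pub length: Length }

    impl MD {
        pub fn medium() -> Self { Self { length: Length::Medium } }
    }

    pub mod dynamic {
        use super::{Length, MD, YMD};

        /// Runtime field set: one variant per compile-time field set, carrying its options.
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub enum DateFieldSet {
            YMD(YMD),
            MD(MD),
        }

        impl From<YMD> for DateFieldSet {
            fn from(value: YMD) -> Self { DateFieldSet::YMD(value) }
        }

        impl From<MD> for DateFieldSet {
            fn from(value: MD) -> Self { DateFieldSet::MD(value) }
        }

        impl DateFieldSet {
            /// Options stay reachable without a separate skeleton struct.
            pub fn length(&self) -> Length {
                match self {
                    DateFieldSet::YMD(fields) => fields.length,
                    DateFieldSet::MD(fields) => fields.length,
                }
            }
        }
    }
}
```

Because each variant wraps the corresponding compile-time struct, a separate skeleton struct pairing a field set enum with options is no longer needed, which is how the two-dimensional (Skeleton, Fieldset) split collapses to one type per row.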

sffc commented 2 weeks ago

I made a proposal to switch around how time fields are handled:

https://docs.google.com/document/d/1SkxoitlCFiQ_KGW3dmRk7lbumGd_N1rfYJLMlkiXvh4/edit?tab=t.0

I started implementing this in ICU4X. My idea:

Change the API to reflect the new enum
Change the time data payloads to contain 3 patterns, variants of Hour, Hour+Minute, and Hour+Minute+Second using the same mechanism that we have for distinguishing the three year styles
Reduce down to 3 time data payloads: default hour cycle, h12, and h24
Reduce down to a single time field, but otherwise keep the current traits working as they are
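As a rough picture of the data shape this implies: three payloads (default hour cycle, h12, h24), each holding three pattern variants selected by time precision. Everything below is a hypothetical sketch with made-up English-like patterns, not the actual ICU4X data structs.

```rust
/// How much of the time to display (the "new enum" in the list above; name assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TimePrecision { Hour, Minute, Second }

/// Which of the three time payloads to load.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HourCycleChoice { LocaleDefault, H12, H24 }

/// One payload: three variants, analogous to the three year styles.
struct TimePatternVariants {
    hour: &'static str,
    minute: &'static str,
    second: &'static str,
}

impl TimePatternVariants {
    fn get(&self, precision: TimePrecision) -> &'static str {
        match precision {
            TimePrecision::Hour => self.hour,
            TimePrecision::Minute => self.minute,
            TimePrecision::Second => self.second,
        }
    }
}

struct TimePayloads {
    locale_default: TimePatternVariants,
    h12: TimePatternVariants,
    h24: TimePatternVariants,
}

impl TimePayloads {
    fn select(&self, cycle: HourCycleChoice, precision: TimePrecision) -> &'static str {
        let payload = match cycle {
            HourCycleChoice::LocaleDefault => &self.locale_default,
            HourCycleChoice::H12 => &self.h12,
            HourCycleChoice::H24 => &self.h24,
        };
        payload.get(precision)
    }
}

/// Made-up sample for a locale whose default hour cycle is 12-hour.
fn sample_payloads() -> TimePayloads {
    let twelve = || TimePatternVariants { hour: "h a", minute: "h:mm a", second: "h:mm:ss a" };
    TimePayloads {
        locale_default: twelve(),
        h12: twelve(),
        h24: TimePatternVariants { hour: "HH", minute: "HH:mm", second: "HH:mm:ss" },
    }
}
```

A formatter would load exactly one of the three payloads based on the resolved hour cycle and then pick the variant from the requested precision.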

Then, I will update #5761 to switch around the dynamic field sets as discussed previously.

One small caveat: I realized that overlap patterns already use the variants for year style. However, none of our overlap patterns currently contain the year field, so I'm currently adding a debug assertion and changing the variants there to be for time precision instead.

Does this sound okay @Manishearth?

Manishearth commented 2 weeks ago

:+1:

unicode-org / icu4x

Create an Ideal Components Bag / Skeleton for DateTimeFormat #1317