unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org

Prototype an initial data provider for the Ideal Components Bag #1318

Open gregtatum opened 3 years ago

gregtatum commented 3 years ago

This is a subtask of #1317.

Prototype and land an initial design for the provider data. This first pass does not need to be perfect, but will inform what additional information we need from CLDR. This step will probably use some of the skeleton matching and manipulation already present in the current implementation.

gregtatum commented 2 years ago

@sffc where should the prototype data provider go? Should it live in the main data providers, or can we do it in experimental?

gregtatum commented 2 years ago

The Provider data format section of the design document shows the general shape of what the data provider information should look like.

There is some design work that needs to be done to decide exactly how the data should look, but I think there is enough to start prototyping an initial solution.

The work here is to build a data provider that can output the desired format. Much of the information can be taken directly from the CLDR availableFormats section. However, for anything that's missing, these patterns will need to be generated using the existing DateTimeFormat components bag machinery.
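As a rough illustration of that lookup-then-generate flow, here is a minimal sketch. Everything in it is an assumption made for illustration: `SkeletonPatterns`, `Source`, `resolve`, and `sample_data` are hypothetical names, not ICU4X types, and the `generate` closure is only a placeholder for the existing components bag machinery. The two sample entries mirror the shape of CLDR `availableFormats` data for English.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the provider data: skeleton -> pattern.
struct SkeletonPatterns {
    patterns: BTreeMap<String, String>,
}

/// Records where a resolved pattern came from.
#[derive(Debug, PartialEq)]
enum Source {
    AvailableFormats,
    Generated,
}

impl SkeletonPatterns {
    /// Use the stored pattern when the skeleton is present;
    /// otherwise fall back to the supplied generator.
    fn resolve(&self, skeleton: &str, generate: impl Fn(&str) -> String) -> (String, Source) {
        match self.patterns.get(skeleton) {
            Some(pattern) => (pattern.clone(), Source::AvailableFormats),
            None => (generate(skeleton), Source::Generated),
        }
    }
}

/// Illustrative entries in the shape of CLDR availableFormats.
fn sample_data() -> SkeletonPatterns {
    let mut patterns = BTreeMap::new();
    patterns.insert("yMMMd".to_string(), "MMM d, y".to_string());
    patterns.insert("Hm".to_string(), "HH:mm".to_string());
    SkeletonPatterns { patterns }
}

fn main() {
    let data = sample_data();
    // Placeholder only: a real generator would do skeleton matching.
    let generate = |skeleton: &str| format!("<generated for {}>", skeleton);

    println!("{:?}", data.resolve("yMMMd", generate));
    println!("{:?}", data.resolve("yQQQQ", generate));
}
```

Tracking the source of each pattern is optional, but it could help show which skeletons CLDR does not cover directly.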

sffc commented 2 years ago

The lion's share of the code in the datetime crate is built on assumptions of the old data model, and many of the pieces independent of the data model have been moved to the calendar crate. I therefore believe the best place to start working on this is in a new experimental crate. At the end, once things are working, we will reconcile the two.

gregtatum commented 2 years ago

CLDR is the repository of locale data, including the date and time format patterns for each locale: https://cldr.unicode.org/

ICU4X uses the JSON version of it: https://github.com/unicode-org/cldr-json

DateTimeFormat operations use our own types, which are loaded from a data provider. The existing DateTimeFormat data providers are here: components/datetime/src/provider

However, we need to transform CLDR data into the format we want. This code lives here: provider/cldr/src/transform/datetime/
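To give a feel for what such a transform involves, here is a simplified sketch that turns a CLDR-style pattern string into structured items. This is not the code in the transform directory: `Item` and `parse_pattern` are hypothetical names, and the parsing rules are reduced to the basics (runs of a repeated ASCII letter become a field, single quotes delimit literal text, and `''` is an escaped apostrophe).

```rust
/// Hypothetical structured form of one piece of a pattern.
#[derive(Debug, PartialEq)]
enum Item {
    /// A field symbol such as 'y' or 'M', repeated `length` times.
    Field { symbol: char, length: usize },
    /// Literal text copied into the output as-is.
    Literal(String),
}

/// Split a pattern such as "MMM d, y" into fields and literals.
fn parse_pattern(pattern: &str) -> Vec<Item> {
    let mut items = Vec::new();
    let mut literal = String::new();
    let mut in_quote = false;
    let mut chars = pattern.chars().peekable();

    while let Some(c) = chars.next() {
        if c == '\'' {
            if chars.peek() == Some(&'\'') {
                // Two apostrophes in a row stand for one literal apostrophe.
                chars.next();
                literal.push('\'');
            } else {
                in_quote = !in_quote;
            }
        } else if c.is_ascii_alphabetic() && !in_quote {
            // Flush any pending literal before starting a field.
            if !literal.is_empty() {
                items.push(Item::Literal(std::mem::take(&mut literal)));
            }
            // Count how many times the same symbol repeats.
            let mut length = 1;
            while chars.peek() == Some(&c) {
                chars.next();
                length += 1;
            }
            items.push(Item::Field { symbol: c, length });
        } else {
            literal.push(c);
        }
    }
    if !literal.is_empty() {
        items.push(Item::Literal(literal));
    }
    items
}

fn main() {
    println!("{:?}", parse_pattern("MMM d, y"));
    println!("{:?}", parse_pattern("h 'o''clock' a"));
}
```

A real transform would also validate the field symbols and lengths; the sketch accepts any ASCII letter.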

The new component should live in experimental/.

gregtatum commented 2 years ago

@ozghimire I tried to provide the context you need to get going. Feel free to ask questions here, or reach out in other channels that work better for you. There is a Slack, and we can schedule a video call if you need one.

I would suggest opening a draft PR as soon as you get any code going, as I think it will be easier to comment on the direction and approach, even if the code is not compiling yet.

ozghimire commented 2 years ago

@gregtatum you mentioned a Slack channel. What's the URL to join the working group? Also, these documents are great. I am just starting out with the project, as well as with open source contributions. I would love to connect over a call and get directions before I start working on things.

Update: I tried joining unicode-org.slack.com, but it looks like I need an @unicode.org email. I wasn't able to join the group with my personal email.

gregtatum commented 2 years ago

I'm looking into it.

ozghimire commented 2 years ago

@gregtatum I'm ready to start working on this. I would like to get a general overview of it first.

gregtatum commented 2 years ago

To loop in our conversation outside of GitHub, I suggested finding another small issue to get a bit more practice in the codebase before taking on the larger issue.