tc39 / proposal-intl-duration-format

https://tc39.es/proposal-intl-duration-format
MIT License

formatToParts output #55

Closed sffc closed 1 year ago

sffc commented 3 years ago

Intl.DurationFormat is interesting in the sense that it composes multiple types of fields (number fields and list element fields). @FrankYFTang says that the current spec requires outputting only the list fields, but not the number fields. I wondered if we should also output the number fields in some form. Doing so would require more changes to ICU.

Related: https://unicode-org.atlassian.net/browse/ICU-21547

younies commented 3 years ago

could you add an example, please?

FrankYFTang commented 3 years ago

Shane, I have a hard time understanding what could possibly be put into formatToParts. Let's say we have the following:

let d = {days: 1, hours:2, minutes:3, seconds: 4}
let df = new Intl.DurationFormat("en")
df.format(d)
// "1 day, 2 hours, 3 minutes, 4 seconds"
df.formatToParts(d)
// output from the current spec text
[ {type: "element", value: "1 day"},
  {type: "literal", value: ", "},
  {type: "element", value: "2 hours"},
  {type: "literal", value: ", "},
  {type: "element", value: "3 minutes"},
  {type: "literal", value: ", "},
  {type: "element", value: "4 seconds"}
]

What do you mean by "also output the number fields in some form" in ECMA402? (I am not asking whether we should implement it, but what the output could possibly be.) Could you write down the possible output of your "also output the number fields in some form" in terms of a JS array of objects?

sffc commented 3 years ago

Sure.

[
  {type: "integer", value: "1", unit: "day"},
  {type: "literal", value: " "},
  {type: "unit", value: "day", unit: "day"},
  {type: "literal", value: ", "},
  {type: "integer", value: "2", unit: "hour"},
  {type: "literal", value: " "},
  {type: "unit", value: "hours", unit: "hour"},
  {type: "literal", value: ", "},
  {type: "integer", value: "3", unit: "minute"},
  {type: "literal", value: " "},
  {type: "unit", value: "minutes", unit: "minute"},
  {type: "literal", value: ", "},
  {type: "integer", value: "4", unit: "second"},
  {type: "literal", value: " "},
  {type: "unit", value: "seconds", unit: "second"},
]

FrankYFTang commented 3 years ago

Looking at what Shane suggests: if we follow what Intl.RelativeTimeFormat currently outputs, the result would be slightly different:

[
  {type: "integer", value: "1", unit: "day"},
  {type: "literal", value: " "},
  {type: "unit", value: "day"}, // no  unit: "day" here 
  {type: "literal", value: ", "},
  {type: "integer", value: "2", unit: "hour"},
  {type: "literal", value: " "},
  {type: "unit", value: "hours"}, // no unit: "hour" here
  {type: "literal", value: ", "},
  {type: "integer", value: "3", unit: "minute"},
  {type: "literal", value: " "},
  {type: "unit", value: "minutes"}, // no unit: "minute" here
  {type: "literal", value: ", "},
  {type: "integer", value: "4", unit: "second"},
  {type: "literal", value: " "},
  {type: "unit", value: "seconds"}, // no  unit: "second" here
]

Of course, the other option is that we change Intl.RelativeTimeFormat to also output the unit attribute for type: "unit".

The other possibility is

[
  {type: "integer", value: "1", unit: "day"},
  {type: "literal", value: " ", unit: "day"},            // add unit: "day" here too
  {type: "unit", value: "day", unit: "day"},
  {type: "literal", value: ", "},                               // but not here
  {type: "integer", value: "2", unit: "hour"},
  {type: "literal", value: " ", unit: "hour"},            // add unit: "hour" here too
  {type: "unit", value: "hours", unit: "hour"},
  {type: "literal", value: ", "},                                 // but not here
  {type: "integer", value: "3", unit: "minute"},
  {type: "literal", value: " ", unit: "minute"},        // add unit: "minute" here too
  {type: "unit", value: "minutes", unit: "minute"},
  {type: "literal", value: ", "},                                  // but not here
  {type: "integer", value: "4", unit: "second"},
  {type: "literal", value: " ", unit: "second"},        // add unit: "second" here too
  {type: "unit", value: "seconds", unit: "second"},
]

FrankYFTang commented 3 years ago

Also... would the unit be in singular form to match other Intl api or plural form to match the value in smallestUnit / largestUnit ?

FrankYFTang commented 3 years ago

so.... should it be unit: "day" or unit: "days"?

ryzokuken commented 3 years ago

@sffc @FrankYFTang both the formats suggested here sound fine to me, I'd be happy changing the spec to either, just let me know if you folks have a preference here.

Also... would the unit be in singular form to match other Intl api or plural form to match the value in smallestUnit / largestUnit ?

Regarding this, @FrankYFTang I would personally prefer the plural form, so "days" and "seconds". Do you feel strongly otherwise?

FrankYFTang commented 3 years ago

According to https://tc39.es/proposal-temporal/#sec-temporal-tosmallesttemporalunit

  1. Temporal accepts both the singular and the plural form in the option
  2. All other ECMA402 APIs use the singular form

What are the pros/cons of using the plural form? What are the pros/cons of using the singular form?

FrankYFTang commented 3 years ago

Also see https://github.com/tc39/proposal-temporal/issues/1469 The Temporal spec text is just strange....

FrankYFTang commented 3 years ago

I would personally prefer the plural form, so "days" and "seconds".

Why? What is the reason you personally prefer the plural form?

ryzokuken commented 3 years ago

@FrankYFTang I prefer the plural forms since they map better to the mental model I have for durations. Durations have "x days", "y weeks", "z years" and so on... Date/times, on the other hand, have a specific value (not quantities) "xth day of yth month of zth year".

FrankYFTang commented 3 years ago

I searched the net and here is what I found: https://www.aqua-calc.com/what-is/time/day "The day is a unit of measurement of time". Notice it is NOT "The days is a unit of measurement of time".

https://en.wikipedia.org/wiki/Metric_time "Other units of time: minute, hour, and day, are accepted for use with SI,...." notice it is NOT "Other units of time: minutes, hours, and days, are accepted for use with SI,...."

https://www.math-only-math.com/units-of-time.html "There are different units of time.

Second, minute, hour, day, week, month and year are the units of time."

Notice it is NOT "There are different units of time.

Seconds, minutes, hours, days, weeks, months and years are the units of time."

http://www.exactlywhatistime.com/measurement-of-time/units-of-measurement/ "Other Units Time Units Flowchart illustrating interrelationships among the major units of time More commonly, outside of purely scientific usage, other units are used for longer periods of time. Although technically “non-SI” units, because they do not use the decimal system, these units are officially accepted for use with the International System.

minute (60 seconds), hour (60 minutes, or 3,600 seconds), day (24 hours, or 86,400 seconds), week (7 days, or 604,800 seconds), month (28-31 days, or 2,419,200-2,678,400 seconds), year (about 365.25 days, or about 31,557,600 seconds)" Notice that when the units are mentioned, they are in SINGULAR form, not in PLURAL form.

https://physics.nist.gov/cuu/Units/outside.html same

So... what makes us think the units are "years, months, days" rather than "year, month, day"?

FrankYFTang commented 3 years ago

Here are some dictionary definitions of the English word "unit": https://www.merriam-webster.com/dictionary/unit ": a single quantity regarded as a whole in calculation" ": a single thing, person, or group that is a constituent of a whole" (notice the word unit refers to "A SINGLE quantity", not "multiple....")

https://dictionary.cambridge.org/us/dictionary/english/unit "a single thing or a separate part of something larger", "a standard measure that is used to express amounts" (notice it uses "A standard measure"; the word "A" is used here)

https://www.oxfordlearnersdictionaries.com/us/definition/american_english/unit#:~:text=%2F%CB%88yun%C9%99t%2F,all%20living%20organisms%20are%20composed. "measurement unit (of something) a fixed quantity, etc. that is used as a standard measurement a unit of time/length/weight a unit of currency, such as the euro or the dollar"

While a multiple of the unit is plural, the UNIT itself is singular here.

FrankYFTang commented 3 years ago

I believe in English, we would say, A. "The time unit to express the duration of "4 years" is 'year'."

but NOT B. "The time unit to express the duration of "4 years" is 'years'." C. "The time unit to express the duration of "4 years" are 'years'."

right?

A is proper English and B and C are not proper English, right? Both B and C violate English grammar.

FrankYFTang commented 3 years ago

@jhusain @sffc

sffc commented 3 years ago

Related: https://github.com/tc39/proposal-temporal/issues/1452

I tend to prefer singular forms, simply because any suffixes in English (like "s", "er", "ed", "ing") don't add a whole lot and take up space in a program file. That's why we have DurationFormat instead of DurationFormatter. We're not completely consistent (i.e. Segmenter), but in cases where there is ambiguity, I tend to prefer the shorter form.

However, in Temporal, according to https://github.com/tc39/proposal-temporal/issues/325#issuecomment-691249107, we decided to go with plurals, despite my weakly held objection to that.

ptomato commented 3 years ago

I don't think it's possible to make an English grammar argument for having only singular or plural. English is too flexible :smile: It is correct to say both "the unit of length is the meter" and "Length is expressed in meters". I don't think it's helpful to flood the thread with examples of singular units.

The rationale for having plural field names in Temporal.Duration but not in other Temporal types was exactly what @ryzokuken said. In other words, because "a date in the year 2000" but "a duration of five years". I think it makes sense to be consistent with the field names for Temporal.Duration in this output from formatToParts.

FrankYFTang commented 3 years ago

  1. Whether it is "meter" or not has nothing to do with duration, so your argument is irrelevant to this discussion.
  2. Intl.NumberFormat in ECMA402 (I am not talking about Intl.DateTimeFormat) is using "day", "hour", "millisecond", "minute", "month", "second", "week", "year" as UNIT, NOT "days", "hours", "milliseconds", "minutes", "months", "seconds", "weeks", "years" https://tc39.es/ecma402/#table-sanctioned-simple-unit-identifiers
  3. We, ECMA402, should oppose Temporal, a Stage 3 proposal, where it disregards what TC39 already agreed to accept into ECMA402 Intl.NumberFormat and thereby creates inconsistency. Notice I am NOT using Intl.DateTimeFormat or Intl.RelativeTimeFormat as the argument here; I am using Intl.NumberFormat. The argument that "day" in Intl.DateTimeFormat points to a particular day is unrelated to the issue of the UNIT in Intl.NumberFormat.
  4. If the Temporal authors disregard what has already been accepted into the ECMA402 standard, then we, as ECMA402 TG2, should at least keep the unit consistent WITHIN the ECMA402 specification and make Intl.DurationFormat consistent with Intl.NumberFormat, instead of consistent with Temporal.

Temporal is at Stage 3 and, as I understand it, not yet implemented by any VM. We should push back on Temporal from ECMA402 to change it.

FrankYFTang commented 3 years ago

However, in Temporal, according to tc39/proposal-temporal#325 (comment), we decided to go with plurals,

tc39/proposal-temporal#325 states the following

Decision 2020-09-11:

Property Names: All methods in Temporal that accept a TimeLike or a DurationLike property bag must include at least one correctly-spelled property. Empty objects will throw, as will objects that only contain the wrong pluralization and/or mis-spelling. Cases like {years: 1, hour: 1} won't throw because it's obscure and obvious to solve via debugging.

Unit names in options values (e.g., smallestUnit): Allow both variants. Use JSDoc @deprecated for the "wrong" unit (the one differing from the type's field name) in TypeScript types. Ideally, we'd have an ESLint rule to catch the "wrong" variant, but that's out of scope.

Nowhere in that decision do I see "decided to go with plurals".

FrankYFTang commented 3 years ago

"Length is expressed in meters"

But the option on Duration is NOT smallestExpressedIn, BUT smallestUnit, right? So "meters" in that sentence is what the length is "expressed in", not what the "unit" of that length is.

ryzokuken commented 2 years ago

I think the plural vs singular discussion has been settled in favor of singular forms, right? cc @FrankYFTang @ptomato

ryzokuken commented 2 years ago

Actually, based on the spec, I think everything works as expected except Shane's request to additionally expose the number formatting. I don't believe it's particularly useful to be so low level, and there is no precedent for it in existing APIs, but I could do it if you feel strongly. According to my reading of the spec, the current result would be something like this:

[
  { type: "element", value: { type: "day", value: "1" } },
  { type: "literal", value: ", " },
  { type: "element", value: { type: "hours", value: "2" } },
  { type: "literal", value: ", " },
  { type: "element", value: { type: "minutes", value: "3" } },
  { type: "literal", value: " and " },
  { type: "element", value: { type: "seconds", value: "4" } },
]

I guess one possible change would be to make type always singular and add another unit field which would be either singular or plural based on the value?

sffc commented 2 years ago

We never have nesting inside of formatToParts in ECMA-402. https://github.com/tc39/proposal-intl-duration-format/issues/55#issuecomment-1171486907 does not look correct to me.

ryzokuken commented 2 years ago

Apologies for the delay here. I had no idea that nesting is prohibited in this case. I still like the current output and feel that the nested output is JavaScript-y, but would be happy to switch since this is against the existing conventions.

In order to move ahead with this, I'd need your opinion on which result to proceed with. We have three options:

1. Consistency with RelativeTimeFormat

[
  {type: "integer", value: "1", unit: "day"},
  {type: "literal", value: " "},
  {type: "unit", value: "day"}, // no  unit: "day" here 
  {type: "literal", value: ", "},
  {type: "integer", value: "2", unit: "hour"},
  {type: "literal", value: " "},
  {type: "unit", value: "hours"}, // no unit: "hour" here
  {type: "literal", value: ", "},
  {type: "integer", value: "3", unit: "minute"},
  {type: "literal", value: " "},
  {type: "unit", value: "minutes"}, // no unit: "minute" here
  {type: "literal", value: ", "},
  {type: "integer", value: "4", unit: "second"},
  {type: "literal", value: " "},
  {type: "unit", value: "seconds"}, // no  unit: "second" here
]

2. More verbose (Shane's suggestion)

[
  {type: "integer", value: "1", unit: "day"},
  {type: "literal", value: " ", unit: "day"},            // add unit: "day" here too
  {type: "unit", value: "day", unit: "day"},
  {type: "literal", value: ", "},                               // but not here
  {type: "integer", value: "2", unit: "hour"},
  {type: "literal", value: " ", unit: "hour"},            // add unit: "hour" here too
  {type: "unit", value: "hours", unit: "hour"},
  {type: "literal", value: ", "},                                 // but not here
  {type: "integer", value: "3", unit: "minute"},
  {type: "literal", value: " ", unit: "minute"},        // add unit: "minute" here too
  {type: "unit", value: "minutes", unit: "minute"},
  {type: "literal", value: ", "},                                  // but not here
  {type: "integer", value: "4", unit: "second"},
  {type: "literal", value: " ", unit: "second"},        // add unit: "second" here too
  {type: "unit", value: "seconds", unit: "second"},
]

3. Least verbose (Romulo's suggestion, also used by SerenityOS)

  { type: "element", value: "1 year" },
  { type: "literal", value: ", " },
  { type: "element", value: "2 months" },
  { type: "literal", value: ", " },
  { type: "element", value: "3 weeks" },
  { type: "literal", value: ", " },
  ...

Which one do folks prefer? @sffc @FrankYFTang @romulocintra @IdanHo

romulocintra commented 2 years ago

My preference is option 1 if it keeps consistency with the actual output of RelativeTimeFormat - users can reuse their code/understanding of the API for both use cases. But... apparently, we are introducing a new "type" called unit inexistent on RelativeTimeFormat

ryzokuken commented 2 years ago

But... apparently, we are introducing a new "type" called unit inexistent on RelativeTimeFormat

That's understandable since RelativeTimeFormat only supports a single unit, so it's easy to tell which unit is being talked about.

IdanHo commented 2 years ago

The current consensus for SerenityOS is option 1. We believe it strikes the best balance between the over-verbosity of option 2 and the limited information given by option 3.

sffc commented 2 years ago

Option 2 is additive over Option 1.

We should also be clear about the expected behavior when there are multiple parts of the number:

// Current behavior in Chrome
new Intl.RelativeTimeFormat("en").formatToParts(5555.5, "minutes")

// logged result (7 parts):
[
  {type: 'literal', value: 'in '},
  {type: 'integer', value: '5', unit: 'minute'},
  {type: 'group', value: ',', unit: 'minute'},
  {type: 'integer', value: '555', unit: 'minute'},
  {type: 'decimal', value: '.', unit: 'minute'},
  {type: 'fraction', value: '5', unit: 'minute'},
  {type: 'literal', value: ' minutes'}
]

In DurationFormat, if we are introducing a new type: "unit", then I think that is an improvement over RelativeTimeFormat, and we should tag it with the unit.

Option 4, which would be closest to RelativeTimeFormat, would be

[
  {type: "integer", value: "1", unit: "day"},
  {type: "literal", value: " day, "},
  {type: "integer", value: "2", unit: "hour"},
  {type: "literal", value: " hours, "},
  {type: "integer", value: "3", unit: "minute"},
  {type: "literal", value: " minutes, "},
  {type: "integer", value: "4", unit: "second"},
  {type: "literal", value: " seconds"},
]

I'm not advocating for that option, however. I still prefer option 2 since it has the most information. It's easy for developers to ignore information or merge adjacent parts into one, but it's much more difficult for them to add new information.
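To illustrate the point that merging adjacent parts is easy, here is a minimal sketch that collapses option 2 style parts into option 3 style "element"/"literal" parts. The helper name mergePartsByUnit is hypothetical, and the input shape is the option 2 proposal from this thread, not a finalized API.

```javascript
// Hypothetical helper (not part of any spec): collapse fine-grained parts
// that carry a `unit` field into one coarse "element" part per unit.
// Parts without a `unit` field are kept as separate "literal" parts.
function mergePartsByUnit(parts) {
  const merged = [];
  for (const part of parts) {
    const last = merged[merged.length - 1];
    if (part.unit !== undefined) {
      if (last && last.type === "element" && last.unit === part.unit) {
        // Same unit as the previous element: append the text.
        last.value += part.value;
      } else {
        merged.push({ type: "element", value: part.value, unit: part.unit });
      }
    } else {
      merged.push({ type: "literal", value: part.value });
    }
  }
  // Drop the bookkeeping `unit` field so the result looks like option 3.
  return merged.map(({ type, value }) => ({ type, value }));
}

// Input in the option 2 shape discussed above (assumed, not final).
const option2Parts = [
  { type: "integer", value: "1", unit: "day" },
  { type: "literal", value: " ", unit: "day" },
  { type: "unit", value: "day", unit: "day" },
  { type: "literal", value: ", " },
  { type: "integer", value: "2", unit: "hour" },
  { type: "literal", value: " ", unit: "hour" },
  { type: "unit", value: "hours", unit: "hour" },
];

const mergedParts = mergePartsByUnit(option2Parts);
console.log(mergedParts);
```

Going the other way (recovering the number and unit from "1 day") would require locale-aware parsing, which is the asymmetry described above.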

ryzokuken commented 2 years ago

In the process of writing and fixing #126, I realized that Option 3 is the only way formatToParts can work while still using ListFormat, since the current design of ListFormat only works with a list of strings. The following are our options:

  1. Stick to option 3: The entire thread talks about potentially better alternatives, but if we go with Option 3, we can continue to use ListFormat this way.
  2. Generalize ListFormat: ListFormat is a simple construction, which could be generalized to work with anything and not just lists of strings.
  3. Not use a locale-sensitive list formatter: We can simplify the list formatting by using an ad-hoc operation, but I'd not recommend this because this would produce worse output in the common case (format).

1 is the easiest option, obviously, with the downside that the formatToParts result isn't as granular as one might want it to be. 2 would take more time and effort, and 3 in my opinion is not a viable route.

What do you folks think?

sffc commented 2 years ago

I had assumed we would make any potential editorial changes necessary in ListFormat to support the desired formatToParts output, which also lays the groundwork for similar things we'll need to do in MessageFormat.

sffc commented 2 years ago

I had assumed we would make any potential editorial changes necessary in ListFormat to support the desired formatToParts output, which also lays the groundwork for similar things we'll need to do in MessageFormat.

We shouldn't flatten the output because it's "easier", especially since the alternative is just a little spec work (no more than an hour or two). We should only consider that option if it has a value proposition.

One potential value of it is that people can externally polyfill DurationFormat on top of ListFormat.
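As a rough sketch of that idea, option 3 style parts can be produced today by formatting each unit with Intl.NumberFormat and passing the resulting strings to Intl.ListFormat. The function name durationToListParts and the fixed unit list are assumptions for illustration; this is not the spec algorithm and it ignores DurationFormat options such as digital styles.

```javascript
// Hypothetical approximation of option 3 on top of existing Intl APIs.
// Assumes the duration uses plural field names (days, hours, ...).
function durationToListParts(locale, duration) {
  const units = ["year", "month", "week", "day", "hour", "minute", "second"];
  const pieces = [];
  for (const unit of units) {
    const amount = duration[unit + "s"];
    if (amount === undefined) continue;
    // Format each field as a unit-styled number, e.g. "2 hours" in English.
    const nf = new Intl.NumberFormat(locale, {
      style: "unit",
      unit,
      unitDisplay: "long",
    });
    pieces.push(nf.format(amount));
  }
  // ListFormat only accepts strings, so each piece becomes one "element".
  const lf = new Intl.ListFormat(locale, { type: "unit", style: "long" });
  return lf.formatToParts(pieces);
}

const listParts = durationToListParts("en", {
  days: 1,
  hours: 2,
  minutes: 3,
  seconds: 4,
});
console.log(listParts);
```

Because ListFormat sees only opaque strings, the number and unit inside each element cannot be told apart in the result, which is the granularity limit discussed above.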

FrankYFTang commented 2 years ago

I think the key issue we need to decide is how users would truly use the output of formatToParts for DurationFormat. I feel that right now we are all just guessing, and that makes it hard for us to decide what level of detail is appropriate to expose in the result.

ryzokuken commented 2 years ago

@FrankYFTang I agree with you completely here. While I make progress in this direction, can we figure out a way to find this out? Perhaps via user research?

sffc commented 2 years ago

Perhaps we could identify some web sites that make use of formatToParts in pre-existing formatters, check what they do, and design DurationFormat's formatToParts based on that.

sffc commented 1 year ago

Short TG2 discussion: https://github.com/tc39/ecma402/blob/master/meetings/notes-2022-11-03.md#formattoparts-output

I said: We have a Google use case involving different font sizes. For example, in "1:23.456", the minutes, seconds, and fractional seconds may want to be different font sizes. Similarly, in "1 hr 5 min", the units may have a different font size than the numerals.
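A minimal sketch of how such a font-size use case might consume fine-grained parts: each non-literal part is wrapped in a span whose class is derived from its type, so numerals and units can be styled separately. The function partsToHtml, the class names, and the sample parts (in the option 2 shape) are all assumptions for illustration.

```javascript
// Escape the few characters that matter inside HTML text content.
function escapeHtml(text) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" };
  return text.replace(/[&<>"]/g, (ch) => map[ch]);
}

// Hypothetical renderer: wrap every non-literal part in a <span> with a
// class based on its type, leaving literals as plain text.
function partsToHtml(parts) {
  return parts
    .map((part) => {
      const text = escapeHtml(part.value);
      if (part.type === "literal") return text;
      return `<span class="duration-${part.type}">${text}</span>`;
    })
    .join("");
}

// Sample parts for "1 hr 5 min" in the assumed option 2 shape.
const sampleParts = [
  { type: "integer", value: "1", unit: "hour" },
  { type: "literal", value: " " },
  { type: "unit", value: "hr", unit: "hour" },
  { type: "literal", value: " " },
  { type: "integer", value: "5", unit: "minute" },
  { type: "literal", value: " " },
  { type: "unit", value: "min", unit: "minute" },
];

const html = partsToHtml(sampleParts);
console.log(html);
```

With only option 3 style elements such as "1 hr", the caller would have no reliable way to style the numeral and the unit differently.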

FrankYFTang commented 1 year ago

@ben-allen

FrankYFTang commented 1 year ago

Sorry, I was wrong earlier. I think now that DeconstructPattern takes the subst as a List, it works.