missing icu options

ray007 commented 3 years ago

Looking at http://userguide.icu-project.org/formatparse/datetime, there are 3 more timeZone options: O, x and X.

FrankYFTang commented 3 years ago

UTS 35 URL https://unicode.org/reports/tr35/tr35-dates.html#dfst-zone

FrankYFTang commented 3 years ago

x and X are for the ISO8601 format, which is not meant to be read by humans as text but is a machine-readable date format, and which is covered by the functionality in https://ecma-international.org/ecma-262/#sec-date.prototype.toisostring . Intl.DateTimeFormat is not designed to format dates in a machine-readable format. If these formats are needed, they should extend the functionality of Date.prototype.toISOString ( ) instead.
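
For reference, a minimal sketch of the distinction being drawn here: Date.prototype.toISOString() always emits the same fixed UTC string regardless of locale, while Intl.DateTimeFormat output depends on the locale and options. The date value and the 'en-US' options are arbitrary choices for illustration.

```javascript
// Arbitrary sample instant: the Unix epoch.
const epoch = new Date(0);

// Machine-readable: fixed ISO 8601 shape, always UTC, no locale input.
const iso = epoch.toISOString();
console.log(iso); // "1970-01-01T00:00:00.000Z"

// Human-readable: shape depends on the locale and the chosen options.
const localized = new Intl.DateTimeFormat('en-US', {
  dateStyle: 'medium',
  timeStyle: 'long',
  timeZone: 'UTC',
}).format(epoch);
console.log(localized);
```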

FrankYFTang commented 3 years ago

| O | GMT-8 | The short localized GMT format. |
| OOOO | GMT-08:00 | The long localized GMT format. |

are reasonable to support.

I think we should remove

| offset | Z | -0800 |
| longOffset | ZZZZ | GMT-0800 |

and instead add

sffc commented 3 years ago

CC @justingrant

justingrant commented 3 years ago

Are there other formats from that table that are also not supported because they would be machine-readable? Or would X and x be the only missing ones?

I'm asking because "machine readable" doesn't necessarily mean "not human readable". There are cases like ISO 8601 strings where machine-readable formats are also understandable to humans-- maybe not ideally human-readable but not so unusual in human-visible UI like printed reports. So my recommendation would be to consider adding these overlap cases to the Intl formatting APIs, and only omit formats that are not human-readable (e.g. milliseconds since epoch?), and include formats that are readable by both humans and machines.

Here's two use cases where doing this would help:

when porting code from other libraries or platforms that do support the full range of formats, it'd be nice to not have to convert code to use two different APIs depending on the format string.
a "choose your own date-time format" dropdown list as part of a GUI report designer, where implementation would be easier if there were no special-casing required to emit an ISO string as the format.

FrankYFTang commented 3 years ago

My point in not including those here is not that humans cannot read them, but that they belong somewhere ELSE. => Date.prototype.toISOString ( )

ECMA402 is an API for addressing internationalization needs, and what you mentioned could be a lower-level need that belongs to ECMA262 in toISOString(). Is there any reason why these formats, if they are needed, should NOT be part of toISOString()?

FrankYFTang commented 3 years ago

Mixing formatting APIs for human-readable strings and machine-readable strings can be very bad. Here is one real example. Many years ago, we had code that generated X and Y coordinates to output a PostScript file to a printer. The code was all fine except when it ran in a French locale: the output decimal separator was "," instead of ".", and the PostScript interpreter could not handle it. It is easy for a human to understand machine-readable output, but it is not the case for a machine to understand human-readable output. Therefore, if an API is designed to output a human-readable format but also becomes the de facto method for generating machine-readable output, it may cause unwanted defects in "some locales but not others". It is better for code that needs to generate machine-readable output to use only an API that generates only machine-readable output, say toISOString().
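
The failure mode described above can be sketched in a few lines. This is an illustration only, not the original PostScript code: the coordinate value is made up, and the French output assumes a runtime that ships French locale data.

```javascript
// Hypothetical coordinate value, for illustration only.
const x = 12.5;

// Locale-sensitive formatting: meant for people, varies by locale.
const humanFr = x.toLocaleString('fr-FR'); // uses "," as the decimal separator
const humanEn = x.toLocaleString('en-US'); // uses "." as the decimal separator

// Locale-independent formatting: safe to hand to another program.
const machine = x.toFixed(1);

console.log(humanFr, humanEn, machine);
```

A consumer that expects "12.5" would accept the last two strings and reject the first, which is the "works in some locales but not others" defect.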

justingrant commented 3 years ago

In general I agree with @FrankYFTang that it's best not to mix human readable and machine readable APIs. But in this case:

There's overlap. Some formats are both human readable and machine readable.
There's a relevant industry standard (https://www.unicode.org/reports/tr35/tr35-dates.html#Date_Field_Symbol_Table). It'd be unexpected to support the entire list of tokens except for a few.
There are popular APIs (e.g. https://date-fns.org/docs/Unicode-Tokens) that support that standard, which would complicate porting code.

So I'd recommend supporting the full set.

For the specific concern around not using toISOString(), it seems unlikely that most developers would ignore the simplest API for a much more complex format-token-based API that could emit the same output. Developers are usually pretty lazy and usually opt for the easiest option.

FrankYFTang commented 3 years ago

There's overlap. Some formats are both human readable and machine readable.

Which ones do you consider to be that?

FrankYFTang commented 3 years ago

2. There's a relevant industry standard (https://www.unicode.org/reports/tr35/tr35-dates.html#Date_Field_Symbol_Table). It'd be unexpected to support the entire list of tokens except for a few.

ECMA402 is not meant to support everything defined in UTS35.

justingrant commented 3 years ago

There's overlap. Some formats are both human readable and machine readable.

Which ones do you consider to be that?

I think X and x are both machine-readable and reasonably human readable. Was that what you were asking?

FrankYFTang commented 3 years ago

3. There are popular APIs (e.g. https://date-fns.org/docs/Unicode-Tokens) that support that standard, which would complicate porting code.

Why is that?

FrankYFTang commented 3 years ago

There's overlap. Some formats are both human readable and machine readable.

Which ones do you consider to be that?

I think X and x are both machine-readable and reasonably human readable. Was that what you were asking?

In what sense, when a human reads -08 +0530 Z -0800 -08:00 -0800 -075258 -07:52:58 in a formatted date/time string, will that human see them as representing a timezone without having the ISO8601 spec next to them?

FrankYFTang commented 3 years ago

It does not mean that human beings CANNOT understand that these are timezones, but no human, without reading a spec, would conclude that these represent a timezone, right?

FrankYFTang commented 3 years ago

For the specific concern around not using toISOString(), it seems unlikely that most developers would ignore the simplest API for a much more complex format-token-based API that could emit the same output. Developers are usually pretty lazy and usually opt for the easiest option.

Why do you claim that toISOString() is a "much more complex format-token-based API"?

justingrant commented 3 years ago

For the specific concern around not using toISOString(), it seems unlikely that most developers would ignore the simplest API for a much more complex format-token-based API that could emit the same output. Developers are usually pretty lazy and usually opt for the easiest option.

Why do you claim that toISOString() is a "much more complex format-token-based API"?

I mean the reverse: that toISOString() is so much easier than using a format-token API that I'd be less concerned about developers opting to use the token API to output an ISO string.

justingrant commented 3 years ago

There are popular APIs (e.g. https://date-fns.org/docs/Unicode-Tokens) that support that standard, which would complicate porting code.

Why is that?

If you want to port code from one API to another, one of the biggest challenges is often string-based microformats like this API or regexes which are essentially a mini-language that not all developers may be familiar with. Also, string-based formats can't rely on IDE support (e.g. autocomplete, TS) that typically help developers learn and avoid bugs with new APIs. So IMHO it's generally a good idea to try to use standardized microformats where possible.

Anyway, this is a lot of discussion for a small API option. ;-) Shane asked me for my feedback, and my feedback is that aligning to as many standards-based options as possible would be better. If 402 chooses to go another direction it's not a disaster... it just makes a few things harder.

ray007 commented 3 years ago

It does not mean that human beings CANNOT understand that these are timezones, but no human, without reading a spec, would conclude that these represent a timezone, right?

I would argue that most would at least recognize the variations with ":" in them at the end of the date as timezone.

ray007 commented 3 years ago

For the specific concern around not using toISOString(), it seems unlikely that most developers would ignore the simplest API for a much more complex format-token-based API that could emit the same output. Developers are usually pretty lazy and usually opt for the easiest option.

Why do you claim that toISOString() is a "much more complex format-token-based API"?

I mean the reverse: that toISOString() is so much easier than using a format-token API that I'd be less concerned about developers opting to use the token API to output an ISO string.

And I have long wished for an iso pseudo-locale for DateTimeFormat. But maybe options for toISOString() would be a good idea.

FrankYFTang commented 3 years ago

Think about this: x and X are designed for ISO8601, but there is currently no way to format an ISO8601 date string by calling Intl.DateTimeFormat, and therefore it makes no sense to support them and let them be mixed with other styles of date formats. It only makes sense to use x and X in the context of an ISO8601-formatted date.

ray007 commented 3 years ago

Yes, and as I wrote above, I wish there was an iso pseudo-locale for Intl.DateTimeFormat. Because many times I want output similar to the iso string, but

most times with " " instead of "T"
optionally with milliseconds
with or without timezone
as UTC or from current timezone.
sometimes only date or time

justingrant commented 3 years ago

FWIW, all but one of the items below are currently supported in Temporal.

most times with " " instead of "T"

Not supported today. @ray007, feel free to file an issue in https://github.com/tc39/proposal-temporal/ to request this. Temporal is close to Stage 3 so we'd probably only consider this for Temporal V2, but a timeSeparator option may be worthwhile to consider.

optionally with milliseconds

Temporal.now.instant().toString()
// => "2020-12-11T19:02:46.623256378Z"
Temporal.now.instant().toString({ smallestUnit: 'seconds' })
// => "2020-12-11T18:49:01Z"
Temporal.now.zonedDateTimeISO().toString({ smallestUnit: 'milliseconds' })
// => "2020-12-11T11:03:17.353-08:00[America/Los_Angeles]"

with or without timezone

Temporal.now.zonedDateTimeISO().toString()
// => "2020-12-11T11:00:56.378051215-08:00[America/Los_Angeles]"
Temporal.now.instant().toString({ timeZone: 'America/Los_Angeles' })
// => "2020-12-11T10:50:20.053541055-08:00"
Temporal.now.plainDateTimeISO().toString()
// => "2020-12-11T10:57:31.215904308"

as UTC or from current timezone.

instant = Temporal.now.instant();
instant.toString()
// => "2020-12-11T19:06:07.945397353Z"
instant.toString({ timeZone: 'America/Los_Angeles' })
// => "2020-12-11T11:06:07.945397353-08:00"

sometimes only date or time

Temporal.now.plainDateISO().toString()
// => "2020-12-11"
time = Temporal.now.plainTimeISO()
time.toString()
// => "10:53:53.006790557"
time.toString({ smallestUnit: 'minute', roundingMode: 'ceil' })
// => "10:54"

FrankYFTang commented 3 years ago

Yes, and as I wrote above, I wish there was an iso pseudo-locale for Intl.DateTimeFormat. Because many times I want output similar to the iso string, but

most times with " " instead of "T"

optionally with milliseconds

with or without timezone

as UTC or from current timezone.

sometimes only date or time

A pseudo-locale is the wrong approach. Locales should be designed for humans, not machines. What @justingrant listed above, using Temporal with a different set of APIs designed for machine-readable output, is the right way to go. I have seen too many mistakes produced by mixing human-readable and machine-readable code together in the last 30 years of i18n work. That will surely lead to very bad bugs.

ray007 commented 3 years ago

The iso-format with a " " instead of a "T" looks nice enough, is understood by everybody and sorts correctly on its own. Machine-readable in this case does not mean "not human readable".
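
A minimal sketch of that variant using only Date.prototype.toISOString() and a string replacement. The sample dates are arbitrary, and this only covers the UTC case from the wish list above.

```javascript
// Two arbitrary sample instants, deliberately out of chronological order.
const stamps = [
  new Date(Date.UTC(2021, 3, 8, 9, 30)),
  new Date(Date.UTC(2020, 11, 11, 19, 2, 46)),
].map((d) => d.toISOString().replace('T', ' '));

// Plain lexicographic sort also yields chronological order for this format.
const sorted = [...stamps].sort();
console.log(sorted);
// [ '2020-12-11 19:02:46.000Z', '2021-04-08 09:30:00.000Z' ]
```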

sffc commented 3 years ago

I think the committee would be okay adding timeZoneName options that don't explode the data bundle size. For example, adding VVV or VVVV might not cut it. However, x and X probably don't increase data bundle size that much, so it's mostly harmless to support them.

I think x is better than X for ECMA-402. The options could be something like,

timeZoneName: offsetNarrow (-08, +0530)
timeZoneName: offsetShort (-0800)
timeZoneName: offsetShortExtended (-08:00)
timeZoneName: offsetLong (-0800, -075258)
timeZoneName: offsetLongExtended (-08:00, -07:52:58)
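
To make the five shapes concrete, here is a rough sketch of how each could be derived from a UTC offset. formatIsoOffset is a hypothetical helper written for this illustration, not an existing or proposed API; it assumes a whole number of seconds and reuses the option names from the list above as plain labels.

```javascript
// Hypothetical helper, not part of any standard API: renders a UTC offset
// (given in whole seconds) in the five ISO 8601 shapes listed above.
function formatIsoOffset(offsetSeconds, style) {
  const sign = offsetSeconds < 0 ? '-' : '+';
  const abs = Math.abs(offsetSeconds);
  const pad = (n) => String(n).padStart(2, '0');
  const hh = pad(Math.floor(abs / 3600));
  const mm = pad(Math.floor((abs % 3600) / 60));
  const ss = pad(abs % 60);
  const hasMinutes = mm !== '00';
  const hasSeconds = ss !== '00';

  switch (style) {
    case 'offsetNarrow': // hours, minutes only when non-zero
      return sign + hh + (hasMinutes ? mm : '');
    case 'offsetShort': // hours and minutes, basic format
      return sign + hh + mm;
    case 'offsetShortExtended': // hours and minutes, extended format
      return `${sign}${hh}:${mm}`;
    case 'offsetLong': // basic format, seconds only when non-zero
      return sign + hh + mm + (hasSeconds ? ss : '');
    case 'offsetLongExtended': // extended format, seconds only when non-zero
      return `${sign}${hh}:${mm}` + (hasSeconds ? `:${ss}` : '');
    default:
      throw new RangeError(`Unknown style: ${style}`);
  }
}

const pst = -8 * 3600;
const lmt = -(7 * 3600 + 52 * 60 + 58);
console.log(formatIsoOffset(pst, 'offsetNarrow')); // "-08"
console.log(formatIsoOffset(5.5 * 3600, 'offsetNarrow')); // "+0530"
console.log(formatIsoOffset(pst, 'offsetShortExtended')); // "-08:00"
console.log(formatIsoOffset(lmt, 'offsetLong')); // "-075258"
console.log(formatIsoOffset(lmt, 'offsetLongExtended')); // "-07:52:58"
```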

FrankYFTang commented 3 years ago

I think the committee would be okay adding timeZoneName options that don't explode the data bundle size. For example, adding VVV or VVVV might not cut it. However, x and X probably don't increase data bundle size that much, so it's mostly harmless to support them.

Increasing the data size makes it harder for current and future JS engines to ship the feature, which is for sure the top concern. However, including an UNNECESSARY option that increases the COST of implementation and serves nobody is also bad.

I think x is better than X for ECMA-402. The options could be something like,

timeZoneName: offsetNarrow (-08, +0530)

timeZoneName: offsetShort (-0800)

timeZoneName: offsetShortExtended (-08:00)

timeZoneName: offsetLong (-0800, -075258)

timeZoneName: offsetLongExtended (-08:00, -07:52:58)

We need to look at the above in the context that each will be only PART OF a formatted time string and ask ourselves: when would ANYONE need them to be part of a formatted time string when the whole string is NOT ISO8601 (since there is currently no way to format the rest of the string in ISO8601 via this API)?

From my point of view, the 5 you listed above only make sense to support if this API also formats the rest of the date/time string in the ISO8601 format, which this API currently does NOT. These 5 are a good fit for Date.prototype.toISOString ( ) or the toString API of Temporal, since both of them deal with outputting time in ISO8601 format, but not here (since there is no way we can output the REST of the parts in ISO8601).

Adding support for these here will result in no one using them. Just imagine what the whole formatted string will look like, not just the offset itself.

FrankYFTang commented 3 years ago

(new Date).toLocaleTimeString("en", {timeZoneName: "long"})
> "10:08:18 AM Pacific Standard Time"
(new Date).toLocaleTimeString("en", {timeZoneName: "short"})
> "10:08:18 AM PST"
(new Date).toLocaleTimeString("en", {timeZoneName: "offsetNarrow"})
> "10:08:18 AM -08"
(new Date).toLocaleTimeString("en", {timeZoneName: "offsetShort"})
> "10:08:18 AM -0800"
(new Date).toLocaleTimeString("en", {timeZoneName: "offsetShortExtended"})
> "10:08:18 AM -08:00"
(new Date).toLocaleTimeString("en", {timeZoneName: "offsetLong"})
> "10:08:18 AM -0800"
(new Date).toLocaleTimeString("en", {timeZoneName: "offsetLongExtended"})
> "10:08:18 AM -08:00"

Do you really think there are users who will understand what "10:08:18 AM -08", "10:08:18 AM -0800", or "10:08:18 AM -08:00" means when they read it on a web page? Please notice there won't be a "GMT" or "UTC" in front of them (because these formats demand NOT to have them).

FrankYFTang commented 3 years ago

On the other hand, I think the following are meaningful for most users, compared to "10:08:18 AM -08", "10:08:18 AM -0800", or "10:08:18 AM -08:00"

(new Date).toLocaleTimeString("en", {timeZoneName: "shortGMT"})
> "10:08:18 AM GMT-8"
(new Date).toLocaleTimeString("en", {timeZoneName: "longGMT"})
> "10:08:18 AM GMT-0800"

Notice that the "GMT" part is localized to a different string depending on the locale, same as the rest of the time string.
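
A sketch of how localized GMT offsets can be inspected today. Note the assumption: "shortGMT" and "longGMT" are the working names used in this thread, while the snippet uses timeZoneName values "shortOffset" and "longOffset", which are not named in this thread and are, to the best of my knowledge, what current engines accept. The date, locale, and time zone are arbitrary, and the snippet needs a runtime recent enough to support those values.

```javascript
// Arbitrary winter instant so that Los Angeles is on standard time.
const date = new Date(Date.UTC(2020, 11, 11, 18, 8, 18));

const base = {
  timeZone: 'America/Los_Angeles',
  hour: 'numeric',
  minute: '2-digit',
  second: '2-digit',
};

// Extract only the time zone portion of the formatted string.
function zonePart(timeZoneName) {
  return new Intl.DateTimeFormat('en-US', { ...base, timeZoneName })
    .formatToParts(date)
    .find((part) => part.type === 'timeZoneName').value;
}

console.log(zonePart('short')); // "PST"
console.log(zonePart('long')); // "Pacific Standard Time"
console.log(zonePart('shortOffset')); // "GMT-8"
console.log(zonePart('longOffset')); // "GMT-08:00"
```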

FrankYFTang commented 3 years ago

@zbraniecki

zbraniecki commented 3 years ago

I agree with @FrankYFTang - for DateTime we should focus on timezone as part of a larger date/time and we should focus on identifying the smallest possible subset of valuable timezones balanced against the payload size / general user value.

As a result, I'm okay with including timezone models that don't require data (pure algo), but adding them to localizable date/time outputs makes sense if they are used as localized formats.

From my point of view, the 5 you listed above only make sense to support, if this API also format the rest of the date/time string in the ISO8601 format- which this API currently NOT.

That stands out for me. ISO8601 date and time formatting should be possible, but it is not part of Intl, and since it is notoriously confused, it is especially important that we don't further mislead people into believing that they can get an ISO8601 experience by using the Intl API.

FrankYFTang commented 3 years ago

I suggest that anyone who would like to have the 5 options sffc suggested write a DIFFERENT proposal to ECMA262 to add them as an option for Date.prototype.toISOString or as part of the Temporal toString() function instead. Those two functions are the right way to deal with ISO8601.

FrankYFTang commented 3 years ago

Let me make the facts clear. Currently, there is no facility in Intl.DateTimeFormat (and from my point of view there SHOULD NOT be) to format a date into ISO8601 format. Formatting a date into ISO8601 format is not in the scope of ECMA402. It was never supported. I also believe it is better that it never be supported in the scope of ECMA402 (what I mean is that if you need such support, it should be part of ECMA262, not ECMA402).

sffc commented 3 years ago

I'm convinced by https://github.com/tc39/proposal-intl-extend-timezonename/issues/2#issuecomment-747608629. We should not add time zone display options that are not intended for human consumption.

FrankYFTang commented 3 years ago

At the 2021-04-08 ECMA402 meeting, we all agreed that it is the right thing not to include ISO8601 timezone formats to be mixed with locale date formatting, since DateTimeFormat does not support ISO8601-style formatting now and should not ever anyway.

tc39 / proposal-intl-extend-timezonename

missing icu options #2