unicode-org / conformance

Unicode & CLDR Data Driven Testing
https://unicode-org.github.io/conformance/

Datetime date from CLDR data #329

Closed sven-oly closed 1 week ago

sven-oly commented 1 month ago

Ready for review.

sven-oly commented 1 month ago

I need some help converting these date/time data into UTC "Z" format. Then we can compute the offset seconds too!
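
One possible way to do that conversion, sketched in Python using only the standard library. It assumes every input looks like the test data in this PR: an ISO 8601 local time with a numeric offset, optionally followed by a bracketed zone ID. The function name `to_utc_z` is hypothetical and not part of the conformance code.

```python
import re
from datetime import datetime, timezone


def to_utc_z(stamp):
    """Return (UTC "Z" string, offset in seconds) for an offset timestamp.

    Hypothetical helper: assumes input such as
    "2001-07-02T17:44:15+04:30[Asia/Tehran]".
    """
    # Drop a trailing bracketed zone ID, e.g. "[Asia/Tehran]".
    iso = re.sub(r"\[[^\]]*\]$", "", stamp)
    dt = datetime.fromisoformat(iso)
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("timestamp has no UTC offset: " + stamp)
    offset_seconds = int(offset.total_seconds())
    # Shift to UTC and render with a literal "Z" suffix.
    utc = dt.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ"), offset_seconds


print(to_utc_z("2001-07-02T17:44:15+04:30[Asia/Tehran]"))
# ('2001-07-02T13:14:15Z', 16200)
```

The offset is taken from the numeric part of the string, not looked up from the zone ID, so no time zone database is needed.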

Right now, testing this in ICU75 causes all ICU4C, ICU4J and NodeJS tests to fail or have errors.

sven-oly commented 1 week ago

Note that the CLDR expected test results may be incorrect in some cases:

  1. With date/time styles full/full, ICU4C puts "at" between the date and time output. The expected result isn't correct.

     Expected: July 2, 2001, 5:44:15 PM GMT+4:30
     Actual: July 2, 2001 at 5:44:15 PM GMT+4:30

    { "dateLength": "long", "calendar": "gregorian", "locale": "en-US", "input": "2001-07-02T17:44:15+04:30[Asia/Tehran]", "expected": "July 2, 2001, 5:44:15 PM GMT+4:30" },

  2. In some cases, there's GMT in the expected result but not in the actual output.

     Expected: March 7, 2024, 11:30:01 AM GMT+3:30
     Actual: March 7, 2024 at 11:30 AM

    { "dateLength": "long", "calendar": "gregorian", "locale": "en-US", "input": "2024-03-07T11:30:01+03:30[Asia/Tehran]", "expected": "March 7, 2024, 11:30:01 AM GMT+3:30" },

    { "timeLength": "long", "calendar": "gregorian", "locale": "en-US", "input": "2024-03-07T11:30:01+03:30[Asia/Tehran]", "expected": "March 7, 2024, 11:30:01 AM GMT+3:30" },

sven-oly commented 1 week ago

This now uses simple logic to identify the "known issue" of " at" replacing "," in some cases. Similarly, some other substitutions are caught.
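
For illustration only, a minimal sketch of what such a check could look like. This is not the code in the PR; the function name `is_at_for_comma` is hypothetical, and it covers only the " at" versus "," substitution, not the other substitutions mentioned.

```python
def is_at_for_comma(expected, actual):
    """True if `actual` matches `expected` once " at " is read as ", ".

    Hypothetical helper for flagging the " at" vs "," known issue.
    """
    if expected == actual:
        return False  # Not a failure at all.
    return actual.replace(" at ", ", ") == expected


expected = "July 2, 2001, 5:44:15 PM GMT+4:30"
actual = "July 2, 2001 at 5:44:15 PM GMT+4:30"
print(is_at_for_comma(expected, actual))  # True
```

A failure that passes this check can be reported as a known issue; anything else stays an ordinary failure.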

Note also that the test data shows an ASCII comma as expected instead of the Arabic comma. These show up as test failures, but maybe it's a bug in the CLDR data.
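
To separate these comma-only failures from the rest, a check along the following lines might help. It is a sketch under the assumption that the only difference is ASCII comma (U+002C) versus Arabic comma (U+060C); the name `differs_only_by_comma` and the sample strings are made up for illustration and are not taken from the test data.

```python
ARABIC_COMMA = "\u060C"


def differs_only_by_comma(expected, actual):
    """True if the strings differ only in ASCII vs Arabic commas.

    Hypothetical helper; both sides are normalized to the ASCII comma.
    """
    if expected == actual:
        return False
    return expected.replace(ARABIC_COMMA, ",") == actual.replace(ARABIC_COMMA, ",")


# Illustrative strings only, not real CLDR output.
print(differs_only_by_comma("a, b", "a\u060C b"))  # True
```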

sven-oly commented 1 week ago

Categorize failing date/time tests with 'dateTimeFormatType' == 'standard' as a "Known Issue".

sven-oly commented 1 week ago

Ready for another look!

sven-oly commented 1 week ago

This classifies most of the date/time format problems as "known issue", except for the difference between the ASCII comma and the Arabic comma.