patch / cldr-number-pm5

Localized number formatters using the Unicode CLDR
https://metacpan.org/pod/CLDR::Number

test upgrade using preliminary CLDR v28 data #42

Closed patch closed 8 years ago

patch commented 8 years ago

From @JCEmmons:

To: cldr-users
Subject: Preliminary JSON available for release 28
From: John Emmons
Date: Tue, 1 Sep 2015 00:09:28 -0500

A preliminary version of the JSON for the upcoming CLDR release 28 is now available on github for testing. Please see https://github.com/unicode-cldr/cldr-json for details. Any errors or omissions should be reported via CLDR trac by filing a new ticket at http://unicode.org/cldr/trac/newticket

patch commented 8 years ago

The cldr28 branch has been pushed:
https://github.com/patch/cldr-number-pm5/tree/cldr28
https://github.com/patch/cldr-number-pm5/compare/cldr28

Here is the failing test output:

t/00-load.t ............ 1/1 # CLDR::Number v0.12, Moo v2.000002, Perl v5.16.2 (/usr/bin/perl)
t/00-load.t ............ ok
t/currency.t ........... ok
t/format.t ............. ok
t/from-icu4c.t ......... ok
t/from-shutterstock.t .. 1/59
#   Failed test '1000 CHF in en-CH'
#   at t/from-shutterstock.t line 17.
#          got: 'CHF 1.000,00'
#     expected: 'CHF 1,000.00'

#   Failed test '1000 DKK in en-DK'
#   at t/from-shutterstock.t line 17.
#          got: '1.000,00 kr.'
#     expected: 'DKK 1,000.00'

#   Failed test '1000 EUR in de-AT'
#   at t/from-shutterstock.t line 17.
#          got: '€ 1 000,00'
#     expected: '€ 1.000,00'

#   Failed test '1000 EUR in en-AT'
#   at t/from-shutterstock.t line 17.
#          got: '€ 1.000,00'
#     expected: '€1,000.00'

#   Failed test '1000 EUR in en-DE'
#   at t/from-shutterstock.t line 17.
#          got: '1.000,00 €'
#     expected: '€1,000.00'

#   Failed test '1000 EUR in en-NL'
#   at t/from-shutterstock.t line 17.
#          got: '€ 1.000,00'
#     expected: '€1,000.00'

#   Failed test '1000 SEK in en-SE'
#   at t/from-shutterstock.t line 17.
#          got: '1 000,00 kr'
#     expected: 'SEK 1,000.00'

#   Failed test '1000 USD in zh-CN'
#   at t/from-shutterstock.t line 17.
#          got: 'US$1,000.00'
#     expected: 'US$ 1,000.00'
# Looks like you failed 8 tests of 59.
t/from-shutterstock.t .. Dubious, test returned 8 (wstat 2048, 0x800)
Failed 8/59 subtests
t/from-twittercldr.t ... 1/22
#   Failed test 'use the currency symbol for the corresponding currency code'
#   at t/from-twittercldr.t line 44.
#          got: 'THB 12.00'
#     expected: '฿12.00'
# Looks like you failed 1 test of 22.
t/from-twittercldr.t ... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/22 subtests
t/from-uts35.t ......... ok
t/inf-nan.t ............ ok
t/inheritance.t ........ 1/15
#   Failed test 'currency sign inherited from en-001'
#   at t/inheritance.t line 38.
#          got: 'JPY'
#     expected: 'JP¥'

#   Failed test 'locale inheritance'
#   at t/inheritance.t line 41.
# +----+------------+----+-----------------+
# | Elt|Got         | Elt|Expected         |
# +----+------------+----+-----------------+
# |   0|[           |   0|[                |
# *   1|  'ms-SG',  *   1|  'ms-Latn-SG',  *
# |    |            *   2|  'ms-Latn',     *
# |   2|  'ms',     |   3|  'ms',          |
# |   3|  'root'    |   4|  'root'         |
# |   4|]           |   5|]                |
# +----+------------+----+-----------------+
# Looks like you failed 2 tests of 15.
t/inheritance.t ........ Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/15 subtests
t/locales.t ............ ok
t/minmax-digits.t ...... ok
t/numbering-system.t ... ok
t/objects.t ............ ok
t/pattern-coerce.t ..... ok
t/pattern-trigger.t .... ok
t/quoting.t ............ ok
t/rounding.t ........... ok

Test Summary Report
-------------------
t/from-shutterstock.t (Wstat: 2048 Tests: 59 Failed: 8)
  Failed tests:  7, 11, 13, 16, 18, 22, 41, 59
  Non-zero exit status: 8
t/from-twittercldr.t (Wstat: 256 Tests: 22 Failed: 1)
  Failed test:  17
  Non-zero exit status: 1
t/inheritance.t      (Wstat: 512 Tests: 15 Failed: 2)
  Failed tests:  9-10
  Non-zero exit status: 2
Files=17, Tests=498,  3 wallclock secs ( 0.13 usr  0.05 sys +  2.52 cusr  0.30 csys =  3.00 CPU)
Result: FAIL

JCEmmons commented 8 years ago

No surprises here. Most of these are because of the addition of new English locales to support Europe. For example, we added en_DK in this release, so you would certainly expect to see "1.000,00 kr." for the local currency in that locale, just as you would in Danish.

patch commented 8 years ago

Thanks for the feedback, John!

Other than updating the unit tests, I had to make one code change to support the v28 data. CLDR::Number already handled single quotes around literal sequences in decimal, percent, and currency patterns, but it did not do so for atLeast or range patterns. The new data introduces single quotes in the range patterns for the es_CO and es_GT locales, which initially produced formatted ranges like 'de' 1 'a' 5 and 1 'al' 5 before the fix.

UTS #35 is not clear about whether single quotes are supported in Part 3: 2.5 Miscellaneous Patterns, or whether any of the rules later introduced in Part 3: 3 Number Format Patterns also apply to those patterns.

Note that other range and atLeast patterns contain literal words without quoting them:

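| locale | type    | pattern           |
|--------|---------|-------------------|
| da     | atLeast | {0} eller derover |
| es     | atLeast | Más de {0}        |
| fa     | range   | {0} تا {1}        |
| fi     | atLeast | vähintään {0}     |
| fo     | atLeast | {0} ella meira    |
| fr     | atLeast | au moins {0}      |
| ja     | atLeast | {0} 以上          |
| lv     | atLeast | vismaz {0}        |
| smn    | atLeast | ucemustáá {0}     |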

Here are the new ones in question:

| locale | type  | pattern              |
|--------|-------|----------------------|
| es_CO  | range | 'de' {0} 'a' {1}     |
| es_GT  | range | {0} 'al' {1}         |
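
To illustrate the kind of handling involved, here is a minimal Python sketch of formatting a range or atLeast pattern while stripping single-quoted literals. It is not the CLDR::Number implementation (which is Perl); the function name `format_misc_pattern` is hypothetical, and the sketch assumes the quoting convention from number patterns ('...' marks literal text, '' is a literal apostrophe) also applies here, which is exactly the point UTS #35 leaves open.

```python
import re

# One token is either an escaped apostrophe (''), a quoted literal ('...'),
# or a numbered placeholder such as {0} or {1}. Everything else in the
# pattern is copied through unchanged.
_TOKEN = re.compile(r"''|'((?:[^']|'')*)'|\{(\d+)\}")


def format_misc_pattern(pattern, *args):
    """Substitute placeholders and unquote literals in a range/atLeast pattern.

    Hypothetical helper for illustration only; assumes number-pattern
    quoting rules apply to miscellaneous patterns.
    """
    def replace(match):
        if match.group(2) is not None:
            # Placeholder: insert the already formatted number.
            return str(args[int(match.group(2))])
        if match.group(0) == "''":
            # Doubled quote outside a literal: a single apostrophe.
            return "'"
        # Quoted literal: drop the surrounding quotes, unescape '' inside.
        return match.group(1).replace("''", "'")

    return _TOKEN.sub(replace, pattern)


print(format_misc_pattern("'de' {0} 'a' {1}", "1", "5"))  # es_CO range
print(format_misc_pattern("{0} 'al' {1}", "1", "5"))      # es_GT range
print(format_misc_pattern("{0} eller derover", "5"))      # da atLeast
```

Under these assumptions, quoted and unquoted patterns format the same way, so a library using this approach would not be affected by whether the data quotes its literal words.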

Even if the quotes are officially supported in range and atLeast, it seems like the official CLDR data should be consistent about their use. My guess is that some other CLDR-based libraries may run into this issue as well. I only noticed it from reviewing a diff of all the data changes.

JCEmmons commented 8 years ago

I would agree that we need to be consistent about this and make sure it is documented accordingly. I would suggest that you file a CLDR ticket at http://unicode.org/cldr/trac/newticket. At a minimum, we should be able to get UTS #35 updated before we publish in a couple of weeks.

patch commented 8 years ago

I filed a CLDR ticket for this issue: http://unicode.org/cldr/trac/ticket/8928

Either way, the cldr28 branch of CLDR::Number now supports those quotes, and I'm closing this issue because I think we're ready for CLDR v28 when it's released. We just need to rerun bin/generate-cldr-data.pl and document any major changes (en_150 inheritance, new numbering systems, new locales, etc.) in the Changes file.

For the record, here are the CLDR JSON files that we currently use for this project: