unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org

ICU4X 1.2 release checklist #3297

Closed. Manishearth closed this issue 1 year ago

Manishearth commented 1 year ago

Extra pre-checklist checklist

Main checklist

To do in the coming days

Manishearth commented 1 year ago

We should also start drafting a changelog

Edit: I have a draft in https://hackmd.io/@Manishearth/BkIpZnSGh/edit, please help out

Manishearth commented 1 year ago

Changelog is done. @sffc, please note that in the experimental section I have marked a bunch of crates as "no other changes"; their only changes are edition bumps and clippy fixes. If you want, you can publish new versions for them, in which case please note down the version diff since 1.1. Otherwise, please have the entry say "No change (still at x.y.z)", the same way we do for the utils.

The idea is that since we only write changelogs for ICU4X releases, we should cover all intermediate versions.

Manishearth commented 1 year ago

Coverage seems fine. icu_properties has low coverage because it's macro-heavy and the coverage tool is very inconsistent about counting macro-generated code.

Manishearth commented 1 year ago

note to self: workspaces tool should have --no-git-tag

Manishearth commented 1 year ago

Release is published and tagged. Heading for lunch

sffc commented 1 year ago

For line break, ICU4X 1.2 contains these data entries:

segmenter/lstm/wl_auto@1/my.postcard
segmenter/lstm/wl_auto@1/km.postcard
segmenter/lstm/wl_auto@1/lo.postcard
segmenter/lstm/wl_auto@1/th.postcard
segmenter/grapheme@1/und.postcard
segmenter/line@1/und.postcard

gzipped together, these are 299959 B.

The equivalent in ICU4C 72 should be:

brkitr/burmesedict.dict
brkitr/char.brk
brkitr/khmerdict.dict
brkitr/laodict.dict
brkitr/line.brk
brkitr/line_cj.brk
brkitr/line_loose.brk
brkitr/line_loose_cj.brk
brkitr/line_loose_phrase_cj.brk
brkitr/line_normal.brk
brkitr/line_normal_cj.brk
brkitr/line_normal_phrase_cj.brk
brkitr/line_phrase_cj.brk
brkitr/thaidict.dict

gzipped together, those files are 762631 B.
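The thread does not say how the files were "gzipped together". One plausible way to measure a group of files is to pack them into a single tar archive and gzip it, as in the minimal sketch below. The function name `gzipped_size` is hypothetical, and the exact byte count depends on the tool and compression level, so this will not necessarily reproduce the figures quoted above.

```python
import io
import tarfile
from pathlib import Path


def gzipped_size(paths):
    """Return the size in bytes of a gzip-compressed tar archive of `paths`.

    Assumption: "gzipped together" means one archive holding all files.
    The archive is built in memory, so nothing is written to disk.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as archive:
        # Sort for a stable member order; store only the file name.
        for path in sorted(Path(p) for p in paths):
            archive.add(str(path), arcname=path.name)
    return len(buf.getvalue())
```

For example, `gzipped_size(Path("brkitr").glob("line*.brk"))` would measure only the line-break rule files in a local `brkitr` directory (the directory location is an assumption).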

sffc commented 1 year ago

Comparing for the other three segmenter types:

Grapheme:

ICU4C:
charbrk/char.brk

ICU4X:
charbrk/grapheme@1/und.postcard

Sentence:

ICU4C:
sentbrk/sent.brk

ICU4X:
sentbrk/sentence@1/und.postcard

Word:

ICU4C:
wordbrk/burmesedict.dict
wordbrk/char.brk
wordbrk/laodict.dict
wordbrk/word.brk
wordbrk/thaidict.dict
wordbrk/khmerdict.dict
wordbrk/cjdict.dict

ICU4X:
wordbrk/word@1/und.postcard
wordbrk/w_auto@1/ja.postcard
wordbrk/grapheme@1/und.postcard
wordbrk/wl_auto@1/my.postcard
wordbrk/wl_auto@1/km.postcard
wordbrk/wl_auto@1/lo.postcard
wordbrk/wl_auto@1/th.postcard

All of the above:

ICU4C:
allbrk/sent.brk
allbrk/line_normal_cj.brk
allbrk/burmesedict.dict
allbrk/line_loose_phrase_cj.brk
allbrk/char.brk
allbrk/line_cj.brk
allbrk/line_phrase_cj.brk
allbrk/laodict.dict
allbrk/word.brk
allbrk/thaidict.dict
allbrk/line_loose_cj.brk
allbrk/khmerdict.dict
allbrk/cjdict.dict
allbrk/line.brk
allbrk/line_loose.brk
allbrk/line_normal.brk
allbrk/line_normal_phrase_cj.brk

ICU4X:
allbrk/segmenter/word@1/und.postcard
allbrk/segmenter/sentence@1/und.postcard
allbrk/segmenter/dictionary/w_auto@1
allbrk/segmenter/dictionary/w_auto@1/ja.postcard
allbrk/segmenter/lstm/wl_auto@1/my.postcard
allbrk/segmenter/lstm/wl_auto@1/km.postcard
allbrk/segmenter/lstm/wl_auto@1/lo.postcard
allbrk/segmenter/lstm/wl_auto@1/th.postcard
allbrk/segmenter/grapheme@1/und.postcard
allbrk/segmenter/line@1/und.postcard

Gzip sizes (in bytes):

| Type | ICU4C | ICU4X | % Reduction |
|---|---|---|---|
| Char | 2889 | 2438 | 15.6% |
| Sentence | 4352 | 4295 | 1.3% |
| Word | 2286466 | 1852208 | 19.0% |
| Line (from above) | 762631 | 299959 | 60.7% |
| All of the above | 2330198 | 1862568 | 20.1% |
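The reduction column can be checked from the two size columns. This short sketch assumes the percentage is computed as (ICU4C - ICU4X) / ICU4C, which matches every row of the table; the function name `percent_reduction` is made up for illustration.

```python
# Gzip sizes in bytes, copied from the table: (ICU4C, ICU4X).
SIZES = {
    "Char": (2889, 2438),
    "Sentence": (4352, 4295),
    "Word": (2286466, 1852208),
    "Line": (762631, 299959),
    "All of the above": (2330198, 1862568),
}


def percent_reduction(icu4c, icu4x):
    """Size reduction of ICU4X relative to ICU4C, as a percentage."""
    return (icu4c - icu4x) / icu4c * 100


for name, (icu4c, icu4x) in SIZES.items():
    print(f"{name}: {percent_reduction(icu4c, icu4x):.1f}%")
# Prints 15.6%, 1.3%, 19.0%, 60.7% and 20.1%, in table order.
```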