Open mathiasbynens opened 3 years ago
A bit of progress. I presented https://www.unicode.org/L2/L2022/22160-rgi-emoji-qual.pdf at the UTC. No agreement on adding it yet (too late for Unicode 15.0), but I will make a revised version for next time.
Hi @macchiati
Unicode 16.0 has been released, any news on this proposal?
Markus can give more details, but I think the biggest noticeable change (aside from additions) for implementations will be when ICU releases, with collation.
See also https://blog.unicode.org/2024/09/unicode-cldr-46-beta-available-for.html
To answer @yisibl’s actual question...
> Unicode 16.0 has been released, any news on this proposal?
UTS 51 now defines ED-28, RGI_Emoji_Qualification (the status of emoji sequences):
> This is an enumerated property of strings, defined by the emoji-test.txt file ... The property value names and short aliases are:
I haven't thought about this for a while... This would be the first enumerated property of strings in ICU.
Looking at https://www.unicode.org/Public/emoji/16.0/emoji-test.txt, the file actually has four status values, including “component”, which is not listed in the UTS 51 definition.
@macchiati can you elaborate on why UTS 51 defines three values but the data file has four? Is “component” intentionally omitted?
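For anyone who wants to check the status column themselves, here is a rough Python sketch that tallies the status values in emoji-test.txt. It assumes the file's usual line layout (`code points ; status # comment`). The inline sample stands in for the real file, and the helper names `parse_line` and `count_statuses` are made up for illustration.

```python
from collections import Counter

# A few lines in the emoji-test.txt layout: "<code points> ; <status> # <comment>".
# This sample is only a stand-in; a real run would read the full data file.
SAMPLE = """\
# group: Smileys & Emotion
1F600                                                  ; fully-qualified     # 😀 E1.0 grinning face
263A FE0F                                              ; fully-qualified     # ☺️ E0.6 smiling face
263A                                                   ; unqualified         # ☺ E0.6 smiling face
1F636 200D 1F32B                                       ; minimally-qualified # 😶‍🌫 E13.1 face in clouds
1F3FB                                                  ; component           # 🏻 E1.0 light skin tone
"""


def parse_line(line):
    """Return (string, status) for a data line, or None for comment/blank lines."""
    # Everything after the first "#" is a comment; code points are hex, so
    # the data part itself never contains "#".
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    code_points, status = (field.strip() for field in data.split(";"))
    string = "".join(chr(int(cp, 16)) for cp in code_points.split())
    return string, status


def count_statuses(text):
    """Count how many data lines carry each status value."""
    parsed = (parse_line(line) for line in text.splitlines())
    return Counter(status for _, status in filter(None, parsed))


print(count_statuses(SAMPLE))
```

Pointing `count_statuses` at the contents of the real file should list every status value that actually occurs, which makes it easy to compare against the values named in UTS 51.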
I guess I need to start thinking about how I represent this property in ICU. I just created https://unicode-org.atlassian.net/browse/ICU-22931
@mathiasbynens I also guess that you would like me to implement this for one of the 2025 ICU releases...? You might help me justify this for annual planning.
It would be amazing if Unicode would expose all emoji-test.txt strings as a property of strings.
That, in combination with property-of-strings support in regular expressions, would reduce the need for this repository in the long term in favor of a simple, straight-forward regular expression pattern of the form /\p{EmojiTest}/v (property name TBD).
It could even be an enumerated property, to provide the full info, e.g.
Values could be full, minimal, unqualified, or na. emoji-test.txt could then be generated from that property.
Ref. https://github.com/node-unicode/node-unicode-data/issues/63
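To make the idea a bit more concrete, here is a minimal Python sketch of what such an enumerated property of strings could look like from the outside. Everything in it is an assumption for illustration: the table is a tiny hand-picked subset of emoji-test.txt entries, the value names are the ones floated above, and `qualification` and `build_pattern` are hypothetical helpers, not an existing API and not this repository's build script.

```python
import re

# Hypothetical table, as it might be derived from emoji-test.txt entries.
# Value names (full / minimal / unqualified) follow the suggestion above.
QUALIFICATION = {
    "\U0001F600": "full",                         # 1F600
    "\u263A\uFE0F": "full",                       # 263A FE0F
    "\u263A": "unqualified",                      # 263A
    "\U0001F636\u200D\U0001F32B\uFE0F": "full",   # 1F636 200D 1F32B FE0F
    "\U0001F636\u200D\U0001F32B": "minimal",      # 1F636 200D 1F32B
}


def qualification(string):
    """Hypothetical enumerated property of strings; unknown strings map to 'na'."""
    return QUALIFICATION.get(string, "na")


def build_pattern(strings):
    """Alternation of literal strings, longest first so shorter prefixes do not win."""
    ordered = sorted(strings, key=len, reverse=True)
    return re.compile("|".join(re.escape(s) for s in ordered))


# Stand-in for what a single property escape could express in a regex engine
# with property-of-strings support.
EMOJI_TEST = build_pattern(QUALIFICATION)

text = "ok \u263A\uFE0F and \U0001F636\u200D\U0001F32B"
print([(m.group(), qualification(m.group())) for m in EMOJI_TEST.finditer(text)])
```

The longest-first ordering matters because many sequences are prefixes of longer ones (263A versus 263A FE0F, for instance). With a real property of strings, the regex engine would take care of that and the generated alternation would no longer be needed.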