w3c / epub-specs

Shared workspace for EPUB 3 specifications.

U+E007F is reinstated to non-deprecated since Unicode 9.0 #2469

Closed rdeltour closed 1 year ago

rdeltour commented 2 years ago

EPUB OCF says U+E007F is disallowed as one of the two deprecated characters in the Tags and Variation Selectors Supplement.

But U+E007F CANCEL TAG was reinstated as non-deprecated in Unicode 9.0; see the change history for the Unicode Character Database:

> The stateful tag terminator U+E007F CANCEL TAG, formerly deprecated, was reinstated to non-deprecated, for use in emoji contexts.

See also the up-to-date list of deprecated characters in the latest UCD PropList.txt file (search for "Deprecated").
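For context on the "emoji contexts" mentioned in the quote above, here is a minimal sketch of how an emoji tag sequence uses U+E007F CANCEL TAG as its terminator. The helper name `emoji_tag_sequence` is hypothetical and only for illustration; the example builds the widely supported flag sequence for Scotland, whose tag spec is "gbsct".

```python
# Sketch only: emoji_tag_sequence is an illustrative helper, not part of any spec or library.
BLACK_FLAG = "\U0001F3F4"   # U+1F3F4 WAVING BLACK FLAG, used as the tag base
CANCEL_TAG = "\U000E007F"   # U+E007F CANCEL TAG, terminates the tag sequence
TAG_OFFSET = 0xE0000        # tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E


def emoji_tag_sequence(base: str, tag_spec: str) -> str:
    """Return base + one tag character per ASCII character in tag_spec + CANCEL TAG."""
    if not all(0x20 <= ord(c) <= 0x7E for c in tag_spec):
        raise ValueError("tag_spec must contain only printable ASCII characters")
    tags = "".join(chr(TAG_OFFSET + ord(c)) for c in tag_spec)
    return base + tags + CANCEL_TAG


scotland = emoji_tag_sequence(BLACK_FLAG, "gbsct")
print(" ".join(f"U+{ord(c):04X}" for c in scotland))
# prints: U+1F3F4 U+E0067 U+E0062 U+E0073 U+E0063 U+E0074 U+E007F
```

A file name containing such a flag would therefore always include U+E007F, which is why disallowing that code point conflicts with allowing emoji sequences.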

rdeltour commented 2 years ago

Referencing previous discussion around this in #1885 #1899

mattgarrish commented 2 years ago

Ya, it seems we ended up with E007F deprecated despite wanting to allow emoji sequences...

I wonder if we can remove that bullet to avoid the redundancy of restricting each code point that Unicode already deprecates (and the future maintenance it entails). Maybe we can use the file you've referenced, @rdeltour, to create a new bullet at the end of the list, like:

Thoughts @iherman @xfq @r12a ?

iherman commented 2 years ago

For someone who has never looked at a Unicode listing closely... @mattgarrish I presume you refer to these lines in the file you referred to:

0149          ; Deprecated # L&       LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
0673          ; Deprecated # Lo       ARABIC LETTER ALEF WITH WAVY HAMZA BELOW
0F77          ; Deprecated # Mn       TIBETAN VOWEL SIGN VOCALIC RR
0F79          ; Deprecated # Mn       TIBETAN VOWEL SIGN VOCALIC LL
17A3..17A4    ; Deprecated # Lo   [2] KHMER INDEPENDENT VOWEL QAQ..KHMER INDEPENDENT VOWEL QAA
206A..206F    ; Deprecated # Cf   [6] INHIBIT SYMMETRIC SWAPPING..NOMINAL DIGIT SHAPES
2329          ; Deprecated # Ps       LEFT-POINTING ANGLE BRACKET
232A          ; Deprecated # Pe       RIGHT-POINTING ANGLE BRACKET
E0001         ; Deprecated # Cf       LANGUAGE TAG

Maybe it is worth making a note on how to read that reference...
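As a rough illustration of how to read that listing: each line is a code point or an inclusive `first..last` range in hex, then `;`, the property name, and a `#` comment. A minimal sketch of parsing it, assuming the lines quoted above (the names `parse_property` and `find_deprecated` are hypothetical, and the excerpt is just what was pasted in this thread, not the full PropList.txt):

```python
import re

# Lines copied from the listing quoted in this thread (not the full file).
PROPLIST_EXCERPT = """\
0149          ; Deprecated # L&       LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
0673          ; Deprecated # Lo       ARABIC LETTER ALEF WITH WAVY HAMZA BELOW
0F77          ; Deprecated # Mn       TIBETAN VOWEL SIGN VOCALIC RR
0F79          ; Deprecated # Mn       TIBETAN VOWEL SIGN VOCALIC LL
17A3..17A4    ; Deprecated # Lo   [2] KHMER INDEPENDENT VOWEL QAQ..KHMER INDEPENDENT VOWEL QAA
206A..206F    ; Deprecated # Cf   [6] INHIBIT SYMMETRIC SWAPPING..NOMINAL DIGIT SHAPES
2329          ; Deprecated # Ps       LEFT-POINTING ANGLE BRACKET
232A          ; Deprecated # Pe       RIGHT-POINTING ANGLE BRACKET
E0001         ; Deprecated # Cf       LANGUAGE TAG
"""

# "XXXX" or "XXXX..YYYY", then ";", then the property name; the rest is a comment.
LINE_RE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")


def parse_property(text: str, prop: str) -> set[int]:
    """Collect every code point that the given text assigns the property `prop`."""
    code_points: set[int] = set()
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(3) == prop:
            first = int(match.group(1), 16)
            last = int(match.group(2) or match.group(1), 16)  # single code point if no range
            code_points.update(range(first, last + 1))
    return code_points


deprecated = parse_property(PROPLIST_EXCERPT, "Deprecated")


def find_deprecated(name: str) -> list[str]:
    """Return the deprecated code points found in `name`, formatted as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in name if ord(c) in deprecated]


print(0xE0001 in deprecated)  # True: LANGUAGE TAG is listed
print(0xE007F in deprecated)  # False: CANCEL TAG is not in the listing
```

Read this way, the nine lines cover 15 code points in total, since the two ranges expand to 2 and 6 code points respectively.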

mattgarrish commented 2 years ago

> I presume you refer to these lines in the file you referred to:

Right, I presume that list is all of them. I checked some of the other files but it appears the deprecated ones have been consolidated there.

It would have been nice if there were an HTML equivalent with a direct link, but searching around I couldn't find one. If there's another reference we could use, though...

The alternative, of course, is we say nothing about deprecated code points and assume that epubcheck should be warning about them, because, well, they're already deprecated by the official standard. That would be even better.

iherman commented 2 years ago

I think that, spec-wise, we should keep this in the spec. It would be strange if epubcheck defined the spec...

mattgarrish commented 2 years ago

Ya, but it's back to that basic question we've bumped into a couple of times now of whether we need to restrict people from using things that are already deprecated by their respective specifications. Epubcheck would only be reporting what Unicode defines.

But I'm fine either way.

iherman commented 2 years ago

> Ya, but it's back to that basic question we've bumped into a couple of times now of whether we need to restrict people from using things that are already deprecated by their respective specifications. Epubcheck would only be reporting what Unicode defines.
>
> But I'm fine either way.

That is also correct...

We could also consider a different approach: instead of the bullet point in the normative text above, add a note saying that authors should also abide by any restrictions dictated by Unicode, with deprecation given as an example. (Who knows, at some point they may come up with a notion other than "deprecated".)

We can also toss a coin. :-)

mattgarrish commented 2 years ago

> a note whereby authors should also abide by any restrictions dictated by Unicode

Ya, I like this approach. I'll see what I can come up with.