w3c / eurlreq

European language enablement

German requirements #28

Open Crissov opened 3 years ago

Crissov commented 3 years ago

Random notes on German Gap Analysis

Capitalization

It used to be the norm to expand uppercase umlauts: Ae for Ä, Oe for Ö and Ue for Ü. Most place names still follow this convention, Österreich (Austria) being a notable exception. This cannot be achieved with the CSS text-transform property yet.

Elsewhere, uppercase umlaut letters have the diaeresis dots spaced wider (in Ä and Ö) or narrower (in Ü) and lower, such that they fit within the cap height. This local character style has no designated OpenType feature, but is sometimes available as a cv## character variant, as part of an ss## stylistic set, or as part of the locl localized forms. A standardized way to access vertically compressed diacritic marks on uppercase letters (and perhaps lowercase letters with ascenders) would be nice and might benefit other languages as well. (Some word processors have an option to discard diacritic marks on uppercased letters in French, for instance.)

Eszett ß is usually uppercased as SS, sometimes as its newish uppercase variant ẞ (U+1E9E) and, at least historically, sometimes as SZ. The latter is not usually available as an option.
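The uppercasing choices described in this section could be sketched as follows. This is only an illustration in Python, not an existing API: the function name `upper_german` and its parameters `expand_umlauts` and `eszett` are hypothetical, and it assumes that in all-caps text the expansion takes the form AE, OE, UE.

```python
import unicodedata

# Assumption: in all-caps text the historical expansion is AE/OE/UE.
UMLAUT_EXPANSION = {"ä": "AE", "ö": "OE", "ü": "UE"}
ESZETT_OPTIONS = ("SS", "SZ", "\u1E9E")  # U+1E9E is the uppercase ẞ


def upper_german(text, expand_umlauts=False, eszett="SS"):
    """Uppercase German text with selectable umlaut and eszett handling.

    Hypothetical helper: the option names are illustrative only.
    """
    if eszett not in ESZETT_OPTIONS:
        raise ValueError("eszett must be one of %r" % (ESZETT_OPTIONS,))
    # Normalize so that umlauts are single precomposed code points.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ch == "ß":
            out.append(eszett)
        elif expand_umlauts and ch.lower() in UMLAUT_EXPANSION:
            out.append(UMLAUT_EXPANSION[ch.lower()])
        else:
            out.append(ch.upper())
    return "".join(out)


print(upper_german("Straße"))                      # default: SS
print(upper_german("Straße", eszett="\u1E9E"))     # uppercase eszett
print(upper_german("Übung", expand_umlauts=True))  # expanded umlaut
```

A per-word exception list (for names such as Österreich) would still be needed on top of this, which the sketch leaves out.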

Hyphenation

In compounds, which are more ubiquitous in German than in other languages written in the Latin script, the preferred hyphenation point is at the semantic and morphological boundary, e.g. Rinder-braten from Rind + Braten. With short derivational affixes, this can be difficult for algorithms to detect and may lead to comical effects, e.g. Ur-insekt (primordial insect) vs. Urin-sekt (urine champagne). Compounds that are tied together with a hyphen should preferably be broken after that hyphen, but may be hyphenated elsewhere as well, especially at other semantic boundaries and as far away from the existing hyphen as possible.

Morphological breaks are traditionally preferred as hyphenation opportunities over phonological ones, even if both are accepted, e.g. Ma-gnet vs. Mag-net. It is not yet possible to express a general preference for one approach over the other.
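To make the preference concrete, here is a minimal Python sketch of choosing a break point when the opportunities are already known. The annotation scheme is an assumption made for this example only: `=` marks a compound (morphological) boundary and `-` marks an ordinary hyphenation point. The function name `best_break` is hypothetical, and the width of the inserted hyphen is ignored.

```python
def best_break(annotated, max_prefix_len):
    """Pick a hyphenation point, preferring compound boundaries.

    annotated: word with '=' at compound boundaries and '-' at other
               hyphenation points (illustrative notation, not a standard).
    max_prefix_len: number of letters that still fit on the current line.
    Returns (first_part_with_hyphen, rest) or None if nothing fits.
    """
    candidates = []  # (letter position, is_compound_boundary)
    pos = 0
    for ch in annotated:
        if ch in "=-":
            candidates.append((pos, ch == "="))
        else:
            pos += 1
    # Rank by boundary type first, then by how much of the word fits.
    fitting = [(is_compound, p) for p, is_compound in candidates
               if 0 < p <= max_prefix_len]
    if not fitting:
        return None
    _, p = max(fitting)
    word = annotated.replace("=", "").replace("-", "")
    return word[:p] + "-", word[p:]


print(best_break("Rin-der=bra-ten", 10))  # compound boundary wins
print(best_break("Rin-der=bra-ten", 4))   # only the first point fits
print(best_break("Ur=in-sekt", 5))        # avoids the comical break
```

The hard part, detecting where the `=` boundaries are in the first place, is not addressed here.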

Linebreaking

In incomplete sentences, as often encountered in headings and bullet points, line breaks are sometimes preferred to occur after punctuation marks like commas and colons, marking logical breaks (but this is often hard to distinguish from commas between enumerated items).

Ligatures

When letter-spacing for emphasis (as is common in blackletter texts), some digraphs should not be broken up, even if they do not form proper ligatures, e.g. ch, tz, ſt. Since texts and fonts cannot be relied upon to include the necessary markup or substitution features, a high-level control would be helpful.
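As a rough illustration of the desired behavior, the following Python sketch letter-spaces a word while keeping the digraphs named above together. It works on plain text by inserting a spacer character, which is only a stand-in for real tracking; the function name `letterspace`, the default spacer (U+2009 THIN SPACE) and the digraph list are assumptions for this example.

```python
def letterspace(text, spacer="\u2009", keep=("ch", "tz", "ſt")):
    """Insert `spacer` between letters, but never inside a kept digraph.

    Hypothetical helper: a real implementation would adjust tracking
    in the layout engine rather than insert characters.
    """
    clusters = []
    i = 0
    while i < len(text):
        for digraph in keep:
            chunk = text[i:i + len(digraph)]
            if chunk.lower() == digraph:
                clusters.append(chunk)  # keep the digraph as one unit
                i += len(digraph)
                break
        else:
            clusters.append(text[i])
            i += 1
    return spacer.join(clusters)


print(letterspace("Schutz", spacer=" "))
print(letterspace("Macht", spacer=" "))
```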

Blackletter

For stylistic effect, it is sometimes desirable to request some blackletter typeface (script code Latf), such as Fraktur or Schwabacher, without naming a particular one. A generic font family name would help.

Cursive

Besides decimal numbers, Latin letters, Greek letters and Roman numerals, some scholarly authors during the 20th century fancied German cursive lowercase letters for list counters. It’s unclear whether U+1D4B6 etc. (mathematical script) or U+1D51E etc. (mathematical Fraktur) would be appropriate. Ready-made Counter Styles does not support either of these yet.
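For reference, here is a small Python sketch of how counter text could be generated from either range, without settling which one is appropriate. It follows the logic of an alphabetic counter system (a … z, aa, ab, …). The function names are hypothetical. One detail matters in practice: the mathematical script lowercase range has gaps, because e, g and o were encoded earlier at U+212F, U+210A and U+2134, whereas the Fraktur lowercase range U+1D51E to U+1D537 is contiguous.

```python
def counter_symbols(style):
    """Return the 26 lowercase symbols for the given style."""
    if style == "fraktur":
        return [chr(0x1D51E + k) for k in range(26)]
    if style == "script":
        # e, g and o live in the Letterlike Symbols block instead.
        holes = {4: "\u212F", 6: "\u210A", 14: "\u2134"}
        return [holes.get(k, chr(0x1D4B6 + k)) for k in range(26)]
    raise ValueError("unknown style: %r" % style)


def alphabetic_counter(n, style="fraktur"):
    """Counter text for n >= 1: a..z, then aa, ab, ... (bijective base 26)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    symbols = counter_symbols(style)
    out = []
    while n > 0:
        n, r = divmod(n - 1, 26)
        out.append(symbols[r])
    return "".join(reversed(out))


for i in (1, 2, 5, 27):
    print(i, alphabetic_counter(i, "fraktur"), alphabetic_counter(i, "script"))
```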

r12a commented 3 years ago

Thanks for this @Crissov. I'll try to make some time to go through the points and add text, where relevant, to the gap-analysis doc.