Open fantasai opened 2 years ago
Some notes on how to expand this out to accommodate additional space types:
`none | [ zero-width | space | ideographic-space | ethiopic-space ]{1,2}#`

* `zero-width` = space to zwsp
* `space` = zwsp to space + ethiopic-space to space
* `ideographic-space` = zwsp to ideographic-space
* `ethiopic-space` = space to ethiopic space

A first level could have only the 1-keyword version.
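To make the keyword definitions concrete, here is a rough Python sketch of the 1-keyword version as plain character substitution. The keyword names are from the notes above; the function name `apply_word_boundary_keyword`, the table layout, and the choice of U+1361 ETHIOPIC WORDSPACE for "ethiopic space" are assumptions for illustration, not anything a spec defines.

```python
# Hypothetical sketch: keyword names come from the notes above; everything
# else (names, table layout, chosen code points) is an assumption.
ZWSP = "\u200B"                # ZERO WIDTH SPACE
SPACE = "\u0020"               # SPACE
IDEOGRAPHIC_SPACE = "\u3000"   # IDEOGRAPHIC SPACE
ETHIOPIC_WORDSPACE = "\u1361"  # ETHIOPIC WORDSPACE (assumed meaning of "ethiopic space")

# One entry per keyword: {character to replace: replacement character}.
SUBSTITUTIONS = {
    "none": {},
    "zero-width": {SPACE: ZWSP},
    "space": {ZWSP: SPACE, ETHIOPIC_WORDSPACE: SPACE},
    "ideographic-space": {ZWSP: IDEOGRAPHIC_SPACE},
    "ethiopic-space": {SPACE: ETHIOPIC_WORDSPACE},
}


def apply_word_boundary_keyword(text: str, keyword: str) -> str:
    """Apply the substitutions for a single keyword to the text."""
    table = str.maketrans(SUBSTITUTIONS[keyword])
    return text.translate(table)
```

This only covers the single-keyword case; how two keywords would combine is not spelled out in the notes, so the sketch does not attempt it.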
Florian notes that maybe interpunct belongs in this list also. https://en.wikipedia.org/wiki/Interpunct
I have looked into interpuncts, and changed my mind about them.
Annos undeviginti natus exercitum privato consilio et privata impensa comparavi, per quem rem publicam dominatione factionis oppressam in libertatem vindicavi. Quas ob res senatus decretis honorificis in ordinem suum me adlegit C. Pansa A. Hirtio consulibus, consula rem locum sententiae dicendae simul dans, et imperium mihi dedit.
ANNOS·VNDEVIGINTI·NATVS·EXERCITVM·PRIVATO·CONSILIO·ET·PRIVATA·IMPENSA·COMPARAVI·PER·QVEM·REM·PVBLICAM·DOMINATIONE·FACTIONIS·OPPRESSAM·IN·LIBERTATEM·VINDICAVI· QVAS·OB·RES·SENATVS·DECRETIS·HONORIFICIS·IN·ORDINEM·SVVM·ME·ADLEGIT·C·PANSA·A·HIRTIO·CONSVLIBVS·CONSVLA·REM·LOCVM·SENTENTIAE·DICENDAE·SIMVL·DANS·ET·IMPERIVM·MIHI·DEDIT·

This involves transforming:

* lone spaces into interpunct+zero-width-space
* comma+space into interpunct+zero-width-space
* period+space into interpunct+space (or interpunct+zero-width-space, depending on style)
* period+NBSP into interpunct (or interpunct+zero-width-space, depending on style)
* not shown in this example, but ideally trailing interpuncts at the end of a line should be removed (and possibly `word-break: break-all` should be applied, depending on style).
* lower case to upper case (which can be handled with `text-transform`), alongside u to V and j to I (which `text-transform` theoretically could handle, but doesn't)
* not shown in this example, but if the text had been written to indicate long vowels, transforming from modern to classical would also involve transforming macrons to apices, except for ī that maps to ꟾ (U+A7FE), so the first two words `Annōs ūndēvīgintī` become `ANNÓS·V́NDÉVꟾGINTꟾ`. That too could fit in `text-transform` in theory.

The parts that don't fit in `text-transform` seem beyond the reasonable scope of this property, and the precise rules might even need to be fine-tuned for the particular content and styles in question, making it impractical to provide a generic built-in transform. And without doing all of it, you're not switching from one legitimate style to a different legitimate style, and it's unlikely anyone would want it. Interpunct in other languages is typically used for different purposes, so if it cannot be done for Latin, it's not worth doing at all.
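For a sense of how content-specific this gets, here is a minimal Python sketch of just the first four spacing rules plus the case and letter mapping. The function name `to_classical` and the choice of "period+space becomes interpunct+space" are assumptions picked from the style options listed above; line-end trimming, macron-to-apex mapping, and a text-final period with nothing after it are deliberately left out.

```python
import re

INTERPUNCT = "\u00B7"  # MIDDLE DOT
ZWSP = "\u200B"        # ZERO WIDTH SPACE
NBSP = "\u00A0"        # NO-BREAK SPACE

# Punctuation+space sequences are listed before the lone space so the
# longer alternatives win when the pattern is built below.
_REPLACEMENTS = {
    ", ": INTERPUNCT + ZWSP,    # comma+space  -> interpunct+zwsp
    ". ": INTERPUNCT + " ",     # period+space -> interpunct+space (one style choice)
    "." + NBSP: INTERPUNCT,     # period+NBSP  -> interpunct
    " ": INTERPUNCT + ZWSP,     # lone space   -> interpunct+zwsp
}
_BOUNDARY = re.compile("|".join(re.escape(key) for key in _REPLACEMENTS))


def to_classical(text: str) -> str:
    """Hypothetical modern-to-classical Latin spacing and letter transform."""
    # A single pass, so the space emitted for period+space is not rewritten again.
    text = _BOUNDARY.sub(lambda m: _REPLACEMENTS[m.group(0)], text)
    # Upper-case first, then map U to V and J to I.
    return text.upper().replace("U", "V").replace("J", "I")
```

Even this partial version has to hard-code a style choice and a matching order, which is the kind of per-content tuning described above.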
TLDR: Transformations from zwsp or space to interpunct, or the other way around, would be excessively complex, impractical to use, or both, and even though I was tempted, I think we should not attempt them here.
`word-boundary-expansion` is currently about expanding spaces in CJK, but it could be used more generically, e.g. to swap between spaces and Ethiopic spaces in Ethiopic. Can we rename this property to work better for other use cases?

Proposed name: `word-boundary-transform`, hooks in nicely with `text-transform`.

(I also am not a huge fan of the `word-boundary` part but I don't have a great idea. Maybe `word-space-transform`? `word-splitter-transform`? Idk.)