Feasibility of new combining characters?

stevengj commented 8 years ago

The most general form of this proposal, suggested by @StefanKarpinski, is to encode new combining characters: mathematical subscript and mathematical superscript, which indicate that the previous glyph is to be rendered as a subscript or superscript, respectively.

How difficult would this be to implement? Can it be done with font changes alone? Or does it require changes to text-rendering engines? In the latter case, can we get in contact with maintainers of prominent rendering software (Pango, Apple, Microsoft, ...) to gauge their interest?

ProfFan commented 8 years ago

Just a suggestion: could this be implemented using ligatures? An example is https://github.com/tonsky/FiraCode

mpacer commented 8 years ago

It would likely require changes to text-rendering software, at the least to map the new characters to the right glyphs.

Also, there's something a little weird about a superscript combining character if you start thinking of it generatively. What you're creating is itself a combining character, which traditionally scopes only over the previous character. But if you wanted both a superscript and a subscript on the same full-size character, you would need some way for the mark to reach two characters back.

If you change the meaning of subscript/superscript to be a category of letters, you still have a problem. Should one ever come before the other when setting type? Or do they combine directly above and below each other regardless of order? What happens when you have more than one letter in a row that you want superscripted or subscripted? Should this also cover that case? If so, how would the renderer know how to space them?

The general problem this runs up against is the expressivity of mathematical typesetting conventions for indicating the referential scope of operators. Unicode is a huge code space, but there is no way to encode everything you would want into the semantics of this combining character.

However, if you want to specify a use case, my guess is that the way to define it is as a modification of how diacritics combine for the superscript/subscript case (so a superscript and a subscript would always sit one above the other), but behaving more like a normal character when there is more than one in a row. That way you could emulate 1^{st}, for example. This, however, introduces the problems of kerning and of using a smaller font size rather than just scaling down the larger one.

The simpler approach would be to just stack them like diacritics, but that will look terrible for superscripts; it's like trying to put an acuté, grave, hat, and diaeresis on the same character. They'll overlap and generally look awful.

Regardless, it's either going to require strong constraints on interpretation, or you'll end up using Unicode's flat namespace to effectively implement a mathematical typesetting language like LaTεχ.

As for font issues, look at GitHub's monospace rendering of the subscript schwa (ₔ vs. ₔ). Not all fonts know how to handle the current problem; even generally good ones have difficulty supporting all the features of Unicode semantics today. It's unclear how a font would encode an entire secondary font within itself. The way out would be major changes to text renderers so that they look in more than one font file; otherwise the spacing is likely to be terrible.

stevengj commented 8 years ago

@michaelpacer, I was thinking a[super] = ᵃ, regardless of what comes before it, so if you did Aa[super]x[sub] you would get Aᵃₓ; i.e., it wouldn't try to put subscripts below superscripts.

So the combining character would only affect the glyph immediately preceding it, and would be an alternative to defining lots of new subscript/superscript codepoints, not a more generalized typesetting system.
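A minimal Python sketch of these semantics, under loud assumptions: since no such characters exist, the two proposed marks are stood in for by the hypothetical private-use codepoints U+E000 (superscript) and U+E001 (subscript), and the lookup table is built from existing characters whose Unicode compatibility decomposition is tagged `<super>` or `<sub>`. Each mark rewrites only the character immediately before it. When no precomposed form exists, the sketch simply leaves the base character alone, where a real renderer would presumably synthesize a smaller glyph.

```python
import sys
import unicodedata

# HYPOTHETICAL stand-ins for the proposed combining marks (Private Use Area).
SUPER_MARK = "\uE000"
SUB_MARK = "\uE001"


def _preferred(ch: str) -> bool:
    """Favour MODIFIER LETTER / SUPERSCRIPT / SUBSCRIPT names over e.g. U+00AA."""
    name = unicodedata.name(ch, "")
    return "MODIFIER LETTER" in name or "SCRIPT" in name


def build_table(tag: str) -> dict[str, str]:
    """Map a base character to an existing character whose decomposition is `tag` + base."""
    table: dict[str, str] = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        parts = unicodedata.decomposition(ch).split()
        # Keep only single-character decompositions such as '<super> 0061'.
        if len(parts) != 2 or parts[0] != tag:
            continue
        base = chr(int(parts[1], 16))
        if base not in table or (_preferred(ch) and not _preferred(table[base])):
            table[base] = ch
    return table


SUPER = build_table("<super>")
SUB = build_table("<sub>")


def render(text: str) -> str:
    """Apply each mark to the single character immediately preceding it."""
    out: list[str] = []
    for ch in text:
        if ch in (SUPER_MARK, SUB_MARK):
            if out:
                table = SUPER if ch == SUPER_MARK else SUB
                # Fallback: no precomposed form exists, so keep the base unchanged.
                out[-1] = table.get(out[-1], out[-1])
            continue  # the mark itself is never emitted
        out.append(ch)
    return "".join(out)


print(render("Aa" + SUPER_MARK + "x" + SUB_MARK))  # Aᵃₓ
```

The preference rule in `_preferred` is an arbitrary choice for this sketch: both U+00AA (ª) and U+1D43 (ᵃ) decompose to `<super> 0061`, and the sketch picks the modifier letter.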

stevengj commented 8 years ago

The point is to look at the superscript/subscript characters already in Unicode, and to follow that model. It's only a question of how to encode them (as new characters or via a combining mark).

stevengj commented 8 years ago

@michaelpacer (responding to https://github.com/stevengj/subsuper-proposal/issues/3#issuecomment-243519458) if you have a generic combining character, couldn't a font encode all of the most important cases (e.g. Latin and Greek subscripts/superscripts), similar to ligature substitution? If it does a mediocre job at rendering 🐨 subscripts, I don't think it's a big deal.

This way you get the benefit of good sub/superscript glyphs in the important cases, combined with the flexibility to add more good sub/superscript glyphs as the need arises without changing the Unicode encoding.
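To gauge what "the most important cases" already look like, here is a rough Python check, offered only as an illustration, of which lowercase Latin and Greek letters have an existing single-character sub/superscript counterpart, again read from `<sub>`/`<super>` compatibility decompositions. The letters it reports as missing are the ones a font would have to supply itself under a generic combining mark. The helper names are hypothetical, and the exact lists depend on the Unicode version bundled with Python.

```python
import string
import sys
import unicodedata


def scripted_bases(tag: str) -> set[str]:
    """Base characters that some existing character decomposes to, under `tag`."""
    bases: set[str] = set()
    for cp in range(sys.maxunicode + 1):
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[0] == tag:
            bases.add(chr(int(parts[1], 16)))
    return bases


LATIN = string.ascii_lowercase
# Greek small alpha (U+03B1) through omega (U+03C9), skipping final sigma (U+03C2).
GREEK = "".join(chr(cp) for cp in range(0x03B1, 0x03CA) if cp != 0x03C2)


def missing(have: set[str], letters: str) -> str:
    """Letters with no existing scripted counterpart in `have`."""
    return "".join(c for c in letters if c not in have)


for tag in ("<super>", "<sub>"):
    have = scripted_bases(tag)
    print(tag, "Latin missing:", missing(have, LATIN), "| Greek missing:", missing(have, GREEK))
```

`unicodedata.unidata_version` reports which Unicode version the results reflect.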

asmeurer commented 8 years ago

Another question here: in which cases should subscript combining-character sequences normalize to existing subscript characters?

stevengj commented 8 years ago

@asmeurer, yes, I wasn't sure about that. Unicode tends to favor using different codepoints for semantically distinct concepts, so a "mathematical subscript/superscript" character would tend to have a different codepoint from a subscript/superscript used for phonetic symbols. So, from this perspective, one would have to go through the existing sub/superscripts in Unicode and identify their semantic origins.

Alternatively, a simple answer would be "all of them". (Or "none of them".)
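For reference, the existing characters carry only compatibility decompositions, tagged `<sub>` or `<super>`, so NFKC folds them to their base letters while NFC leaves them untouched. The tag also says nothing about whether a character is phonetic or mathematical in origin; only the character name hints at that. A small check using the standard `unicodedata` module (the helper name `describe` is just for illustration):

```python
import unicodedata


def describe(ch: str) -> dict[str, str]:
    """Name, decomposition mapping, and NFC/NFKC forms of a single character."""
    return {
        "name": unicodedata.name(ch),
        "decomposition": unicodedata.decomposition(ch),
        "nfc": unicodedata.normalize("NFC", ch),
        "nfkc": unicodedata.normalize("NFKC", ch),
    }


# A phonetic modifier letter, a subscript letter, a superscript digit,
# and the ordinal indicator: all carry the same kind of <super>/<sub> tag.
for ch in ["ᵃ", "ₓ", "²", "ª"]:
    print(ch, describe(ch))
```

Under the proposal, the open design question is whether a base letter followed by the new mark should be canonically equivalent to such characters, only compatibility-equivalent, or neither.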

stevengj/subsuper-proposal, issue #1