stevengj / subsuper-proposal

Draft proposal for additional sub/superscript characters in Unicode

Endorsements #3

Open stevengj opened 8 years ago

stevengj commented 8 years ago

It would be good to get endorsements of the final proposal by prominent individuals, organizations, free/open-source projects, and corporations (or at least corporate representatives), to help ensure that this is taken seriously by the Unicode Consortium.

They could be listed as co-authors, or we could have a separate section for "endorsers" of the proposal.

(Note that we don't want to turn this into an online petition; I think it will have greater impact if we limit ourselves to widely recognizable entities or prominent representatives thereof.)

cc @StefanKarpinski, @Carreau, @fperez

fperez commented 8 years ago

Context: this is originally motivated by Julia and the discussion started in the IPython repo, but now @stevengj has made a proper repo here for further work.

Pinging @pkra from MathJax: Peter, what do you think of this idea?

Thinking out loud from Jupyter's perspective (though not saying anything "official" yet :), I think we're mostly agnostic: I like the idea, but we're basically a pass-through for the code written in any language. We'd be most directly impacted by the editing needs and platform/browser support for the standard, but that tends to get sorted out relatively quickly if these things are accepted. So I don't think our opinion matters too much, though I love the idea :)

I think more important than Jupyter would be to hear from core Python folks: I know that while Python (3) accepts Unicode identifiers, not all Unicode chars are allowed. For this to impact Python in a positive way, we'd need this new class of characters to be allowed in identifiers. In that regard, I don't know whether the choice between your two proposed paths (new chars vs. combining) would matter to Python's choice of what is allowed in a variable name.

I know @takluyver is fairly up to speed with these things in the Python world, perhaps you can comment?

stevengj commented 8 years ago

@fperez, most of the proposed new characters would be new Latin and Greek subscripts/superscripts, which would be in category Lm (Letter, modifier), and are accepted in Python 3 identifiers already (e.g. αₓ is already allowed in Python). So, new characters in the same category would presumably be allowed as Python identifiers.

Something like a subscript or * would probably be in category Sm (Symbol, math), and these are not accepted as Python identifiers at the moment, so I doubt the subscript version will.

If we decide to go the combining-character route, that should be fine too, since Python 3 accepts combining marks in identifiers (e.g. for ).

(I really think that Python should expand the set of Unicode categories that it accepts in identifiers. It's crazy to me that x0 is a valid identifier but x₀ is not; ₀ is in category No "Number, other", and I would think that Python would treat this like any other number for identifiers.)
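The category argument above can be checked directly with Python's standard `unicodedata` module. This is a minimal sketch; the sample characters are illustrative picks rather than characters named in the proposal, and the results reflect whatever Unicode database ships with the running Python.

```python
import unicodedata

# Illustrative characters: a subscript letter, a subscript digit,
# a combining mark, and a math symbol.
samples = ["ₓ", "₀", "\u0302", "∂"]

for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: category {unicodedata.category(ch)}")

# Lm (modifier letter) is accepted after the first character; No (other number) is not.
print("αₓ".isidentifier())  # True
print("x₀".isidentifier())  # False

# Python NFKC-normalizes identifiers, so the subscript folds to a plain letter.
print(unicodedata.normalize("NFKC", "αₓ"))  # αx
```

One consequence worth keeping in mind: because identifiers are compared after NFKC normalization, `αₓ = 1` followed by `print(αx)` prints `1`; the two spellings name the same variable.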

fperez commented 8 years ago

On Sat, Aug 27, 2016 at 5:04 AM, Steven G. Johnson wrote:

(I really think that Python should expand the set of Unicode categories that it accepts as identifiers. It's crazy to me that x0 is a valid identifier but x₀ is not; ₀ is in category No "Number, other", and I would think that Python would treat this like any other number.)

Agreed. The rule could be "must start with a character from <narrower set A>, but afterwards can include ". Python already forbids identifiers starting with numbers, so this would still restrict a bare variable named "" (which would make a horror like "* == * == ***2" possible :). But it would allow those you suggest...
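The two-tier rule above can be prototyped in a few lines. This is a hypothetical checker, not anything Python implements: it keeps Python's current rule for the first character and additionally accepts category No ("Number, other") characters such as ₀ or ² in later positions. The function name `is_extended_identifier` is made up for illustration.

```python
import unicodedata

def is_extended_identifier(name: str) -> bool:
    """Hypothetical rule: Python's current start rule, plus category No after the first character."""
    if not name or not name[0].isidentifier():
        return False  # empty names and names starting with a digit, ₀, ², etc. are rejected
    for ch in name[1:]:
        allowed_today = ("_" + ch).isidentifier()        # ch is already a valid continuation character
        number_other = unicodedata.category(ch) == "No"  # e.g. ₀ (U+2080) or ² (U+00B2)
        if not (allowed_today or number_other):
            return False
    return True

for candidate in ["x0", "x₀", "x²", "₀", "²", "x+"]:
    print(candidate, is_extended_identifier(candidate))
```

A design wrinkle: Python compares identifiers after NFKC normalization, which maps `x₀` to `x0`, so under this rule the two spellings would name the same variable unless the normalization step were changed as well.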

If this gains traction, we could try to work with the Python team on the question, they are receptive to discussions driven by concrete use cases.

Carreau commented 8 years ago

We'd be most directly impacted by the editing needs and platform/browser support for the standard, but that tends to get sorted out relatively quickly if these things are accepted.

Chrome still regularly renders the combining arrow of a vector incorrectly, placing it on the next character instead of the previous one. The issue has been open for at least a year now.

Valid Python identifier

Object repr is a perfectly valid example where we (IPython) could make use of this without it needing to be an identifier, but I agree that's beyond the scope of the proposal.

Otherwise, I would have expected the mathematical sup/subscript to come before the character it modifies, or to sit between the two glyphs like a ZWJ, but I don't know the Unicode convention.

What do you expect if you have <Character><a superscript><a subscript>? Should the two be stacked on top of each other? If so, should <Character><a subscript><a superscript> be normalized to the same thing?
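Unicode already answers this ordering question for existing combining marks: each mark has a canonical combining class, and normalization sorts adjacent marks with different nonzero classes while preserving the order of marks that share a class. A hypothetical combining superscript and combining subscript would be interchangeable in this way only if they were assigned distinct nonzero classes; that is an assumption, not something the proposal specifies. The sketch below uses existing above/below accents as stand-ins, and `canonically_equal` is a helper name chosen for illustration.

```python
import unicodedata

CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT, attaches above (class 230)
DOT_BELOW = "\u0323"   # COMBINING DOT BELOW, attaches below (class 220)
ACUTE = "\u0301"       # COMBINING ACUTE ACCENT, also above (class 230)

def canonically_equal(a: str, b: str) -> bool:
    """True if a and b are canonically equivalent (identical NFD forms)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

print(unicodedata.combining(CIRCUMFLEX), unicodedata.combining(DOT_BELOW))  # 230 220

# Marks in different classes (above vs. below) are reordered by normalization:
print(canonically_equal("x" + CIRCUMFLEX + DOT_BELOW, "x" + DOT_BELOW + CIRCUMFLEX))  # True

# Marks in the same class keep their order, because their stacking order is significant:
print(canonically_equal("x" + CIRCUMFLEX + ACUTE, "x" + ACUTE + CIRCUMFLEX))  # False
```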

asmeurer commented 8 years ago

Maybe a font could choose to render them on top of each other with <combining subscript>a<ZWJ><combining superscript>b.

asmeurer commented 8 years ago

At any rate, that's the second time someone has implicitly assumed that this proposal includes support for superscript and subscript characters on top of each other, so this should be discussed in the proposal, even if we don't want to propose allowing that at all.

pkra commented 8 years ago

Pinging @pkra from MathJax: Peter, what do you think of this idea?

Thanks for cc'ing me @fperez. I'm not an expert on Unicode, so I don't have much to say on this. There's an inherent tension in doing layout via Unicode (i.e., font rendering engines) within the context of other layout engines (TeX, HTML+CSS, SVG, etc.). Combining characters are somewhat of a pain when you're doing layout (e.g., in HTML, if you split them into two spans, what should happen?). They also pose an accessibility problem, since assistive technologies often ignore non-ASCII Unicode characters (especially with default settings) and since Unicode names have no official localization.
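The span-splitting problem can be made concrete: any tool that wraps characters individually (for example, one HTML span per character) has to keep each combining mark attached to its base character. A minimal sketch, assuming a simplified rule that attaches every general-category M* character to the preceding character; the function name is hypothetical, and production code should follow the extended grapheme cluster rules of UAX #29 instead.

```python
import unicodedata
from typing import List

def split_keeping_marks(text: str) -> List[str]:
    """Split text into units of one base character plus any combining marks that follow it.

    Simplification of UAX #29 grapheme clusters: only general category M* is attached.
    """
    units: List[str] = []
    for ch in text:
        if units and unicodedata.category(ch).startswith("M"):
            units[-1] += ch  # attach the mark to the preceding unit
        else:
            units.append(ch)  # any non-mark character starts a new unit
    return units

# V̅ (V + U+0305 COMBINING OVERLINE) and x̂ (x + U+0302) each stay in one unit.
print(split_keeping_marks("V\u0305 = x\u0302"))
```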

If the hope is to magically solve a layout problem, then I'd be skeptical. Something has to do the layout after all, and you only push this to the level of the font engines (which vary widely in quality across OSs and OS versions and even applications on the same OS, e.g., Windows ships several font engines).

The only proper opinion I have is: I would drop the "mathematical" part as there seems to be nothing mathematical about it -- it's just scripts.

stevengj commented 8 years ago

@Carreau, a modifier character in Unicode always comes after the character to be modified, e.g. to type x̂ you enter x followed by U+0302 (COMBINING CIRCUMFLEX ACCENT).
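The base-then-mark order can be inspected with the standard `unicodedata` module. This sketch uses x̂ as the example; since Unicode has no precomposed x̂, the string stays two code points even after NFC normalization.

```python
import unicodedata

x_hat = "x\u0302"  # base letter first, then U+0302 COMBINING CIRCUMFLEX ACCENT

for ch in x_hat:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})")
# U+0078 LATIN SMALL LETTER X (Ll)
# U+0302 COMBINING CIRCUMFLEX ACCENT (Mn)

print(len(x_hat))                                    # 2: no precomposed x̂ exists
print(unicodedata.normalize("NFC", x_hat) == x_hat)  # True
print(x_hat.isidentifier())                          # True: Mn is allowed after the first character
print(("\u0302" + "x").isidentifier())               # False: a mark cannot start an identifier
```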

@pkra, combining characters are not magic. Every modern editor, terminal, and browser already supports them. (Specific combining characters might not appear in certain settings, but that's a font problem.) And essentially all modern programming languages already accept non-ASCII identifiers, so that ship has sailed. The current situation is that you can make identifiers with some subscripts, e.g. αᵦ is already allowed in Python 3, but it only works with an arbitrary subset of Latin and Greek characters that have codepoints assigned. (e.g. you can do every superscript lower-case Latin letter except q.)
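The gaps in the existing set are easy to enumerate. A minimal sketch, assuming the standard character-name pattern `LATIN SUBSCRIPT SMALL LETTER <X>` (superscripts are skipped because their names are less uniform, e.g. `MODIFIER LETTER SMALL A` vs. `SUPERSCRIPT LATIN SMALL LETTER I`). The exact lists printed depend on the Unicode version bundled with your Python, so no expected output is shown.

```python
import string
import unicodedata

def subscript_letters() -> dict:
    """Map each lowercase ASCII letter to its LATIN SUBSCRIPT SMALL LETTER codepoint, if one exists."""
    found = {}
    for letter in string.ascii_lowercase:
        name = f"LATIN SUBSCRIPT SMALL LETTER {letter.upper()}"
        try:
            found[letter] = unicodedata.lookup(name)
        except KeyError:
            pass  # no character with this name in this Python's Unicode database
    return found

subs = subscript_letters()
missing = [c for c in string.ascii_lowercase if c not in subs]
print("available:", " ".join(subs.values()))
print("missing:  ", " ".join(missing))

# Every available subscript letter is category Lm, so it is usable after α in an identifier.
print(all(("α" + s).isidentifier() for s in subs.values()))
```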

At the very least, for using super/subscripts in mathematical code, we should complete the set with the remaining Latin and Greek characters. (This could be done without any combining characters, just by adding new codepoints; the combining-character proposal is a more ambitious alternative.)

pkra commented 8 years ago

@pkra, Every modern editor, terminal, and browser already supports them.

Thanks. I'm aware of that. What I was trying to point out was that problems linger outside the sphere of Unicode.

To give an example, recently a publisher approached me about rendering issues with MathJax. Their content was trying to get a V-bar (V̅). Their source was MathML and was using combining characters for this. Unfortunately, that's not what MathML expects -- it has an <mover> construct for this -- and the rendering fell apart in MathJax (and also, for a different reason, in their PDF rendering).

So again, from the perspective of rendering beyond the scope of Unicode constructs, combining characters are already messy and I don't think adding to them helps these use cases.

I realize that this has little bearing on the proposal at hand. I only wanted to respond to @fperez's request for comment from my arguably limited perspective.

stevengj commented 8 years ago

@pkra, I agree that combining Unicode-based math formatting with MathML or LaTeX is a recipe for trouble. But, as you say, this has little bearing on the current proposal, which is mainly aimed at programming languages (in which non-Unicode formatting is a non-starter).

asmeurer commented 8 years ago

Are there typographical considerations where direct font support could do better than a naive renderer?

mpacer commented 8 years ago

Almost all kerning and (more generally) letter-placement considerations will benefit from particular treatment at the font level. The only renderers I've seen (occasionally) do better than the default font kerning are Adobe's Illustrator and InDesign renderers (which are also designed to be highly customizable, e.g., with ⌥+→ expanding the placement of two characters slightly).

For a different project, I tried to look up how browsers extract and render these kinds of placement considerations (since their solutions are demonstrably different from the font default as rendered by Illustrator)… I ended up getting nowhere with that project. However, I'd be curious to see how browsers handle subscripts and whether that information is currently baked into the font; I'm assuming that it is.

Figuring out how to automatically decide that for a generic combining character has a high potential to be a mess. It's not impossible, but I think naïve rendering is not going to go well in the most general case.

On Tue, Aug 30, 2016 at 10:21 AM, Aaron Meurer wrote:

Are there typographical considerations where direct font support could do better than a naive renderer?


stevengj commented 8 years ago

Please put discussion of the technical implementation of combining characters in issue #1.

lambdafu commented 7 years ago

Maybe ask CERN for endorsement?

NAThompson commented 4 years ago

I can endorse and maybe with some effort can get institutional backing. This would be great for communicating the results of the PSLQ algorithm.

stevengj commented 4 years ago

Thanks, I should really get back to this proposal. What institution were you thinking of?

NAThompson commented 4 years ago

ORNL.