w3c / mathml-core

MathML Core draft
https://w3c.github.io/mathml-core

U+002D HYPHEN-MINUS in <mo> operators #70

Open fred-wang opened 4 years ago

fred-wang commented 4 years ago

U+002D HYPHEN-MINUS is too short, so MathML browser implementations render it as U+2212 MINUS SIGN.

Do we want to standardize this workaround?

I'd prefer not, but I guess the proper way to do it would be via a new text-transform value for mo.

If not, tools should generate the proper code point instead and we can write a polyfill for that.

davidcarlisle commented 4 years ago

The Unicode name suggests that this character is usable (on input) as a minus, and I would guess that a very large percentage of existing MathML uses - rather than U+2212 as minus (including the examples of subtraction in the MathML3 spec).

So if it does not break the code design too much, I think it would be good to support this in core, although, as you say, a polyfill could do the replacement if that is really needed.

NSoiffer commented 4 years ago

There are a number of characters that should render the same. These are listed in chapter 7. For this case, it says "MathML renderers should treat U+002D [HYPHEN-MINUS] as equivalent to U+2212 [MINUS SIGN] in formula contexts such as mo, and as equivalent to U+2010 [HYPHEN] in text contexts such as mtext."
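
The rule quoted from chapter 7 depends on the enclosing element. A minimal Python sketch of that lookup, assuming a hypothetical helper `display_char` and covering only the two contexts the quote names (`mo` and `mtext`); it illustrates the rule and is not how any renderer implements it:

```python
HYPHEN_MINUS = "\u002D"
MINUS_SIGN = "\u2212"
HYPHEN = "\u2010"

# Hypothetical lookup: element name -> character that U+002D is treated as.
# Only the two contexts named in the quoted text are listed.
EQUIVALENT_IN_CONTEXT = {
    "mo": MINUS_SIGN,   # formula context
    "mtext": HYPHEN,    # text context
}


def display_char(element_name: str, char: str) -> str:
    """Return the character to display for `char` inside `element_name`."""
    if char != HYPHEN_MINUS:
        return char
    # Elements not named in the quote are left alone in this sketch.
    return EQUIVALENT_IN_CONTEXT.get(element_name, char)
```

Under these assumptions, `display_char("mo", "-")` gives U+2212 and `display_char("mtext", "-")` gives U+2010, while any other character passes through unchanged.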

Some equivalents not covered in chapter 7 (and maybe something we should add to the spec) are things like - and _ being rendered the same as (stretchy) lines in mover, etc.

MurraySargent commented 4 years ago

Math instances of U+002D should be displayed as U+2212. This should be done by converting U+002D to U+2212 when reading in MathML or another file format. Similarly math instances of U+0027 (' apostrophe) should be converted to U+2032 (′ prime). The OpenType ssty feature should not be required for these changes. At least that's how it works in OfficeMath (Word, PowerPoint, OneNote, etc.)
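
A rough Python sketch of the "convert when reading in" approach described above. The function name `normalize_operators` is hypothetical, and restricting the substitution to `<mo>` content is an assumption of this sketch (the comment only says "math instances"); it says nothing about how OfficeMath does it.

```python
import xml.etree.ElementTree as ET

# The two substitutions named in the comment above:
# U+002D -> U+2212 (minus sign) and U+0027 -> U+2032 (prime).
SUBSTITUTIONS = str.maketrans({"\u002D": "\u2212", "\u0027": "\u2032"})


def normalize_operators(mathml_source: str) -> str:
    """Rewrite the text of every <mo> element; other elements are untouched."""
    root = ET.fromstring(mathml_source)
    for element in root.iter():
        # Strip a namespace prefix such as "{http://...}mo" if present.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "mo" and element.text:
            element.text = element.text.translate(SUBSTITUTIONS)
    return ET.tostring(root, encoding="unicode")
```

Because the rewrite happens once at load time, the stored document, the rendering, and anything copied out of it all see the same code point.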

fred-wang commented 4 years ago

I agree with Murray that people / authoring tools should use the proper glyph (so U+2212 instead of U+002D, or U+2032 instead of U+0027). The question is whether we want to handle backward compatibility for this kind of "bad markup", as clearly there is existing content doing it. My feeling is that we don't want to add this ugly hack in level-1 since the goal is to have a clean spec as a starting point. Maybe that can encourage people to migrate their pages / tools. If we do this in the future, I'd prefer to standardize this at a CSS level like text-transform.

ssty is meant to handle script things at the font level, but this is not about scripts, and existing fonts don't provide these transforms, so it's irrelevant here. See https://github.com/mathml-refresh/mathml/issues/19 for a separate discussion.

faceless2 commented 4 years ago

While I understand the sentiment and the desire for a clean spec, I think conversion from U+002D to U+2212 in particular is fairly critical. Of the testcases we've been working from I don't think a single one uses U+2212, and before we added this substitution, the results were noticeably incorrect.

If MathML3 specified a lot of these types of substitutions I would certainly back moving it to something like text-transform. But for this single substitution (or both if the less common U+0027 to U+2032 is included as well) the pragmatic - if not the cleanest - approach is to specify this conversion takes place explicitly.

The alternative is that you either define this behaviour somewhere else (i.e. CSS or a polyfill) or have close to 100% of legacy MathML content render incorrectly.

NSoiffer commented 4 years ago

Other characters that have similar issues:

In addition, there are a number of characters that occur in under/overscripts that currently aren't specified but need to be:

NSoiffer commented 4 years ago

@fred-wang: you removed the 'need resolution' tag without specifying a resolution. What is the resolution?

fred-wang commented 4 years ago

This is very low priority, so I removed the label as I thought you wanted to use this label to prioritize what needs to be discussed in meetings. As said above, we are definitely not going to do this hack for a first implementation, and so, as agreed by our process, this shouldn't go into MathML Core level 1.

The cases mentioned on https://github.com/mathml-refresh/mathml/issues/146#issuecomment-661241874 are even less important (and separate from this bug report). No browsers do that kind of substitution, so there is no backward compatibility risk. I would personally oppose doing this for any version of MathML Core.

I think the right thing to do for now is to write a polyfill for minus and to urge people to update their tools/documents to use the proper code point.

davidcarlisle commented 4 years ago

It is really baffling that you see this as low priority: not supporting it means that essentially no existing MathML will work unchanged as MathML Core.

It is not at all clear that U+002D is not "the proper code point". It is HYPHEN-MINUS; that is, its intended use is as a hyphen in text and a minus in math, which is how it has always been treated in MathML so far.

fred-wang commented 4 years ago

@davidcarlisle I just tried a basic testcase $$-$$ in LaTeX (hyphen-minus) and the character in the PDF output is U+2212 MINUS SIGN, so hyphen is not used as a minus sign. Ideally, tools generating MathML content (converters, WYSIWYG editors, etc.) should do the same. If you are talking about how people typeset the math with the keyboard, then that's not a topic for MathML Core, which is focusing on browser rendering. Changing a character between DOM and rendering (and so possibly its semantics) was already controversial for text-transform / mathvariant / single-char-mi, and that caused hot debates in the initial CSSWG discussion last year. Since level 1 is focusing on a clean spec, introducing another hack does not seem appropriate at all.

I understand people can disagree on what is important for MathML, but I really wish we agree on the principle followed for the development and implementation of MathML Core. I'm really disappointed that some people still seem to follow MathML3's approach "put something in the spec so that it gets magically implemented in browsers" (and even worse putting pressure on others to do the job). Anyway, I'm tired of repeating the same thing again and again and I don't want to waste time arguing about this, so I'll stop here.

davidcarlisle commented 4 years ago

In classic LaTeX you certainly wouldn't get U+2212, but that is misunderstanding my comment. Even without math, Unicode input is expected to go through all kinds of font shaping, so the glyphs in the output don't match the input. A Unicode input of - is expected to make a minus sign, most likely rendered using a glyph at position 2212, if used in math. This has been supported by every MathML system so far, including the one in Office, MathJax, and existing browser implementations.

Why is this different from <mi>x</mi> rendering as U+1D465?

> I understand people can disagree on what is important for MathML, but I really wish we agree on the principle followed for the development and implementation of MathML Core. I'm really disappointed that some people still seem to follow MathML3's approach "put something in the spec so that it gets magically implemented in browsers" (and even worse putting pressure on others to do the job). Anyway, I'm tired of repeating the same thing again and again and I don't want to waste time arguing about this, so I'll stop here.

That completely mis-represents the discussion.

Using - is interoperably supported by existing mathml systems and used in the overwhelming majority of existing mathml content. It clearly meets the criterion for being included in MathML Core.

MurraySargent commented 4 years ago

I agree with David. I don’t understand why changing U+002D to U+2212 is problematic. We have to change many other characters such as ASCII letters to the corresponding math italic letters, spacing accent marks to combining marks, apostrophe to superscripted prime, etc. So changing U+002D to U+2212 just goes along for the ride.
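
For comparison, the letter substitution mentioned above (and the `<mi>x</mi>` to U+1D465 case raised earlier in the thread) can be written as simple code point arithmetic. This is only a sketch with a hypothetical function name `math_italic`; it relies on the Mathematical Italic letters being laid out contiguously from U+1D434 (capitals) and U+1D44E (small letters), with the italic h living at U+210E instead.

```python
def math_italic(letter: str) -> str:
    """Map one ASCII letter to its Mathematical Italic counterpart."""
    if len(letter) != 1:
        return letter
    if letter == "h":
        # The slot U+1D455 is reserved; italic h is U+210E PLANCK CONSTANT.
        return "\u210E"
    if "a" <= letter <= "z":
        return chr(0x1D44E + ord(letter) - ord("a"))
    if "A" <= letter <= "Z":
        return chr(0x1D434 + ord(letter) - ord("A"))
    # Digits, operators, and non-ASCII characters are returned unchanged.
    return letter
```

Here `math_italic("x")` yields U+1D465, the same kind of DOM-to-rendering change that the minus substitution would introduce.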

And don’t forget changing the math-deprecated angle brackets U+2329 and U+232A to the MATHEMATICAL LEFT ANGLE BRACKET (U+27E8) and MATHEMATICAL RIGHT ANGLE BRACKET (U+27E9), respectively. The former were deprecated for math usage in Unicode 3.2 (in 2002) since they had unfortunately been made canonically equivalent to the corresponding Japanese angle brackets U+3008 and U+3009, respectively (see UTR #25, Section 2.10).
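
The canonical equivalence is easy to observe with Python's standard `unicodedata` module: normalizing the deprecated brackets turns them into the CJK ones, while the mathematical brackets are stable. The helper names below are hypothetical and the snippet is only an illustration of the point above.

```python
import unicodedata

# Deprecated angle brackets -> mathematical angle brackets.
DEPRECATED_TO_MATH = str.maketrans({"\u2329": "\u27E8", "\u232A": "\u27E9"})


def replace_deprecated_brackets(text: str) -> str:
    """Swap U+2329/U+232A for U+27E8/U+27E9; leave everything else alone."""
    return text.translate(DEPRECATED_TO_MATH)


def survives_nfc(char: str) -> bool:
    """True if Unicode NFC normalization leaves the character unchanged."""
    return unicodedata.normalize("NFC", char) == char
```

`survives_nfc("\u2329")` is false because normalization rewrites it to U+3008, so in a pipeline that also normalizes, the replacement has to run first.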

But I agree that MathML writers should write U+2212 (−) instead of U+002D (-) and that LaTeX and UnicodeMath input editors should convert the HYPHEN-MINUS to U+2212. The days of pure ASCII are so last century 😊

Thanks, Murray

In reply to: David Carlisle, Friday, August 21, 2020 3:21 AM, Re: [mathml-refresh/mathml] U+002D HYPHEN-MINUS in operators (#146)

NSoiffer commented 3 years ago

I did a check of some popular TeX-to-MathML converters: tex4ht, ltlatex, latexml, and even @fred-wang's own TeXZilla produce the ASCII minus. Even if they were all changed, that leaves all the MathML that has been produced by them over the years as having the ASCII minus in the MathML.
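
A check of this kind can be approximated by counting which code point appears in `<mo>` elements of a converter's output. The sketch below is not the procedure used for the check above; the function name `count_minus_code_points` is hypothetical and it only looks at `<mo>` elements whose whole content is a single minus-like character.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_minus_code_points(mathml_source: str) -> Counter:
    """Count <mo> elements containing U+002D versus U+2212."""
    counts = Counter()
    root = ET.fromstring(mathml_source)
    for element in root.iter():
        # Strip a namespace prefix such as "{http://...}mo" if present.
        local_name = element.tag.rsplit("}", 1)[-1]
        text = (element.text or "").strip()
        if local_name == "mo" and text in ("\u002D", "\u2212"):
            counts[f"U+{ord(text):04X}"] += 1
    return counts
```

Running it over a corpus of generated MathML would show how much content depends on the U+002D behaviour.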

MurraySargent commented 3 years ago

Most definitely replace U+002D by U+2212 in a math zone (inside ) unless it’s explicitly marked as . No other ifs, ands, or buts 😊 Seriously, U+002D looks simply awful as a minus sign. That’s why we encoded U+2212 for the minus sign.

Thanks, Murray

In reply to: Frédéric Wang, Monday, June 28, 2021 7:50 PM, [w3c/mathml-core] U+002D HYPHEN-MINUS in operators (#70)

fred-wang commented 2 years ago

I think there is some confusion in this thread. Editors are free to (and probably should) replace any typed U+002D (-) with U+2212 (−); that's out of the scope of MathML Core. The question is whether we want to introduce a hack (e.g. based on text-transform) in browsers, with all the extra issues it opens (more exceptions in the code, more tests needed, text mismatch between DOM / rendered / ATs / copy & paste, etc).

In any case, there is no plan to integrate such a hack in Chromium's initial implementation so I guess this should be level 2.

NSoiffer commented 2 years ago

There is no confusion in my mind -- this is a requirement for the spec, not editors. Existing MathML and existing MathML producers mostly use U+002D and expect it to render with the U+2212 glyph.

Here's the difference illustrated with a trivial codepen rendered by Chrome with math support: [image]

I don't think the spec should say how to implement this equivalence. It should merely say that U+002D should be rendered as U+2212. I'm not convinced that text-transform is the only way this can be done. What I am convinced is that not implementing this equivalence is a significant change from people's expectation and current use.

faceless2 commented 2 years ago

I completely agree with Neil. At the absolute least you need some sort of statement of intent: "it is expected that user-agents will convert U+002D to U+2212 for both rendering and the AT tree via some undefined mechanism". Even if it's marked as optional, that would go some way to limiting the inevitable divergence when implementers are given existing MathML docs, and the existing MathML users that go with them, and a spec that's silent on what to do about it.

(I say inevitable with some confidence, because we've implemented this substitution).

davidcarlisle commented 2 years ago

I think (whether or not you can implement this in the first implementation) that core should say that - should render as minus. The majority of existing MathML assumes this, including all instances of subtraction in the MathML3 spec and almost all existing generators, e.g. TeX-to-MathML converters. I don't think you can brush this off as "confusion" on the part of the commenters or say that it is a "hack". The - symbol in Unicode is explicitly dual use, HYPHEN-MINUS, and should act as a hyphen in text and a minus in math.

bkardell commented 2 years ago

Discussed in the meeting today, no resolutions. We'll circle back on this next month after we have some more async discussions.

NSoiffer commented 2 years ago

Adding the needs-spec change label, as the meeting agreement was that this must be done because of the vast amount of legacy MathML that assumes this equivalence. Both Firefox and Safari support this. If Chrome can't handle this, it should fail a test.