w3c / mathml

MathML4 editors draft
https://w3c.github.io/mathml/

MathML 4 extensions for alignment and possible deprecation of <maligngroup/> and <malignmark/> #181

Open NSoiffer opened 4 years ago

NSoiffer commented 4 years ago

Having just implemented a polyfill for elementary math, I started thinking about some related ideas:

  1. The most obvious concept related to long division is synthetic division. It is basically the same idea as long division, except that you are dividing polynomials. With synthetic division, the columns contain numbers (the coefficients of the polynomial), not just digits. As a refresher, see this page and the example taken from it below:
[Images: "Polynomial Division" (left) and "Synthetic Division" (right)]
  2. Synthetic division is a shorthand for long division of polynomials (left example above). Long division of polynomials is basically the same idea as long division of numbers, except that instead of digits, you have monomials that each need to go into their own column. Doing that automatically requires knowing the variable you want to "sort" on so that each monomial goes into the proper column.
  3. A very similar property is needed when displaying systems of equations -- each monomial wants to be in its own column (in this case, the top-level element would not be mlongdiv, but mstack).
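As a refresher on what the synthetic division columns hold, here is a minimal Python sketch for the case of dividing by a linear factor (x - root). The function name and the highest-degree-first coefficient order are assumptions made for illustration; nothing here comes from the polyfill.

```python
def synthetic_division(coefficients, root):
    """Divide a polynomial by (x - root) using synthetic division.

    `coefficients` lists the dividend's coefficients, highest degree first.
    Returns (quotient_coefficients, remainder).
    """
    if not coefficients:
        raise ValueError("need at least one coefficient")
    # Bottom row of the synthetic division tableau: bring down the leading
    # coefficient, then repeatedly multiply by the root and add the next one.
    row = [coefficients[0]]
    for coefficient in coefficients[1:]:
        row.append(coefficient + row[-1] * root)
    # The last entry is the remainder; the rest are the quotient's coefficients.
    return row[:-1], row[-1]


# (2x^2 + 3x - 4) / (x - 2) gives quotient 2x + 7 with remainder 10.
quotient, remainder = synthetic_division([2, 3, -4], 2)
```

Each entry of `row` is a whole number rather than a single digit, which is why the columns in a synthetic division layout need to hold numbers.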

There are a few complications such as decimal alignment of the coefficients:

      8.44x + 55  y =  0
      3.1 x -  0.7y = -1.1

Note that alignment requires knowing what characters/operators act as column separators (e.g., + and -, along with = and a few other relational operators). These would be inside of mo elements, so potentially any mo element could be a separator, or maybe an attribute specifies what the separators are (something to think about/discuss).
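To make the column idea concrete, here is a rough Python sketch that works on plain-text equations rather than on MathML. It assumes that only +, - and = are separators, that every row has the same sequence of cells, and that each term is a decimal number followed by an optional single-letter variable. All names (`SEPARATORS`, `split_cells`, `split_term`, `align`) are hypothetical.

```python
import re

# Assumption: only these operators act as column separators.
SEPARATORS = {"+", "-", "="}

TOKEN_RE = re.compile(r"[+\-=]|[^\s+\-=]+")
TERM_RE = re.compile(r"(-?\d*)(\.\d+)?([A-Za-z]*)")


def split_cells(equation):
    """Split an equation into alternating term and separator cells."""
    cells = []
    sign = ""
    for token in TOKEN_RE.findall(equation):
        starts_term = not cells or cells[-1] in SEPARATORS
        if token == "-" and starts_term:
            sign = "-"  # unary minus: keep it with the following term
        elif token in SEPARATORS:
            cells.append(token)
        else:
            cells.append(sign + token)
            sign = ""
    return cells


def split_term(term):
    """Split a term into (integer part, fraction part, variable)."""
    match = TERM_RE.fullmatch(term)
    if match is None:
        return "", "", term  # unrecognised term: treat it all as the variable
    return match.group(1), match.group(2) or "", match.group(3)


def align(equations):
    """Return the equations with terms in columns and decimals aligned."""
    rows = [split_cells(equation) for equation in equations]
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all rows must have the same number of cells")
    columns = []
    for column in zip(*rows):
        if column[0] in SEPARATORS:
            columns.append(list(column))
            continue
        parts = [split_term(term) for term in column]
        widths = [max(len(part[i]) for part in parts) for i in range(3)]
        # Right-justify integer parts and left-justify the rest, so the
        # decimal points in a column line up.
        columns.append([
            part[0].rjust(widths[0])
            + part[1].ljust(widths[1])
            + part[2].ljust(widths[2])
            for part in parts
        ])
    return [" ".join(cells).rstrip() for cells in zip(*columns)]


for line in align(["8.44x + 55y = 0", "3.1x - 0.7y = -1.1"]):
    print(line)
```

For these two inputs the sketch reproduces the layout shown above. A real implementation would work on mo elements instead of characters, and would need to handle missing terms, which this sketch rejects.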

The above example is taken from the MathML 3 spec for maligngroup and malignmark. I think only MathPlayer ever implemented those elements, and I suspect that you can count on your fingers the number of times they have been used. It is a very complicated feature to implement and to use. In contrast, I think the above features are an incremental extension to elementary math layout, so implementing them (especially via an extension to the polyfill I wrote) means that support for these features would be universal (assuming I or someone else extended the polyfill). Just as important, using this extension would be simple, as it is a declarative notation that doesn't require modifying the generated layout other than at a high level (wrapping with mstack or mlongdiv). It would be less powerful, though.

I suspect that this proposed extension to elementary math handles the large majority of cases where people play games with tables to achieve alignment, both in MathML and in TeX. @davidcarlisle: do you have any estimate of how many uses of tables for alignment in TeX would be covered by this proposed extension? What are some of the cases that it misses?

dginev commented 3 years ago

Hello. I was looking for an appropriate issue to attach a recent piece of news I spotted, and since this issue discusses malignmark, it seems appropriate. There is a recent post about bypassing the sanitization of DOMPurify through an abuse of parsing MathML in HTML, details here: https://portswigger.net/daily-swig/dompurify-mutation-xss-bypass-achieved-through-mathml-namespace-confusion

Summarized as:

In the MathML namespace, two special elements – mglyph and malignmark – allow the creation of a markup that is “in HTML namespace, but on reparsing it is in MathML namespace, [meaning that] the subsequent style tag [is] parsed differently and leading to XSS,” the researcher explained.

This might be relevant if you're searching for additional reasons for deprecation.

NSoiffer commented 3 years ago

Kind of weird that it is mglyph and malignmark and not maligngroup. I read the link and there was no hint as to why those two and not maligngroup or any other element with empty content.

davidcarlisle commented 3 years ago

On Thu, 15 Oct 2020 at 21:41, NSoiffer notifications@github.com wrote:

Kind of weird that it is mglyph and malignmark and not maligngroup. I read the link and there was no hint as to why those two and not maligngroup or any other element with empty content.

maligngroup is only used in mathml containers like mrow so is not a problem.

mglyph is allowed in token elements, and in MathML-in-HTML, token elements are the entry point for HTML, so most elements inside a token element are HTML elements, except for those two (mglyph and malignmark), which stay in the MathML namespace.
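The rule can be written out as a toy Python function. This is a deliberately simplified sketch, not a parser: the function and constant names are made up, and it ignores the HTML parser's other rules, such as HTML tags like div that break out of MathML content entirely.

```python
# Token elements that the HTML parser treats as MathML text integration points.
MATHML_TEXT_INTEGRATION_POINTS = {"mi", "mo", "mn", "ms", "mtext"}

# The two elements that stay in the MathML namespace even inside a token element.
STAY_IN_MATHML = {"mglyph", "malignmark"}


def child_namespace(parent_tag, child_tag):
    """Namespace a start tag ends up in when its parent is a MathML element.

    Simplified: assumes child_tag is not one of the HTML tags that force the
    parser out of foreign content.
    """
    if parent_tag in MATHML_TEXT_INTEGRATION_POINTS and child_tag not in STAY_IN_MATHML:
        return "html"
    return "mathml"


print(child_namespace("mtext", "style"))       # html
print(child_namespace("mtext", "mglyph"))      # mathml
print(child_namespace("mrow", "maligngroup"))  # mathml
```

Under this simplified model, maligngroup is unremarkable because every child of a container like mrow stays in MathML anyway, while mglyph and malignmark are the only children of a token element that do.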

NSoiffer commented 3 years ago

Thanks for the explanation -- that makes sense.

davidcarlisle commented 1 year ago

The schema has been updated to restrict the use of malignmark, and to remove the groupalign attribute except in the legacy schema.

https://github.com/mathml-refresh/mathml-schema/commit/4e897dc8f3925d7e0dbaedcb06fa10417e2ee3c4