w3c / mathml

MathML4 editors draft
https://w3c.github.io/mathml/

Make MathML attributes ASCII case-insensitive #178

Open fred-wang opened 4 years ago

fred-wang commented 4 years ago

This is a follow-up of #22; we decided to follow HTML/CSS, which treat such values as ASCII case-insensitive. Concretely, ftp://ftp.unicode.org/Public/UNIDATA/CaseFolding.txt has this line:

017F; C; 0073; # LATIN SMALL LETTER LONG S

which means that falſe is case-insensitively equal to false. However, it is not ASCII case-insensitively equal to false (only the a-z <-> A-Z equivalences are considered in that case).
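To make the difference concrete, here is a minimal Python sketch. The helper names (`ascii_lower`, `ascii_case_insensitive_equal`, `unicode_case_insensitive_equal`) are made up for illustration and come from neither spec; `str.casefold()` stands in for a comparison based on Unicode case folding.

```python
def ascii_lower(s: str) -> str:
    """Lowercase only A-Z, leaving every other code point untouched."""
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in s)


def ascii_case_insensitive_equal(a: str, b: str) -> bool:
    # Only the a-z <-> A-Z equivalences are considered.
    return ascii_lower(a) == ascii_lower(b)


def unicode_case_insensitive_equal(a: str, b: str) -> bool:
    # str.casefold() applies Unicode case folding, so U+017F folds to "s".
    return a.casefold() == b.casefold()


long_s_false = "fal\u017fe"  # "falſe", spelled with LATIN SMALL LETTER LONG S

print(unicode_case_insensitive_equal(long_s_false, "false"))  # True
print(ascii_case_insensitive_equal(long_s_false, "false"))    # False
print(ascii_case_insensitive_equal("FALSE", "false"))         # True
```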

Currently, the MathML Core spec just says "case-insensitive".

Note: for CSS colors, I reported https://github.com/w3c/csswg-drafts/issues/4599

fred-wang commented 4 years ago

I did a quick check and, for the MathML-specific definitions, I only see case-insensitive comparisons against strings made of ASCII letters and dashes. So the only difference would be for "LATIN SMALL LETTER LONG S", "KELVIN SIGN" and maybe a few "LATIN SMALL LIGATURE" characters (e.g. double-STruck). It is unlikely for "LATIN CAPITAL LETTER I WITH DOT ABOVE" if the Turkish rule is used. See https://github.com/w3c/csswg-drafts/issues/4599#issuecomment-565794132
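A check of this kind can be approximated with a short script. This is only a sketch: it uses Python's `str.casefold()` as a stand-in for Unicode case folding, the function name is hypothetical, and the exact list it prints depends on the Unicode version bundled with the interpreter.

```python
import unicodedata


def non_ascii_folding_to_ascii_letters():
    """List non-ASCII code points whose case folding is made of ASCII letters only.

    These are the characters for which a Unicode case-insensitive match and an
    ASCII case-insensitive match against an ASCII keyword can disagree.
    """
    hits = []
    for cp in range(0x80, 0x110000):
        folded = chr(cp).casefold()
        if folded.isascii() and folded.isalpha():
            hits.append((cp, folded))
    return hits


for cp, folded in non_ascii_folding_to_ascii_letters():
    name = unicodedata.name(chr(cp), "<unnamed>")
    print(f"U+{cp:04X} {name} -> {folded!r}")
```

The output should include the long s, the Kelvin sign and the Latin ligatures mentioned above. LATIN CAPITAL LETTER I WITH DOT ABOVE is not listed, because `casefold()` maps it to "i" followed by a combining dot, which is not ASCII.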

NSoiffer commented 4 years ago

That's a good catch. I'm pretty sure we all agree that we only mean ASCII case-insensitivity. I suggest we add the following to the spec, which is a slight rewording from the HTML spec:

Many strings in the HTML and CSS syntax (e.g. the names of elements and their attributes) are case-insensitive, but only for ASCII upper alphas and ASCII lower alphas. For convenience, in this specification this is just referred to as "case-insensitive".

I suggest this goes into Appendix G.1: Document Conventions.

fred-wang commented 4 years ago

I would prefer to be explicit everywhere and use "ASCII case-insensitive" with a link to https://infra.spec.whatwg.org/#ascii-case-insensitive ; this seems to be what the HTML and CSS specifications do (or how they would be fixed, e.g. https://github.com/w3c/csswg-drafts/issues/4599#issuecomment-565599403). I'm sure that if we just keep "case-insensitive" as it is now, people will easily miss the appendix. We should also avoid duplicating definitions from HTML5, as was mentioned in another issue.

fred-wang commented 4 years ago

Consensus from 2019/12/16: Move to ASCII case-insensitiveness

fred-wang commented 4 years ago

These are the attributes, with the behavior changes that will require tests:

Other attributes rely on CSS ( https://mathml-refresh.github.io/mathml-core/#types-for-mathml-attribute-values ) so nothing is changed here (although tests can always be added).

davidcarlisle commented 4 years ago

If we want to keep the RELAX NG schema, there are two choices: either we say in words that values should be ASCII-lowercased before validation, or we make the schema do the case-insensitive match.

That would mean for example changing

attribute mathvariant {"normal" | "bold" | "italic" | "bold-italic" | "double-struck" | "bold-fraktur" | "script" | "bold-script" | "fraktur" | "sans-serif" | "bold-sans-serif" | "sans-serif-italic" | "sans-serif-bold-italic" | "monospace" | "initial" | "tailed" | "looped" | "stretched"}?,

to

attribute mathvariant {xsd:string{pattern="[Nn][Oo][Rr][Mm][Aa][Ll]|[Bb][Oo][Ll][Dd]|[Ii][Tt][Aa][Ll][Ii][Cc]|[Bb][Oo][Ll][Dd]-[Ii][Tt][Aa][Ll][Ii][Cc]|[Dd][Oo][Uu][Bb][Ll][Ee]-[Ss][Tt][Rr][Uu][Cc][Kk]|[Bb][Oo][Ll][Dd]-[Ff][Rr][Aa][Kk][Tt][Uu][Rr]|[Ss][Cc][Rr][Ii][Pp][Tt]|[Bb][Oo][Ll][Dd]-[Ss][Cc][Rr][Ii][Pp][Tt]|[Ff][Rr][Aa][Kk][Tt][Uu][Rr]|[Ss][Aa][Nn][Ss]-[Ss][Ee][Rr][Ii][Ff]|[Bb][Oo][Ll][Dd]-[Ss][Aa][Nn][Ss]-[Ss][Ee][Rr][Ii][Ff]|[Ss][Aa][Nn][Ss]-[Ss][Ee][Rr][Ii][Ff]-[Ii][Tt][Aa][Ll][Ii][Cc]|[Ss][Aa][Nn][Ss]-[Ss][Ee][Rr][Ii][Ff]-[Bb][Oo][Ll][Dd]-[Ii][Tt][Aa][Ll][Ii][Cc]|[Mm][Oo][Nn][Oo][Ss][Pp][Aa][Cc][Ee]|[Ii][Nn][Ii][Tt][Ii][Aa][Ll]|[Tt][Aa][Ii][Ll][Ee][Dd]|[Ll][Oo][Oo][Pp][Ee][Dd]|[Ss][Tt][Rr][Ee][Tt][Cc][Hh][Ee][Dd]"}}?,

which works but isn't very human readable or informative.
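If the schema route were taken, the pattern would not have to be maintained by hand. The following Python sketch regenerates it from the plain keyword list; the function names are hypothetical, and Python's `re` module is used only as an approximation of XSD pattern matching (XSD patterns are implicitly anchored, which `re.fullmatch` imitates).

```python
import re

# Keyword list copied from the existing mathvariant definition.
MATHVARIANT_VALUES = [
    "normal", "bold", "italic", "bold-italic", "double-struck",
    "bold-fraktur", "script", "bold-script", "fraktur", "sans-serif",
    "bold-sans-serif", "sans-serif-italic", "sans-serif-bold-italic",
    "monospace", "initial", "tailed", "looped", "stretched",
]


def ci_pattern(keyword: str) -> str:
    """Turn each ASCII letter into a two-character class, e.g. b -> [Bb].

    Other characters are copied as-is, which is fine for the hyphen but would
    need escaping for keywords containing regex metacharacters.
    """
    return "".join(
        f"[{c.upper()}{c.lower()}]" if c.isascii() and c.isalpha() else c
        for c in keyword
    )


def enum_pattern(values) -> str:
    """Join the per-keyword patterns into one alternation."""
    return "|".join(ci_pattern(v) for v in values)


pattern = enum_pattern(MATHVARIANT_VALUES)
print(f'attribute mathvariant {{xsd:string{{pattern="{pattern}"}}}}?,')

# Sanity checks: ASCII case variants match, non-ASCII lookalikes do not.
print(bool(re.fullmatch(pattern, "Double-Struck")))
print(bool(re.fullmatch(pattern, "double-\ufb06ruck")))
```

The readable keyword list stays the source of truth and the unreadable pattern becomes a build artifact, at the cost of adding a generation step.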

Since we already need some pre-processing, described in words, to allow data-foo attributes (or onfoo attributes) to be ignored, I'm tempted to suggest we keep the existing string match, but could be persuaded otherwise...
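As a rough illustration of the pre-processing option, the sketch below ASCII-lowercases selected attribute values and drops data-* and on* attributes before the tree is handed to a validator. Everything here is an assumption made for the example: the `ENUMERATED_ATTRIBUTES` set, the function names, and the choice to delete ignored attributes rather than skip them some other way. A real list would have to come from the spec.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of attributes whose values are matched ASCII case-insensitively.
ENUMERATED_ATTRIBUTES = {"mathvariant", "dir"}

# Translation table covering A-Z only, so non-ASCII characters are never folded.
ASCII_LOWER_TABLE = {c: c + 0x20 for c in range(ord("A"), ord("Z") + 1)}


def ascii_lower(value: str) -> str:
    return value.translate(ASCII_LOWER_TABLE)


def preprocess(root: ET.Element) -> ET.Element:
    """Normalize a parsed tree in place before schema validation."""
    for element in root.iter():
        for name in list(element.attrib):
            if name.startswith("data-") or name.startswith("on"):
                # Assumption: ignored attributes are simply removed.
                del element.attrib[name]
            elif name in ENUMERATED_ATTRIBUTES:
                element.attrib[name] = ascii_lower(element.attrib[name])
    return root


doc = ET.fromstring(
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    '<mi mathvariant="Bold-Italic" data-note="x" onclick="f()">x</mi>'
    "</math>"
)
preprocess(doc)
print(doc[0].attrib)  # {'mathvariant': 'bold-italic'}
```

With this step in place the schema can keep its plain lowercase string match.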

fred-wang commented 4 years ago

I think this was already the case since #22 ; not sure how important it is for legacy XML applications. I wonder what is done for HTML5 ?

davidcarlisle commented 4 years ago

On Tue, 17 Dec 2019 at 13:07, Frédéric Wang notifications@github.com wrote:

I think this was already the case since #22 https://github.com/mathml-refresh/mathml/issues/22 ; not sure how important it is for legacy XML applications. I wonder what is done for HTML5 ?

The validator.nu HTML5 validator has a RELAX NG schema at its core but heavily preprocesses the document with custom code before validating it, so I think pre-processing is fine (and makes the schema a lot easier to read).

ByteEater-pl commented 4 years ago

legacy XML applications

Could you, please, define this term, @fred-wang? I don't know which XML applications are legacy and which aren't.

fred-wang commented 4 years ago

legacy XML applications

Could you, please, define this term, @fred-wang? I don't know which XML applications are legacy and which aren't.

I believe I was talking about XML-based MathML3 implementations.

fred-wang commented 4 years ago

Removing "tests" label, we have tests for mathsize and dir. It's not exhaustive, but HTML or CSS don't test exhaustively either...

Also removing the core label, since the only remaining changes are in MathML Full.