w3c / xml-entities

Source for http://www.w3.org/2003/entities/2007xml/

mathlatex unicode-math name in XML differing from the source TeX data? #11

Closed Omikhleia closed 1 week ago

Omikhleia commented 1 week ago

Greetings,

The unicode-math LaTeX package table has Greek characters (U+00391 and subsequent) as \mupAlpha etc.

However, the Unicode XML file in this repository has them as <mathlatex set="unicode-math">\upAlpha</mathlatex> etc., with the initial "m" dropped (whereas other mappings seem correct at a glance[^1]).

Could this be a weird conversion/import artifact?

Thanks in advance for any feedback! (For the context behind my question: I'm considering using this "aggregated" unicode.xml file to generate some extended tables (MathML operator dictionary, LaTeX-like aliases, etc.) for the SILE typesetting system, rather than going through all the re-assembling of the corresponding TeX, W3C etc. source files -- which is apparently what was done here).

[^1]: For instance, \mitAlpha (U+1D6E2) etc. have the expected names.
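
A minimal sketch of the extraction use case described above, in Python. It assumes each `mathlatex` element sits somewhere inside a `character` element that carries an `id` attribute such as `U00391`; the `SAMPLE` string and the function name `unicode_math_names` are made up for illustration and stand in for the real unicode.xml.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for unicode.xml (assumed layout, for illustration only);
# the real file is far larger and carries more attributes and children.
SAMPLE = r"""
<unicode>
  <character id="U00391">
    <mathlatex set="unicode-math">\upAlpha</mathlatex>
  </character>
  <character id="U1D6E2">
    <mathlatex set="unicode-math">\mitAlpha</mathlatex>
  </character>
</unicode>
"""

def unicode_math_names(xml_text):
    """Map character ids to the list of their unicode-math command names."""
    root = ET.fromstring(xml_text)
    names = {}
    for char in root.iter("character"):
        for ml in char.iter("mathlatex"):
            if ml.get("set") == "unicode-math" and ml.text:
                # Keep every name, in case a character lists more than one.
                names.setdefault(char.get("id"), []).append(ml.text.strip())
    return names

names = unicode_math_names(SAMPLE)
print(names)
```

Filtering on the `set` attribute matters because, judging from the attribute's presence, other `mathlatex` sets may appear next to the unicode-math one.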

davidcarlisle commented 1 week ago

Hmm the import was a long time ago but I think it is (arguably) correct.

> \upAlpha=macro:
->\symup \Alpha .
l.7 \show\upAlpha

? 
> \mupAlpha=the character Α.
l.8 \show\mupAlpha

? 

from

\documentclass{article}

\usepackage{unicode-math}

\begin{document}

\show\upAlpha
\show\mupAlpha

\end{document}

shows that unicode-math defines both (with technically different definitions).

But the other alphabets all use commands that map directly to the Unicode slot, e.g.

\mbfAlpha=the character 𝚨

so this does seem strange.

The initial inclusion of this set was via some Perl (I think) that massaged @wspr's TeX file, so the difference is a bit surprising. I wonder if that has changed; I'll see if I can see anything in the logs.

... Ah, this changed in unicode-math in 2015 (so "recently" compared to the age of this data :-)

Author: Will Robertson <...>
Date:   Tue Jun 30 22:14:46 2015 +0930

    greek symbols: \upbeta and \itbeta are defined to be always upright and italic respectively

diff --git a/unicode-math-table.tex b/unicode-math-table.tex
index 5d3f263..acef891 100644
--- a/unicode-math-table.tex
+++ b/unicode-math-table.tex
-\UnicodeMathSymbol{"00391}{\upAlpha                  }{\mathalpha}{capital alpha, greek}%
....
+\UnicodeMathSymbol{"00391}{\mupAlpha                 }{\mathalpha}{capital alpha, greek}%

So... on the face of it I'm not against the proposal to re-sync with the unicode-math table. Let me check and report back.

> rather than going through all the re-assembling of the corresponding TeX, W3C etc. source files -- which is apparently what was done here).

Yes, well, in some cases it was a bit easier for me as the direction was reversed: unicode.xml is the source of the MathML operator dictionary, the MathML and HTML entity lists, etc., but tracking every Unicode release since 1999 or so has proved interesting at times :-)

davidcarlisle commented 1 week ago

Actually, the import code is more or less still in this repository, in XSLT:

<xsl:variable name="umt">
 <umd>
  <xsl:for-each select="tokenize(unparsed-text('https://raw.githubusercontent.com/wspr/unicode-math/master/unicode-math-table.tex'),
            '[&#10;&#13;]+')[contains(.,'UnicodeMathSymbol')]">
   <xsl:variable name="id" select="replace(.,'.UnicodeMathSymbol\{&quot;([0-9A-F]+)\}.*','$1')"/>
   <character id="U{if(string-length($id)=4) then '0' else ''}{$id}">
    <xsl:value-of select="replace(.,'.UnicodeMathSymbol\{&quot;([0-9A-F]+)\}\{(\\[^{} ]+).*','$2')"/>
   </character>
  </xsl:for-each>
 </umd>
</xsl:variable>

I'll adjust to run on a current copy of unicode-math and see how much changes....
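
For readers less at home in XSLT 2.0, roughly the same extraction can be sketched in Python. This is an illustration under assumptions, not the repository's import code: the names `ENTRY` and `parse_table` are invented here, the sample is a single table line quoted earlier in this thread, and lines that do not match are skipped rather than passed through.

```python
import re

# One table entry looks like:
#   \UnicodeMathSymbol{"00391}{\mupAlpha   }{\mathalpha}{capital alpha, greek}%
ENTRY = re.compile(r'\\UnicodeMathSymbol\{"([0-9A-F]+)\}\{(\\[^{} ]+)')

def parse_table(text):
    """Return {character id: command name} for every table entry in text."""
    mapping = {}
    for line in re.split(r"[\r\n]+", text):
        match = ENTRY.search(line)
        if match is None:
            continue  # blank lines, comments, anything that is not an entry
        codepoint, command = match.groups()
        # Same padding rule as the XSLT: 4-digit ids gain a leading zero.
        char_id = "U" + ("0" if len(codepoint) == 4 else "") + codepoint
        mapping[char_id] = command
    return mapping

SAMPLE = r'''
\UnicodeMathSymbol{"00391}{\mupAlpha                 }{\mathalpha}{capital alpha, greek}%
'''
table = parse_table(SAMPLE)
print(table)
```

The command name stops at the first space or brace, which is what trims the padding after `\mupAlpha` in the table.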

davidcarlisle commented 1 week ago

@Omikhleia I just updated the script to use the current unicode-math repo, and it would make the following diff. I'll probably update unicode-math later but want to think about it first; is this closer to what you expected?

2164a2165
>          <mathlatex set="unicode-math">\mathsection</mathlatex>
2518a2520
>          <mathlatex set="unicode-math">\mathparagraph</mathlatex>
7768a7771
>          <mathlatex set="unicode-math">\mathunderbar</mathlatex>
7910a7914
>          <mathlatex set="unicode-math">\underleftrightarrow</mathlatex>
8218c8222
<          <mathlatex set="unicode-math">\upAlpha</mathlatex>
---
>          <mathlatex set="unicode-math">\mupAlpha</mathlatex>
8238c8242
<          <mathlatex set="unicode-math">\upBeta</mathlatex>
---
>          <mathlatex set="unicode-math">\mupBeta</mathlatex>
8258c8262
<          <mathlatex set="unicode-math">\upGamma</mathlatex>
---
>          <mathlatex set="unicode-math">\mupGamma</mathlatex>
8292c8296
<          <mathlatex set="unicode-math">\upDelta</mathlatex>
---
>          <mathlatex set="unicode-math">\mupDelta</mathlatex>
8326c8330
<          <mathlatex set="unicode-math">\upEpsilon</mathlatex>
---
>          <mathlatex set="unicode-math">\mupEpsilon</mathlatex>
8346c8350
<          <mathlatex set="unicode-math">\upZeta</mathlatex>
---
>          <mathlatex set="unicode-math">\mupZeta</mathlatex>
8366c8370
<          <mathlatex set="unicode-math">\upEta</mathlatex>
---
>          <mathlatex set="unicode-math">\mupEta</mathlatex>
8386c8390
<          <mathlatex set="unicode-math">\upTheta</mathlatex>
---
>          <mathlatex set="unicode-math">\mupTheta</mathlatex>
8420c8424
<          <mathlatex set="unicode-math">\upIota</mathlatex>
---
>          <mathlatex set="unicode-math">\mupIota</mathlatex>
8440c8444
<          <mathlatex set="unicode-math">\upKappa</mathlatex>
---
>          <mathlatex set="unicode-math">\mupKappa</mathlatex>
8460c8464
<          <mathlatex set="unicode-math">\upLambda</mathlatex>
---
>          <mathlatex set="unicode-math">\mupLambda</mathlatex>
8494c8498
<          <mathlatex set="unicode-math">\upMu</mathlatex>
---
>          <mathlatex set="unicode-math">\mupMu</mathlatex>
8514c8518
<          <mathlatex set="unicode-math">\upNu</mathlatex>
---
>          <mathlatex set="unicode-math">\mupNu</mathlatex>
8534c8538
<          <mathlatex set="unicode-math">\upXi</mathlatex>
---
>          <mathlatex set="unicode-math">\mupXi</mathlatex>
8568c8572
<          <mathlatex set="unicode-math">\upOmicron</mathlatex>
---
>          <mathlatex set="unicode-math">\mupOmicron</mathlatex>
8588c8592
<          <mathlatex set="unicode-math">\upPi</mathlatex>
---
>          <mathlatex set="unicode-math">\mupPi</mathlatex>
8622c8626
<          <mathlatex set="unicode-math">\upRho</mathlatex>
---
>          <mathlatex set="unicode-math">\mupRho</mathlatex>
8642c8646
<          <mathlatex set="unicode-math">\upSigma</mathlatex>
---
>          <mathlatex set="unicode-math">\mupSigma</mathlatex>
8676c8680
<          <mathlatex set="unicode-math">\upTau</mathlatex>
---
>          <mathlatex set="unicode-math">\mupTau</mathlatex>
8696c8700
<          <mathlatex set="unicode-math">\upUpsilon</mathlatex>
---
>          <mathlatex set="unicode-math">\mupUpsilon</mathlatex>
8716c8720
<          <mathlatex set="unicode-math">\upPhi</mathlatex>
---
>          <mathlatex set="unicode-math">\mupPhi</mathlatex>
8750c8754
<          <mathlatex set="unicode-math">\upChi</mathlatex>
---
>          <mathlatex set="unicode-math">\mupChi</mathlatex>
8770c8774
<          <mathlatex set="unicode-math">\upPsi</mathlatex>
---
>          <mathlatex set="unicode-math">\mupPsi</mathlatex>
8804c8808
<          <mathlatex set="unicode-math">\upOmega</mathlatex>
---
>          <mathlatex set="unicode-math">\mupOmega</mathlatex>
8922c8926
<          <mathlatex set="unicode-math">\upalpha</mathlatex>
---
>          <mathlatex set="unicode-math">\mupalpha</mathlatex>
8956c8960
<          <mathlatex set="unicode-math">\upbeta</mathlatex>
---
>          <mathlatex set="unicode-math">\mupbeta</mathlatex>
8990c8994
<          <mathlatex set="unicode-math">\upgamma</mathlatex>
---
>          <mathlatex set="unicode-math">\mupgamma</mathlatex>
9024c9028
<          <mathlatex set="unicode-math">\updelta</mathlatex>
---
>          <mathlatex set="unicode-math">\mupdelta</mathlatex>
9058c9062
<          <mathlatex set="unicode-math">\upepsilon</mathlatex>
---
>          <mathlatex set="unicode-math">\mupvarepsilon</mathlatex>
9090c9094
<          <mathlatex set="unicode-math">\upzeta</mathlatex>
---
>          <mathlatex set="unicode-math">\mupzeta</mathlatex>
9124c9128
<          <mathlatex set="unicode-math">\upeta</mathlatex>
---
>          <mathlatex set="unicode-math">\mupeta</mathlatex>
9159c9163
<          <mathlatex set="unicode-math">\uptheta</mathlatex>
---
>          <mathlatex set="unicode-math">\muptheta</mathlatex>
9193c9197
<          <mathlatex set="unicode-math">\upiota</mathlatex>
---
>          <mathlatex set="unicode-math">\mupiota</mathlatex>
9227c9231
<          <mathlatex set="unicode-math">\upkappa</mathlatex>
---
>          <mathlatex set="unicode-math">\mupkappa</mathlatex>
9261c9265
<          <mathlatex set="unicode-math">\uplambda</mathlatex>
---
>          <mathlatex set="unicode-math">\muplambda</mathlatex>
9295c9299
<          <mathlatex set="unicode-math">\upmu</mathlatex>
---
>          <mathlatex set="unicode-math">\mupmu</mathlatex>
9329c9333
<          <mathlatex set="unicode-math">\upnu</mathlatex>
---
>          <mathlatex set="unicode-math">\mupnu</mathlatex>
9363c9367
<          <mathlatex set="unicode-math">\upxi</mathlatex>
---
>          <mathlatex set="unicode-math">\mupxi</mathlatex>
9397c9401
<          <mathlatex set="unicode-math">\upomicron</mathlatex>
---
>          <mathlatex set="unicode-math">\mupomicron</mathlatex>
9419c9423
<          <mathlatex set="unicode-math">\uppi</mathlatex>
---
>          <mathlatex set="unicode-math">\muppi</mathlatex>
9453c9457
<          <mathlatex set="unicode-math">\uprho</mathlatex>
---
>          <mathlatex set="unicode-math">\muprho</mathlatex>
9487c9491
<          <mathlatex set="unicode-math">\upvarsigma</mathlatex>
---
>          <mathlatex set="unicode-math">\mupvarsigma</mathlatex>
9522c9526
<          <mathlatex set="unicode-math">\upsigma</mathlatex>
---
>          <mathlatex set="unicode-math">\mupsigma</mathlatex>
9556c9560
<          <mathlatex set="unicode-math">\uptau</mathlatex>
---
>          <mathlatex set="unicode-math">\muptau</mathlatex>
9590c9594
<          <mathlatex set="unicode-math">\upupsilon</mathlatex>
---
>          <mathlatex set="unicode-math">\mupupsilon</mathlatex>
9625c9629
<          <mathlatex set="unicode-math">\upvarphi</mathlatex>
---
>          <mathlatex set="unicode-math">\mupvarphi</mathlatex>
9656c9660
<          <mathlatex set="unicode-math">\upchi</mathlatex>
---
>          <mathlatex set="unicode-math">\mupchi</mathlatex>
9690c9694
<          <mathlatex set="unicode-math">\uppsi</mathlatex>
---
>          <mathlatex set="unicode-math">\muppsi</mathlatex>
9724c9728
<          <mathlatex set="unicode-math">\upomega</mathlatex>
---
>          <mathlatex set="unicode-math">\mupomega</mathlatex>
9827c9831
<          <mathlatex set="unicode-math">\upvartheta</mathlatex>
---
>          <mathlatex set="unicode-math">\mupvartheta</mathlatex>
9892c9896
<          <mathlatex set="unicode-math">\upphi</mathlatex>
---
>          <mathlatex set="unicode-math">\mupphi</mathlatex>
9926c9930
<          <mathlatex set="unicode-math">\upvarpi</mathlatex>
---
>          <mathlatex set="unicode-math">\mupvarpi</mathlatex>
10120c10124
<          <mathlatex set="unicode-math">\upvarkappa</mathlatex>
---
>          <mathlatex set="unicode-math">\mupvarkappa</mathlatex>
10147c10151
<          <mathlatex set="unicode-math">\upvarrho</mathlatex>
---
>          <mathlatex set="unicode-math">\mupvarrho</mathlatex>
10182c10186
<          <mathlatex set="unicode-math">\upvarTheta</mathlatex>
---
>          <mathlatex set="unicode-math">\mupvarTheta</mathlatex>
10196c10200
<          <mathlatex set="unicode-math">\upvarepsilon</mathlatex>
---
>          <mathlatex set="unicode-math">\mupepsilon</mathlatex>
36909a36914
>          <mathlatex set="unicode-math">\mathhyphen</mathlatex>
38214c38219
<          <mathlatex set="unicode-math">\vec</mathlatex>
---
>          <mathlatex set="unicode-math">\overrightarrow</mathlatex>
69791d69795
<          <mathlatex set="unicode-math">\lbrbrak</mathlatex>
69797d69800
<          <mathlatex set="unicode-math">\rbrbrak</mathlatex>
69812d69814
<          <mathlatex set="unicode-math">\Lbrbrak</mathlatex>
69822d69823
<          <mathlatex set="unicode-math">\Rbrbrak</mathlatex>
166962c166963
<          <mathlatex set="unicode-math">\mbfepsilon</mathlatex>
---
>          <mathlatex set="unicode-math">\mbfvarepsilon</mathlatex>
167296c167297
<          <mathlatex set="unicode-math">\mbfvarepsilon</mathlatex>
---
>          <mathlatex set="unicode-math">\mbfepsilon</mathlatex>
167602c167603
<          <mathlatex set="unicode-math">\mitepsilon</mathlatex>
---
>          <mathlatex set="unicode-math">\mitvarepsilon</mathlatex>
167721c167722
<          <mathlatex set="unicode-math">\mitphi</mathlatex>
---
>          <mathlatex set="unicode-math">\mitvarphi</mathlatex>
167756c167757
<          <mathlatex set="unicode-math">\mitvarepsilon</mathlatex>
---
>          <mathlatex set="unicode-math">\mitepsilon</mathlatex>
167777c167778
<          <mathlatex set="unicode-math">\mitvarphi</mathlatex>
---
>          <mathlatex set="unicode-math">\mitphi</mathlatex>
168008c168009
<          <mathlatex set="unicode-math">\mbfitepsilon</mathlatex>
---
>          <mathlatex set="unicode-math">\mbfitvarepsilon</mathlatex>
168127c168128
<          <mathlatex set="unicode-math">\mbfitphi</mathlatex>
---
>          <mathlatex set="unicode-math">\mbfitvarphi</mathlatex>
168162c168163
<          <mathlatex set="unicode-math">\mbfitvarepsilon</mathlatex>
---
>          <mathlatex set="unicode-math">\mbfitepsilon</mathlatex>
168183c168184
<          <mathlatex set="unicode-math">\mbfitvarphi</mathlatex>
---
>          <mathlatex set="unicode-math">\mbfitphi</mathlatex>
168414c168415
<          <mathlatex set="unicode-math">\mbfsansepsilon</mathlatex>
---
>          <mathlatex set="unicode-math">\mbfsansvarepsilon</mathlatex>
168533c168534
<          <mathlatex set="unicode-math">\mbfsansphi</mathlatex>
---
>          <mathlatex set="unicode-math">\mbfsansvarphi</mathlatex>
168568c168569
<          <mathlatex set="unicode-math">\mbfsansvarepsilon</mathlatex>
---
>          <mathlatex set="unicode-math">\mbfsansepsilon</mathlatex>
168589c168590
<          <mathlatex set="unicode-math">\mbfsansvarphi</mathlatex>
---
>          <mathlatex set="unicode-math">\mbfsansphi</mathlatex>
168820c168821
<          <mathlatex set="unicode-math">\mbfitsansepsilon</mathlatex>
---
>          <mathlatex set="unicode-math">\mbfitsansvarepsilon</mathlatex>
168939c168940
<          <mathlatex set="unicode-math">\mbfitsansphi</mathlatex>
---
>          <mathlatex set="unicode-math">\mbfitsansvarphi</mathlatex>
168974c168975
<          <mathlatex set="unicode-math">\mbfitsansvarepsilon</mathlatex>
---
>          <mathlatex set="unicode-math">\mbfitsansepsilon</mathlatex>
168995c168996
<          <mathlatex set="unicode-math">\mbfitsansvarphi</mathlatex>
---
>          <mathlatex set="unicode-math">\mbfitsansphi</mathlatex>
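
A diff of this size is easier to review once the one-line changes are grouped. The Python sketch below does that for normal-format `diff` output; it is an illustration only, with invented names (`changed_pairs`, `classify`) and a sample cut down from the hunks above. It assumes every change hunk replaces exactly one line.

```python
import re

HUNK = re.compile(r"^\d+(?:,\d+)?([acd])\d+(?:,\d+)?$")
NAME = re.compile(r'<mathlatex set="unicode-math">(\\[A-Za-z]+)</mathlatex>')

def changed_pairs(diff_text):
    """Collect (old, new) command names from single-line 'c' hunks."""
    pairs, kind, old = [], None, None
    for line in diff_text.splitlines():
        header = HUNK.match(line.strip())
        if header:
            kind, old = header.group(1), None
            continue
        name = NAME.search(line)
        if kind != "c" or name is None:
            continue  # additions, deletions and '---' separators
        if line.startswith("<"):
            old = name.group(1)
        elif line.startswith(">") and old is not None:
            pairs.append((old, name.group(1)))
            old = None
    return pairs

def classify(pairs):
    """Split pairs into 'm'-prefix renames, mutual swaps, and the rest."""
    seen = set(pairs)
    groups = {"prefixed": [], "swapped": [], "other": []}
    for old, new in pairs:
        if new == "\\m" + old[1:]:
            groups["prefixed"].append((old, new))
        elif (new, old) in seen:
            groups["swapped"].append((old, new))
        else:
            groups["other"].append((old, new))
    return groups

SAMPLE = r'''
8218c8222
<          <mathlatex set="unicode-math">\upAlpha</mathlatex>
---
>          <mathlatex set="unicode-math">\mupAlpha</mathlatex>
38214c38219
<          <mathlatex set="unicode-math">\vec</mathlatex>
---
>          <mathlatex set="unicode-math">\overrightarrow</mathlatex>
69791d69795
<          <mathlatex set="unicode-math">\lbrbrak</mathlatex>
166962c166963
<          <mathlatex set="unicode-math">\mbfepsilon</mathlatex>
---
>          <mathlatex set="unicode-math">\mbfvarepsilon</mathlatex>
167296c167297
<          <mathlatex set="unicode-math">\mbfvarepsilon</mathlatex>
---
>          <mathlatex set="unicode-math">\mbfepsilon</mathlatex>
'''
groups = classify(changed_pairs(SAMPLE))
print(groups)
```

Under this simple rule, entries that combine both effects, such as \upepsilon becoming \mupvarepsilon, land in "other" and still need a manual look.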

Omikhleia commented 1 week ago

Thanks a lot for the quick reply!

> but is this closer to what you expected?

Yes, it does sound logical to me. Besides my original concern, I just notice the duplicates now, such as \lbrbrak (on both U02772 and U03014 in the original unicode.xml) -- It would now be left on U02772 only, as in the latest/current unicode-math (which doesn't map U03014 at all).
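
Duplicates of this kind are easy to check for mechanically. A minimal Python sketch, assuming a hypothetical `{character id: command}` mapping already extracted from the XML; `duplicated_names` and the sample data are invented for the example, using the ids mentioned above.

```python
from collections import defaultdict

def duplicated_names(mapping):
    """Return {command: sorted ids} for commands attached to 2+ characters."""
    by_name = defaultdict(list)
    for char_id, command in mapping.items():
        by_name[command].append(char_id)
    return {cmd: sorted(ids) for cmd, ids in by_name.items() if len(ids) > 1}

# Hypothetical extract of the older data, using the ids mentioned above.
old_mapping = {
    "U02772": r"\lbrbrak",
    "U03014": r"\lbrbrak",  # the second slot carrying the same name
    "U00391": r"\upAlpha",
}
dupes = duplicated_names(old_mapping)
print(dupes)
```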

Likewise on the mbfsansepsilon / mbfsansvarepsilon (etc.) change revealed from your diff -- If I checked correctly, these characters were visibly inverted -- vs. the "original" TeX definition \epsilon (lunate) and \varepsilon (uncial) --, so the change here seems logical too, in line with the expectations.

I don't know how stable those unicode-math mappings are now, however -- and whether people actually use all these "direct" commands in real life.

But I'm real glad this unicode.xml exists, as a way for SILE to extract from one source both the TeX-like symbol (for that syntax) and the operator default properties (stretchy, largeop etc.) for its MathML engine ^^

davidcarlisle commented 1 week ago

On Tue, 12 Nov 2024 at 18:32, Omikhleia wrote:

> Thanks a lot for the quick reply!
>
> > but is this closer to what you expected?
>
> Yes, it does sound logical to me. Besides my original concern, I just notice the duplicates now, such as \lbrbrak (on both U02772 and U03014 in the original unicode.xml) -- It would now be left on U02772 only, as in the latest/current unicode-math (which doesn't map U03014 at all).

U+3014 is CJK punctuation, which got deprecated for math use after the big STIX math update (Unicode 3.2); the current public version, with the same \lbrbrak name pointing at both slots, is clearly an error. So this update will fix more than planned...

> Likewise on the mbfsansepsilon / mbfsansvarepsilon (etc.) change revealed from your diff -- If I checked correctly, these characters were visibly inverted -- vs. the "original" TeX definition \epsilon (lunate) and \varepsilon (uncial) --, so the change here seems logical too, in line with the expectations.
>
> I don't know how stable those unicode-math mappings are now, however -- and whether people actually use all these "direct" commands in real life.

They probably don't get used much: between using \mathsf{\epsilon} and the direct use of the character, there isn't a lot of call to use the individual command versions. But they do form a useful reference point, and anyway, if I list them in the file they should at least be correct.

> But I'm real glad this unicode.xml exists, as a way for SILE to extract from one source both the TeX-like symbol (for that syntax) and the operator default properties (stretchy, largeop etc.) for its MathML engine ^^

thanks, that's the idea

davidcarlisle commented 1 week ago

@Omikhleia Hopefully the above commit fixes things. I'll close, but feel free to open a new issue if you spot anything else.