olsak / OpTeX

OpTeX - LuaTeX format with extended Plain TeX macros
http://petr.olsak.net/optex/

Unicode math issues #21

Closed vlasakm closed 3 years ago

vlasakm commented 3 years ago

Hello,

consider the following example:

\fontfam[lm]

$
\matheth
ð
\mathexclam
!
\showlists
$

\bye

It results in:

\mathord
.\fam0 ð
\mathord
.\fam0 ð
\mathclose
.\fam1 !
\mathord
.\fam1 !
[...]
Missing character: There is no ð (U+00F0) in font cmr10!
Missing character: There is no ð (U+00F0) in font cmr10!
Missing character: There is no ð (U+00F0) in font cmr10!

You can see the issue with missing character ð for example when printing math symbols with print-unimath.opm and \fontfam[lm] (but note that Latin Modern Math contains the glyph!). Also, interestingly I can't see the warning about missing character when doing optex '\fontfam[lm] \input print-unimath.opm \bye' (but it is missing in the PDF).

The issue with ð (U+00F0) can be fixed by using the right family (number 1):

\Umathcode`ð="0"1`ð
\Umathchardef\matheth="0"1`ð

Although ! uses the right family number, its math class is not consistent.

The issue can be traced to unimath-codes.opm, where:

1) While loading mathclass.opm, \_global\_Umathcode#1=\_tmp\_space 1 #1 is issued and sets the right family number and math class for all codepoints. This means:

2) Next, while loading unimath-table.opm, \_global\_Umathcharnumdef#2=\_Umathcodenum#1 is issued and declares control sequences for some codepoints. This means:

First of all, maybe some things could be simplified by setting the Unicode math family as family 0, instead of 1 (and also 2 and 3) as it currently is. (Also, maybe setting families 2 and 3 is no longer necessary; I didn't investigate past the function fixup_math_parameters in tex/mlist.c, which doesn't do anything special for Unicode math families 2 and 3. But that doesn't really solve the issues.)

I don't have much experience with math typesetting, but my idea would be to roughly:

1) Load unimath-table.opm before mathclass.opm.

2) In unimath-table.opm do:

      \_global\_Umathchardef#2=0 1 #1\_relax
      \_global\_Umathcode   #1=0 1 #1\_relax

instead of

      \_global\_Umathcharnumdef#2=\_Umathcodenum#1\_relax

I.e. codepoints which are not mentioned in mathclass.opm still work, both when using the codepoint directly and when using the control sequence, and where mathclass.opm knows the math class, everything is corrected. But my code here probably doesn't work, as hinted by Marcel Krüger in https://tex.stackexchange.com/questions/520611/what-are-the-advantages-of-using-umathchardef-over-umathcode-with-let. I am sorry, but I didn't go further with solutions and testing.

But as far as I can see, inputting unimath-table.opm on its own is not really useful, because it doesn't do the right thing unless it is corrected by mathclass.opm or by the OpTeX code which handles math alphabets.

olsak commented 3 years ago

It seems there were two independent problems. The ð character stands for many characters that have no code in mathclass.opm but are present in unimath-table.opm. The ! represents only itself. It is a character declared as Close in plain.tex and unimath-table.opm, but it cannot be defined as a delimiter. All other Close characters have to be declared as delimiters (and they are declared). Moreover, ! is declared as Ord in mathclass.opm.

I added one \ifnum which fixes the characters of the ð type, and I added exceptions for the ! (alias \mathexclam) and ? (alias \mathquestion) characters.

Thanks for noticing.

PS. I considered an alternative solution for ! in math: setting it as Punct. Then we could write (n-k)! k! without the explicit spacing mentioned in the TeXbook (example on page 169: (n-k)!\,k!), but more problems would occur: n!=x or n!+k! would have incorrect spacing. So my final decision is to keep plain TeX compatibility and set the ! and ? characters as Close. It seems somewhat obscure, but a better solution (IMHO) does not exist.
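The trade-off described above can be tried out with a small test file. This is only a sketch, assuming OpTeX with \fontfam[lm] and the Unicode math font in family 1 as discussed in this thread; the names \closeexclam and \punctexclam are hypothetical and exist only for this comparison (class 5 is Close, class 6 is Punct).

```tex
\fontfam[lm]

% Hypothetical control sequences for the comparison only.
% Arguments of \Umathchardef: math class, family, character code.
\Umathchardef\closeexclam = 5 1 `!
\Umathchardef\punctexclam = 6 1 `!

% Close: the thin space before k must be typed explicitly (\,),
% but = and + keep their usual Rel and Bin spacing.
Close: $(n-k)\closeexclam\,k\closeexclam \quad
        n\closeexclam=x \quad n\closeexclam+k\closeexclam$

% Punct: the thin space before k comes for free,
% but the spacing around = and + changes.
Punct: $(n-k)\punctexclam k\punctexclam \quad
        n\punctexclam=x \quad n\punctexclam+k\punctexclam$

\bye
```

According to the spacing rules in the TeXbook, a Punct atom is followed by a thin space before Ord and Rel atoms in text and display styles, and a Bin atom that follows a Punct atom is treated as Ord. So in the second line the = should get only a thin space on its left and the + should lose its binary-operator spacing, which is the incorrect spacing mentioned above.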

vlasakm commented 3 years ago

Thank you for looking into this!

Yes, I am sorry, I forgot to mention explicitly that it is not only about ð, but also about other characters in a similar situation.

The solution and its implications seem fine to me.

This should maybe be another issue, but another problem I found is in makeindex.opm, where $\lq65$ is written in the documentation. This also fails because family 0 is used for the character `. But I believe that the intention was to use < (or rather \lt in this context).

Also, it seems that this snippet from \_setprimarysorting starts assigning lc codes from 65:

   \_def \_act ##1{\_ifx##1\_relax \_else
      \_ifx##1,\_advance\_tmpnum by1
      \_else \_lccode`##1=\_tmpnum \_fi
      \_ea\_act \_fi}%
   \_tmpnum=65 \_ea\_act \_sortingdata \_relax

while \_setsecondarysorting starts from 66:

   \_def \_act ##1{\_ifx##1\_relax \_else
      \_ifx##1,\_else \_advance\_tmpnum by1 \_lccode`##1=\_tmpnum \_fi
      \_ea\_act \_fi}%
  \_tmpnum=65 \_ea\_act \_sortingdata \_relax

Both should start from 65, or am I missing something?
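The difference can be reproduced outside OpTeX with a small standalone file. This is a sketch that only mirrors the control flow of the two quoted loops: \demodata is a hypothetical stand-in for \_sortingdata, \demonum replaces \_tmpnum, and \message replaces the \lccode assignment so that the values show up on the terminal and in the log.

```tex
% Hypothetical sorting data: two groups separated by a comma.
\def\demodata{aA,bB}
\newcount\demonum

% Primary-style loop: a comma advances the counter,
% every other character reports the current value.
\def\primary#1{\ifx#1\relax \else
   \ifx#1,\advance\demonum by1
   \else \message{[primary #1=\the\demonum]}\fi
   \expandafter\primary \fi}
\demonum=65 \expandafter\primary \demodata \relax

% Secondary-style loop as quoted above: commas are skipped,
% every other character advances the counter first and then reports it.
\def\secondary#1{\ifx#1\relax \else
   \ifx#1,\else \advance\demonum by1 \message{[secondary #1=\the\demonum]}\fi
   \expandafter\secondary \fi}
\demonum=65 \expandafter\secondary \demodata \relax

\bye
```

With this data the first loop reports 65 for a and A and 66 for b and B, while the second loop reports 66, 67, 68 and 69, because the counter is advanced before the first value is used. The file uses only \newcount, \message and \bye, so it should run with plain TeX as well as with OpTeX.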

olsak commented 3 years ago

Thank you for your second report. I corrected this. I am closing this issue because the original problem was solved.