michal-h21 / make4ht

Build system for tex4ht

Characters vanishing #57

ThiloteE closed this issue 1 year ago

ThiloteE commented 2 years ago

Edit: make4ht version v0.3j

Output of the command make4ht myfile.tex "mathml":

[Screenshot]

Output of the command make4ht --lua myfile.tex "mathml":

[Screenshot]

It could be some kind of compression issue. It reminds me of a certain printer bug: https://www.youtube.com/watch?v=7FeqF1-Z1g0 (sorry, the video is in German, but it is quite funny and explains the symptoms and generic causes REALLY well!)

michal-h21 commented 2 years ago

It is probably caused by the fonts you use. It seems that this font has buggy ligature support in TeX4ht. Can you post a TeX source that shows this issue?

ThiloteE commented 2 years ago

I used the ebgaramond font:

\usepackage[english]{babel}
\usepackage{ebgaramond}
\usepackage[autostyle,german=quotes]{csquotes}

Do I have to send you my big TeX file? xD

There is quite a lot of stuff in there, but if you insist, I will :D
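A minimal test document built from this preamble might be enough to reproduce the problem. This is a sketch, not the reporter's file: the sample words are illustrative assumptions chosen to exercise the ff, fi, fl, ffi and ffl ligatures, small caps and old-style figures. Compiling it once with make4ht test.tex "mathml" and once with make4ht --lua test.tex "mathml" (test.tex is a hypothetical file name) should show whether characters go missing only in the first case.

```latex
\documentclass{article}
\usepackage[english]{babel}
\usepackage{ebgaramond}
\usepackage[autostyle,german=quotes]{csquotes}
\begin{document}
% Illustrative sample text: each word hits one of the f-ligatures.
Regular: first, flat, off, office, waffle.

\textit{Italic: first, flat, off, office, waffle.}

\textbf{Bold: first, flat, off, office, waffle.}

% Small caps use a separate font table; the sharp s is included on purpose.
\textsc{Small caps: Straße, office, waffle.}

\enquote{Quoted text} and old-style figures: 1234567890.
\end{document}
```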

michal-h21 commented 2 years ago

Thanks, this is enough, as it demonstrates the missing ligatures and formatting quite well. I will take a look at this. It will probably need new font translation tables.

michal-h21 commented 2 years ago

Yes, it really is a font issue. You can either skip the ebgaramond package when compiling with TeX4ht (for example, \ifdefined\HCode\else\usepackage{ebgaramond}\fi), or generate minimal font translation tables for ebgaramond. Compile the following code with the etex command, and it will generate the necessary files:

% $Id: tex4ht-fonts-fourier.tex 790 2020-08-31 21:56:35Z karl $
% etex {nameofthefile.tex}
%
% Copyright 2020 TeX Users Group.
% Released under LPPL 1.3c+.
% See tex4ht-cpright.tex for license text.

% Copyright (C) 2018 TeX Users Group

\input tex4ht.sty    
   \Preamble{xhtml,th4,sections+}
\EndPreamble
\input ProTex.sty          
%\AlProTex{c,<<<>>>,`,title,list,ClearCode,_^}
\AlProTex{c,<<<>>>,`,title,list,`,ClearCode,_^}

\def\HOME{./tex4ht.dir/}
\def\DTDS{./dtd.dir/}           
\def\SOURCE{./html.dir/}

\def\MYdir{\HOME texmf/tex4ht/ht-fonts}

\newwrite\dbcs     
\newwrite\unicode  

\def\AddFont{\futurelet\ext\AddFontA}
\def\AddFontA{%
   \if [\ext \def\ext[##1]{\def\ext{##1}\AddFontB}%
   \else     \def\ext{\def\ext{htf}\AddFontB}\fi
   \ext}
\def\AddFontB#1#2{%
   \Comment{}{}\OutputCode[\ext]\<#1\>%
   \let\StartDir=\empty  \def\EndDir{#2}\MakeDir
   \ifx \WWWdir\Undef \else
      \Needs{"cp #1.\ext\space \WWWdir /#2.\ext"}%
      \Needs{"chmod 644 \WWWdir /#2.\ext"}%
   \fi
   \Needs{"mv #1.\ext\space \MYdir /#2.\ext"}%
   }
\def\MakeDir{\relax
   \expandafter \ifx  \csname !\StartDir\endcsname\relax
      \expandafter\let\csname !\StartDir\endcsname=\empty
      \Needs{"mkdir -p \MYdir/\StartDir"}%     
      \ifx \WWWdir\Undef \else
         \Needs{"mkdir -p \WWWdir /\StartDir"}%     
         \Needs{"chmod 711 \WWWdir /\StartDir"}%
      \fi
   \fi
   \ifx \EndDir\empty \else
       \expandafter\AppendDir \EndDir////*%
       \expandafter\MakeDir
   \fi
}
\def\AppendDir#1/#2/#3/*{%
   \def\temp{#2}\ifx \temp\empty  \let\EndDir=\empty 
   \else
      \edef\StartDir{\ifx \StartDir\empty\else \StartDir/\fi
                     #1}\def\EndDir{#2/#3}%
   \fi
}

% writing EBGaramond-Regular-osf-sc-ot1.htf hash: d2c6ffdc3b663c5b48cbc1acc71e29ff
\<EBGaramond-Regular-osf-sc-ot1\><<<
EBGaramond-Regular-osf-sc-ot1 0 170
'&#x0393;' '' Gamma 0
'&#x2206;' '' uni2206 1
'&#x0398;' '' Theta 2
'&#x039B;' '' Lambda 3
'&#x039E;' '' Xi 4
'&#x03A0;' '' Pi 5
'&#x03A3;' '' Sigma 6
'&#x03A5;' '' Upsilon 7
'&#x03A6;' '' Phi 8
'&#x03A8;' '' Psi 9
'&#x2126;' '' uni2126 10
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'&#x0131;' '' dotlessi.sc 16
'&#x0237;' '' uni0237 17
'&#x60;' '' grave.sc 18
'&#x00B4;' '' acute.sc 19
'&#x02C7;' '' caron 20
'&#x02D8;' '' breve 21
'&#x00AF;' '' macron.sc 22
'&#x02DA;' '' ring 23
'&#x00B8;' '' cedilla.sc 24
'ss' '' s.sc s.sc 25
'&#x00E6;' '' ae.sc 26
'&#x0153;' '' oe.sc 27
'&#x00F8;' '' oslash.sc 28
'&#x00C6;' '' AE 29
'&#x0152;' '' OE 30
'&#x00D8;' '' Oslash 31
' ' '' space 32
'!' '' exclam.sc 33
'&#x201D;' '' quotedblright 34
'#' '' numbersign.sc 35
'$' '' dollar.sc 36
'%' '' percent.sc 37
'&amp;' '' ampersand.sc 38
'&#x2019;' '' quoteright 39
'(' '' parenleft.sc 40
')' '' parenright.sc 41
'*' '' asterisk.sc 42
'+' '' plus.sc 43
',' '' comma.sc 44
'-' '' hyphen.sc 45
'.' '' period.sc 46
'/' '' slash.sc 47
'0' '' zero.osf 48
'1' '' one.osf 49
'2' '' two.osf 50
'3' '' three.osf 51
'4' '' four.osf 52
'5' '' five.osf 53
'6' '' six.osf 54
'7' '' seven.osf 55
'8' '' eight.osf 56
'9' '' nine.osf 57
':' '' colon.sc 58
';' '' semicolon.sc 59
'&#x00A1;' '' exclamdown.sc 60
'=' '' equal.sc 61
'&#x00BF;' '' questiondown.sc 62
'?' '' question.sc 63
'@' '' at.sc 64
'A' '' A 65
'B' '' B 66
'C' '' C 67
'D' '' D 68
'E' '' E 69
'F' '' F 70
'G' '' G 71
'H' '' H 72
'I' '' I 73
'J' '' J 74
'K' '' K 75
'L' '' L 76
'M' '' M 77
'N' '' N 78
'O' '' O 79
'P' '' P 80
'Q' '' Q 81
'R' '' R 82
'S' '' S 83
'T' '' T 84
'U' '' U 85
'V' '' V 86
'W' '' W 87
'X' '' X 88
'Y' '' Y 89
'Z' '' Z 90
'[' '' bracketleft.sc 91
'&#x201C;' '' quotedblleft 92
']' '' bracketright.sc 93
'&#x02C6;' '' circumflex 94
'&#x02D9;' '' dotaccent 95
'&#x2018;' '' quoteleft 96
'a' '' a.sc 97
'b' '' b.sc 98
'c' '' c.sc 99
'd' '' d.sc 100
'e' '' e.sc 101
'f' '' f.sc 102
'g' '' g.sc 103
'h' '' h.sc 104
'i' '' i.sc 105
'j' '' j.sc 106
'k' '' k.sc 107
'l' '' l.sc 108
'm' '' m.sc 109
'n' '' n.sc 110
'o' '' o.sc 111
'p' '' p.sc 112
'q' '' q.sc 113
'r' '' r.sc 114
's' '' s.sc 115
't' '' t.sc 116
'u' '' u.sc 117
'v' '' v.sc 118
'w' '' w.sc 119
'x' '' x.sc 120
'y' '' y.sc 121
'z' '' z.sc 122
'&#x2013;' '' endash 123
'&#x2014;' '' emdash 124
'&#x02DD;' '' hungarumlaut 125
'&#x02DC;' '' tilde 126
'&#x00A8;' '' dieresis.sc 127
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'&#x0141;' '' Lslash 138
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'&#x0142;' '' lslash.sc 170
EBGaramond-Regular-osf-sc-ot1 0 170
htfcss:  EBGaramond-Regular-osf-sc-ot1  font-variant: small-caps; font-family: 'EB Garamond Regular', serif;

>>>
\AddFont{EBGaramond-Regular-osf-sc-ot1}{unicode/EB_Garamond_Regular/EBGaramond-Regular-osf-sc-ot1}{}
% writing EBGaramond-Bold-osf-ot1.htf hash: a4db6c93392e5ac44bc0e93e694589fb
\<EBGaramond-Bold-osf-ot1\><<<
EBGaramond-Bold-osf-ot1 0 170
'&#x0393;' '' Gamma 0
'&#x2206;' '' uni2206 1
'&#x0398;' '' Theta 2
'&#x039B;' '' Lambda 3
'&#x039E;' '' Xi 4
'&#x03A0;' '' Pi 5
'&#x03A3;' '' Sigma 6
'&#x03A5;' '' Upsilon 7
'&#x03A6;' '' Phi 8
'&#x03A8;' '' Psi 9
'&#x2126;' '' uni2126 10
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'&#x0131;' '' dotlessi 16
'&#x0237;' '' uni0237 17
'&#x60;' '' grave 18
'&#x00B4;' '' acute 19
'&#x02C7;' '' caron 20
'&#x02D8;' '' breve 21
'&#x00AF;' '' macron 22
'&#x02DA;' '' ring 23
'&#x00B8;' '' cedilla 24
'&#x00DF;' '' germandbls 25
'&#x00E6;' '' ae 26
'&#x0153;' '' oe 27
'&#x00F8;' '' oslash 28
'&#x00C6;' '' AE 29
'&#x0152;' '' OE 30
'&#x00D8;' '' Oslash 31
' ' '' space 32
'!' '' exclam 33
'&#x201D;' '' quotedblright 34
'#' '' numbersign 35
'$' '' dollar 36
'%' '' percent 37
'&amp;' '' ampersand 38
'&#x2019;' '' quoteright 39
'(' '' parenleft 40
')' '' parenright 41
'*' '' asterisk 42
'+' '' plus 43
',' '' comma 44
'-' '' hyphen 45
'.' '' period 46
'/' '' slash 47
'0' '' zero.osf 48
'1' '' one.osf 49
'2' '' two.osf 50
'3' '' three.osf 51
'4' '' four.osf 52
'5' '' five.osf 53
'6' '' six.osf 54
'7' '' seven.osf 55
'8' '' eight.osf 56
'9' '' nine.osf 57
':' '' colon 58
';' '' semicolon 59
'&#x00A1;' '' exclamdown 60
'=' '' equal 61
'&#x00BF;' '' questiondown 62
'?' '' question 63
'@' '' at 64
'A' '' A 65
'B' '' B 66
'C' '' C 67
'D' '' D 68
'E' '' E 69
'F' '' F 70
'G' '' G 71
'H' '' H 72
'I' '' I 73
'J' '' J 74
'K' '' K 75
'L' '' L 76
'M' '' M 77
'N' '' N 78
'O' '' O 79
'P' '' P 80
'Q' '' Q 81
'R' '' R 82
'S' '' S 83
'T' '' T 84
'U' '' U 85
'V' '' V 86
'W' '' W 87
'X' '' X 88
'Y' '' Y 89
'Z' '' Z 90
'[' '' bracketleft 91
'&#x201C;' '' quotedblleft 92
']' '' bracketright 93
'&#x02C6;' '' circumflex 94
'&#x02D9;' '' dotaccent 95
'&#x2018;' '' quoteleft 96
'a' '' a 97
'b' '' b 98
'c' '' c 99
'd' '' d 100
'e' '' e 101
'f' '' f 102
'g' '' g 103
'h' '' h 104
'i' '' i 105
'j' '' j 106
'k' '' k 107
'l' '' l 108
'm' '' m 109
'n' '' n 110
'o' '' o 111
'p' '' p 112
'q' '' q 113
'r' '' r 114
's' '' s 115
't' '' t 116
'u' '' u 117
'v' '' v 118
'w' '' w 119
'x' '' x 120
'y' '' y 121
'z' '' z 122
'&#x2013;' '' endash 123
'&#x2014;' '' emdash 124
'&#x02DD;' '' hungarumlaut 125
'&#x02DC;' '' tilde 126
'&#x00A8;' '' dieresis 127
'&#xFB00;' '' f_f 128
'&#xFB01;' '' f_i 129
'&#xFB02;' '' f_l 130
'&#xFB03;' '' f_f_i 131
'&#xFB04;' '' f_f_l 132
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'&#x0141;' '' Lslash 138
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'' ''  
'&#x0142;' '' lslash 170
EBGaramond-Bold-osf-ot1 0 170
htfcss:  EBGaramond-Bold-osf-ot1  font-weight: bold; font-family: 'EB Garamond Bold', serif;

>>>
\AddFont{EBGaramond-Bold-osf-ot1}{unicode/EB_Garamond_Bold/EBGaramond-Bold-osf-ot1}{}
% writing EBGaramond-Italic-osf-ot1.htf hash: a4db6c93392e5ac44bc0e93e694589fb
\<EBGaramond-Italic-osf-ot1\><<<
.EBGaramond-Bold-osf-ot1
htfcss:  EBGaramond-Italic-osf-ot1  font-style: italic; font-family: 'EB Garamond Italic', serif;

>>>
\AddFont{EBGaramond-Italic-osf-ot1}{alias/EB_Garamond_Italic/EBGaramond-Italic-osf-ot1}{}
% writing EBGaramond-Regular-osf-ot1.htf hash: a4db6c93392e5ac44bc0e93e694589fb
\<EBGaramond-Regular-osf-ot1\><<<
.EBGaramond-Bold-osf-ot1
htfcss:  EBGaramond-Regular-osf-ot1  font-family: 'EB Garamond Regular', serif;

>>>
\AddFont{EBGaramond-Regular-osf-ot1}{alias/EB_Garamond_Regular/EBGaramond-Regular-osf-ot1}{}
\bye
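The alias entries at the end of the file could presumably be extended to other shapes used by a document. The sketch below assumes the document also uses bold italic and that TeX4ht reports a missing font named EBGaramond-BoldItalic-osf-ot1; that name, the alias directory and the CSS font-family are hypothetical, so check the exact font names in the TeX4ht warnings first. These lines would go before \bye and reuse the bold table, just as the italic and regular entries do:

```latex
% Hypothetical font name: replace it with the one TeX4ht actually reports.
% The alias line points to the bold table defined above; only the CSS differs.
\<EBGaramond-BoldItalic-osf-ot1\><<<
.EBGaramond-Bold-osf-ot1
htfcss:  EBGaramond-BoldItalic-osf-ot1  font-weight: bold; font-style: italic; font-family: 'EB Garamond Bold Italic', serif;

>>>
\AddFont{EBGaramond-BoldItalic-osf-ot1}{alias/EB_Garamond_BoldItalic/EBGaramond-BoldItalic-osf-ot1}{}
```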

I will also update TeX4ht sources, but it may take some time before this comes into TeX distributions.
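The first workaround from the comment above, loading ebgaramond only for the PDF build, could look like the following minimal sketch. It relies on \HCode being defined when the file is compiled through TeX4ht (make4ht), which is exactly what the suggested \ifdefined\HCode test checks; the sample text is illustrative only.

```latex
\documentclass{article}
\usepackage[english]{babel}
% \HCode exists only in a TeX4ht (make4ht) run, so EB Garamond
% is loaded for the normal PDF build and skipped for HTML output.
\ifdefined\HCode
\else
  \usepackage{ebgaramond}
\fi
\usepackage[autostyle,german=quotes]{csquotes}
\begin{document}
\enquote{Office} and waffle: ligatures that should survive in both outputs.
\end{document}
```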

ThiloteE commented 2 years ago

Thank you :)

Personally, I was fine with using the --lua option, but others may have run into the same problem without finding a solution, so I am sure some of them will appreciate your quick response and the chance that your fix makes it into the main version. 😀

michal-h21 commented 2 years ago

You are welcome :) I guess that EB Garamond was updated recently and that the update changed the internal font structure dramatically, because the font tables are totally different from what they used to be. I am sure this will help some people in the future, thanks for the report :)