michal-h21 / make4ht

Build system for tex4ht

LaTeX to HTML, open HTML in DOCX/ODT, and DOCX/ODT to LaTeX #121

Closed: balakrishnan1978 closed this issue 9 months ago

balakrishnan1978 commented 1 year ago

Dear Michal,

I would like to convert LaTeX to HTML and HTML to LaTeX.

make4ht -ux -a debug -c Equation.cfg Article1.tex 'fn-in' '' '-p'

Why LaTeX to HTML?

Because our editorial team is not familiar with LaTeX coding. They work only in Winword/OpenOffice, where they handle text modifications, UK grammar checks, spell checks, journal style conformance, etc. I have implemented our basic requirements in the attached Equation.cfg file.

All cross-links need to appear with their respective text in the HTML output, but I could not find a solution for equation cross-links. Please advise.

Why HTML to LaTeX?

How can we convert the Winword/OpenOffice file back to LaTeX code (because the editors have modified the text) and then do the typesetting in LaTeX?

When converting HTML back to LaTeX, it would be very good to keep the original LaTeX code instead of default LaTeX code (for example, \begin{algorithm}).

Any suggestions?

Regards, balaji

Article1.txt Equation.txt

balakrishnan1978 commented 1 year ago

Dear Michal,

Is there any possibility of meeting the above requirements with TeX4ht/make4ht? Kindly advise.

michal-h21 commented 1 year ago

In theory, it could be possible using the LuaXML transform library or XSLT. But it won't be easy. I don't know if it was you who asked this question, but there is an example on TeX.SX: https://tex.stackexchange.com/a/548484/2891

michal-h21 commented 1 year ago

The big problem is with citations and cross-references. They cannot be reconstructed from HTML, I am afraid. If you don't mind leaving them as TeX commands in the HTML output, you can use this configuration file:

\RequirePackage{verbatim,etoolbox}
\Preamble{xhtml,mathml}
\newtoks\eqtoks 

\newcommand\verbcmd[1]{%
  \expandafter\def\csname #1\endcsname##1{\texttt{\textbackslash#1\{\detokenize{##1}\}}}
}
\verbcmd{eqref}
\verbcmd{label}
\verbcmd{cite}
\verbcmd{citep}
%\renewcommand\eqref[1]{\texttt{\string\eqref\{\detokenize{#1}\}}}

%%%% Equations
\def\AltMathOne#1${\HCode{\detokenize{$#1$}}$}
\Configure{$}{}{}{\expandafter\AltMathOne} 
\def\AltlMath#1\){\HCode{\detokenize{\(#1\)}}\)}
\Configure{()}{\AltlMath}{}
\def\AltlDisplay#1\]{\HCode{\detokenize{\[#1\]}}\]}
\Configure{[]}{\AltlDisplay}{}
\def\AltDisplayOne#1#2$${#1\HCode{\detokenize{$$#2$$}}$$}
\Configure{$$}{}{}{\AltDisplayOne}{}{}
\newcommand\VerbMath[1]{%
\ifcsdef{#1}{%
  \renewenvironment{#1}{%
    \NoFonts\fontencoding{OT1}%
  \Configure{verbatim}{}{} % suppress <br /> tags
    \texttt{\string\begin\{#1\}}\HCode{\Hnewline}% we need to use \texttt to get all characters right
      \verbatim}{\endverbatim\texttt{\string\end\{#1\}}\EndNoFonts}%
}{}%
}
%%%EOF Equations
\begin{document}
\VerbMath{subarray}
\VerbMath{smallmatrix}
\VerbMath{matrix}
\VerbMath{pmatrix}
\VerbMath{bmatrix}
\VerbMath{Bmatrix}
\VerbMath{vmatrix}
\VerbMath{Vmatrix}
\VerbMath{cases}
\VerbMath{subequations}
\VerbMath{aligned}
\VerbMath{alignedat}
\VerbMath{gathered}
\VerbMath{gather}
\VerbMath{gather*}
\VerbMath{alignat}
\VerbMath{alignat*}
\VerbMath{xalignat}
\VerbMath{xalignat*}
\VerbMath{xxalignat}
\VerbMath{align}
\VerbMath{align*}
\VerbMath{flalign}
\VerbMath{flalign*}
\VerbMath{split}
\VerbMath{multline}
\VerbMath{multline*}
\VerbMath{equation}
\VerbMath{equation*}
\VerbMath{math}
\VerbMath{displaymath}
\VerbMath{eqnarray}
\VerbMath{eqnarray*}
\EndPreamble

And the basic XML transform script could look like this:

kpse.set_program_name "luatex"
local domobject = require "luaxml-domobject"
local transform = require "luaxml-transform"
local html = transform.new()
html.unicodes = {}
html:add_action( ".sectionHead", "\\section{@<.>}" )
html:add_action(".titlemark", "")
html:add_action("p", "@<.>\n\n")

local text = io.read("*all")

local dom = domobject.parse(text)

print(html:process_dom(dom))

It will need many more rules to produce something usable, of course.

You can call it using this command:

 $ texlua htmltolatex.lua < filename.html > result.tex
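
To make the rule-based idea behind the transform script easier to follow, here is a rough Python sketch of the same three rules. It is not the LuaXML API: the `RULES` table, `template_for`, and `convert` are hypothetical names, and the sample markup is only an assumption about what the HTML might look like. It reuses the `@<.>` placeholder to mark where the converted child content goes.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table mirroring the Lua script: (tag, class, template).
# None matches anything; "@<.>" marks where the converted children go.
RULES = [
    (None, "sectionHead", "\\section{@<.>}"),
    (None, "titlemark", ""),
    ("p", None, "@<.>\n\n"),
]

def local_name(tag):
    """Strip an XML namespace prefix such as {http://www.w3.org/1999/xhtml}."""
    return tag.rsplit("}", 1)[-1]

def template_for(elem):
    """Return the template of the first matching rule, or a pass-through default."""
    classes = elem.get("class", "").split()
    for tag, cls, template in RULES:
        if tag is not None and tag != local_name(elem.tag):
            continue
        if cls is not None and cls not in classes:
            continue
        return template
    return "@<.>"  # no rule: keep only the converted content

def convert(elem):
    """Convert an element and its descendants to LaTeX text, depth first."""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(convert(child))
        parts.append(child.tail or "")
    return template_for(elem).replace("@<.>", "".join(parts))

# Assumed sample markup, not taken from real make4ht output.
sample = (
    '<body><h3 class="sectionHead"><span class="titlemark">1 </span>Intro</h3>'
    '<p>See \\eqref{eq:1}.</p></body>'
)
print(convert(ET.fromstring(sample)))
```

Because `\eqref{eq:1}` was left as a TeX command in the HTML, it passes through unchanged. `xml.etree.ElementTree` only accepts well-formed XML without HTML named entities, so real output may need a more tolerant parser.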

balakrishnan1978 commented 1 year ago

Dear Michal,

Thanks for your detailed explanation and the config file. When I use it, the backslash character (\) is not generated inside any \begin{equation} ... \end{equation} block.

Example: \kappa is generated as kappa (\ is missing).

michal-h21 commented 1 year ago

I see. This is caused by a bug in the font mapping file, pcrbo7t.htf. Here is a corrected version:

pcrbo7t 0 170
'' '' .notdef 0
'&#x02D9;' '' dotaccent 1
'&#xFB01;' '' fi 2
'&#xFB02;' '' fl 3
'&#x2044;' '' fraction 4
'&#x02DD;' '' hungarumlaut 5
'&#x0141;' '' Lslash 6
'&#x0142;' '' lslash 7
'&#x02DB;' '' ogonek 8
'&#x02DA;' '' ring 9
'' '' .notdef 10
'' ''
'' ''
'&#x0027;' '' quotesingle 13
'&#x00A1;' '' exclamdown 14
'&#x00BF;' '' questiondown 15
'&#x0131;' '' dotlessi 16
'&#x0131;' '' dotlessi 17
'&#x60;' '' grave 18
'&#x00B4;' '' acute 19
'&#x02C7;' '' caron 20
'&#x02D8;' '' breve 21
'&#x00AF;' '' macron 22
'&#x02DA;' '' ring 23
'&#x00B8;' '' cedilla 24
'&#x00DF;' '' germandbls 25
'&#x00E6;' '' ae 26
'&#x0153;' '' oe 27
'&#x00F8;' '' oslash 28
'&#x00C6;' '' AE 29
'&#x0152;' '' OE 30
'&#x00D8;' '' Oslash 31
' ' '' space 32
'!' '' exclam 33
'"' '' quotedbl 34
'#' '' numbersign 35
'$' '' dollar 36
'%' '' percent 37
'&amp;' '' ampersand 38
'&#x2019;' '' quoteright 39
'(' '' parenleft 40
')' '' parenright 41
'*' '' asterisk 42
'+' '' plus 43
',' '' comma 44
'-' '' hyphen 45
'.' '' period 46
'/' '' slash 47
'0' '' zero 48
'1' '' one 49
'2' '' two 50
'3' '' three 51
'4' '' four 52
'5' '' five 53
'6' '' six 54
'7' '' seven 55
'8' '' eight 56
'9' '' nine 57
':' '' colon 58
';' '' semicolon 59
'&lt;' '' less 60
'=' '' equal 61
'&gt;' '' greater 62
'?' '' question 63
'@' '' at 64
'A' '' A 65
'B' '' B 66
'C' '' C 67
'D' '' D 68
'E' '' E 69
'F' '' F 70
'G' '' G 71
'H' '' H 72
'I' '' I 73
'J' '' J 74
'K' '' K 75
'L' '' L 76
'M' '' M 77
'N' '' N 78
'O' '' O 79
'P' '' P 80
'Q' '' Q 81
'R' '' R 82
'S' '' S 83
'T' '' T 84
'U' '' U 85
'V' '' V 86
'W' '' W 87
'X' '' X 88
'Y' '' Y 89
'Z' '' Z 90
'[' '' bracketleft 91
'&#x005C;' '' backslash 92
']' '' bracketright 93
'&#x02C6;' '' circumflex 94
'_' '' underscore 95
'&#x2018;' '' quoteleft 96
'a' '' a 97
'b' '' b 98
'c' '' c 99
'd' '' d 100
'e' '' e 101
'f' '' f 102
'g' '' g 103
'h' '' h 104
'i' '' i 105
'j' '' j 106
'k' '' k 107
'l' '' l 108
'm' '' m 109
'n' '' n 110
'o' '' o 111
'p' '' p 112
'q' '' q 113
'r' '' r 114
's' '' s 115
't' '' t 116
'u' '' u 117
'v' '' v 118
'w' '' w 119
'x' '' x 120
'y' '' y 121
'z' '' z 122
'{' '' braceleft 123
'|' '' bar 124
'}' '' braceright 125
'&#x02DC;' '' tilde 126
'&#x00A8;' '' dieresis 127
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'&#x0141;' '' Lslash 138
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'' ''
'&#x0142;' '' lslash 170
pcrbo7t 0 170
htfcss:  pcrbo7t  font-weight: bold; font-style: oblique; font-family: 'Nimbus Mono L', serif;

The fixed file is also included in the latest update of TeX Live 2023.
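
If you install a patched mapping file by hand, a small script can sanity-check its structure first. The Python sketch below assumes only the layout visible in the listing above: a header line `name first last`, one entry per character slot whose first character is the field delimiter, a footer repeating the header, and an optional `htfcss:` line. The function `check_htf` and the `demo` font are hypothetical and not part of tex4ht.

```python
def check_htf(text):
    """Parse an .htf listing; return (font_name, {slot: string}) or raise ValueError."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty file")
    header = lines[0].split()
    if len(header) != 3:
        raise ValueError("header should look like: name first last")
    name, first, last = header[0], int(header[1]), int(header[2])
    expected = last - first + 1
    if len(lines) < expected + 2:
        raise ValueError("expected %d entries plus a footer" % expected)
    table = {}
    for offset, line in enumerate(lines[1:1 + expected]):
        # The first character of an entry is taken as the delimiter (' in the listing).
        fields = line.split(line[0])
        if len(fields) < 5:
            raise ValueError("malformed entry for slot %d: %r" % (first + offset, line))
        table[first + offset] = fields[1]
    if lines[1 + expected].split() != header:
        raise ValueError("footer does not repeat the header")
    return name, table

# Hypothetical three-slot file in the same layout as pcrbo7t.htf.
sample = """demo 91 93
'[' '' bracketleft 91
'&#x005C;' '' backslash 92
']' '' bracketright 93
demo 91 93
htfcss: demo font-family: monospace;
"""
name, table = check_htf(sample)
print(name, table[92])
```

Run against the corrected pcrbo7t.htf above, slot 92 should come back as `&#x005C;`, the backslash that was missing from the equation output.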

balakrishnan1978 commented 1 year ago

@michal-h21: Thanks for the file, it's working fine now.