michal-h21 / make4ht

Build system for tex4ht

Tables in Docbook #117

Closed hcf-n closed 11 months ago

hcf-n commented 1 year ago

I'm converting a document with a lot of tables to Docbook and have encountered a couple of issues.

MWE:

\documentclass{article}
\title{Test}
\begin{document}
\begin{tabular}{r r}
\textbf{Header 1} & \textbf{Header 2}  \\
Row 1 & 14 \\
Row 2 er & 14 \\
\end{tabular}
\end{document}

Resulting XML

<?xml version="1.0"?>
<!DOCTYPE article
  PUBLIC '-//OASIS//DTD DocBook V5.0//EN'
  'http://www.oasis-open.org/docbook/xml/5.0/docbook.dtd'>
<!--translated from test.tex  
by TeX4ht (https://tug.org/tex4ht/) xhtml,charset=utf-8,docbook,html,refcaption -->
<?xtpipes file="docbook.4xt" ?>
<article version="5.0" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink">
   <!--l. 5
-->
   <para/>
   <informaltable>
      <tgroup cols="2">
         <colspec colname="c1"/>
         <colspec colname="c2"/>
         <tbody>
            <row>
               <entry align="right">
                  <emphasis role="bf">Header 1</emphasis>
               </entry>
               <entry align="right">
                  <emphasis role="bf">Header 2</emphasis>
               </entry>
            </row>
            <row>
               <entry align="right">Row 1</entry>
               <entry align="right">14</entry>
            </row>
            <row>
               <entry align="right">Row 2 er</entry>
               <entry align="right">14</entry>
            </row>
            <row>
               <entry align="right"/>
            </row>
         </tbody>
      </tgroup>
   </informaltable>
</article>

  1. There seems to be an extra row at the bottom

            <row>
               <entry align="right"/>
            </row>

  2. The headers seem to be wrapped in <tbody> instead of <thead>

Thank you for making make4ht!

hcf-n commented 1 year ago

I've looked a bit more into no. 2 and realized that a LaTeX table carries no semantic information that distinguishes the header from the body. Or does anybody see a way to implement it?

michal-h21 commented 1 year ago

Try this build file:

local domfilter = require "make4ht-domfilter"

local process = domfilter {
  function(dom)
    for _, tbl in ipairs(dom:query_selector("tgroup")) do
      local tbody_pos
      local tbody
      for i, el in ipairs(tbl:get_children()) do
        -- find position of tbody, so we can insert thead before it
        if el:get_element_name() == "tbody" then
          tbody_pos = i
          tbody = el
          break
        end
      end
      if tbody then
        local rows = tbody:query_selector("row")
        if #rows > 1 then
          -- add thead to the table only when there is a row to put into it,
          -- so that single-row tables don't end up with an empty thead
          local thead = tbl:create_element("thead")
          tbl:add_child_node(thead, tbody_pos)
          -- move first row to thead (copy it, then remove the original)
          local first_row = rows[1]:copy_node()
          rows[1]:remove_node()
          thead:add_child_node(first_row)
          -- remove empty row at the end
          local last_row = rows[#rows]
          -- if there is at most one child and no text, remove it
          if #last_row:get_children() < 2 and last_row:get_text():gsub("%s", "") == "" then
            last_row:remove_node()
          end
        end
      end

    end
    return dom
  end
}

Make:match("xml$", process)

It moves the first row into a new thead element and removes the last row if it is empty.
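
On the question of telling header rows from body rows: the XML has no header markup, but in the MWE the header cells are the ones set with \textbf, which ends up as <emphasis role="bf">. Below is a minimal sketch of a check built on that, assuming headers are always bold and body cells are not. The names is_header_row and strip_space are hypothetical helpers, not part of make4ht, and the code calls only the LuaXML DOM methods query_selector, get_text and get_attribute.

```lua
-- Remove all whitespace; the parentheses drop gsub's second return value.
local function strip_space(s)
  return (s:gsub("%s", ""))
end

-- Heuristic (an assumption, not something make4ht provides): a row counts
-- as a header when every entry has text and all of that text sits inside
-- <emphasis role="bf"> elements.
local function is_header_row(row)
  local entries = row:query_selector("entry")
  if #entries == 0 then return false end
  for _, entry in ipairs(entries) do
    local text = strip_space(entry:get_text())
    local bold = ""
    for _, em in ipairs(entry:query_selector("emphasis")) do
      if em:get_attribute("role") == "bf" then
        bold = bold .. strip_space(em:get_text())
      end
    end
    -- an empty cell or any non-bold text means this is not a header row
    if text == "" or text ~= bold then return false end
  end
  return true
end
```

To try it, the condition `if #rows > 1 then` in the build file could become `if #rows > 1 and is_header_row(rows[1]) then`, so that tables without a bold first row keep all their rows in tbody. The removal of the empty last row sits under the same condition, so it would have to be moved out of it to keep working for tables without a header.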

hcf-n commented 11 months ago

Thanks, I forgot to close the issue.