transpect / xml2tex

Converts XML to LaTeX
BSD 2-Clause "Simplified" License

Switching between tabular and tabularx #10

Closed mabromuela88 closed 3 years ago

mabromuela88 commented 3 years ago

How do I switch between the tabular and tabularx environments when defining a table in the input XML? My sample table looks something like this:

<table>
  <tgroup>
    <colspec colnum="1" colwidth="0.5cm"/>
    <colspec colnum="2" colwidth="0.5cm"/>
    <row>
    Data1<entry/>Data2</row></tgroup>
</table>

And I get an output as:

\begin{table}
\begin{tabularx}{\textwidth}{|
p{\dimexpr 0.5\linewidth-2\tabcolsep}|
p{\dimexpr 0.5\linewidth-2\tabcolsep}|} \hline 

Data1 & Data2 \\\hline 
\end{tabularx}
\end{table}

What I am trying to achieve is a table without any borders or grid lines. I prefer tabular since I only want a very simple table, and it should not float. (How do I pass an option to an environment?) I saw a file calstable2tabular under the xml2tex/xsl/ directory, and I assume the table stylesheet is applied from there. I tried changing the default values of the table-model and table-grid parameters, but that did not solve it. (I know it is bad practice to mess with library files; I was just curious.)

mabromuela88 commented 3 years ago

Also, is complete documentation for this project coming soon?

gimsieke commented 3 years ago

I was able to switch to tabular by passing table-model=tabular, like so:

calabash/calabash.sh -i source=xml2tex/test-doc.xml -i conf=xml2tex/test-conf.xml xml2tex/xpl/xml2tex.xpl table-model=tabular table-grid=no

There is some documentation, for example in the XProc pipeline:

  <p:option name="table-model" select="'tabularx'" required="false">
    <p:documentation>
      Use LaTeX package to draw tables. Permitted values are 'tabular',
      'tabularx' and 'htmltabs'.
    </p:documentation>
  </p:option>

Your input is not valid DocBook; that might be another cause of errors.

I’m not very familiar with configuring and using this library. I suspect that one cause of errors might be that ignorable whitespace in the source code will be passed to the output, which can be a source of error in the case of newlines in particular.

So I fed this input into the pipeline (embedded in a proper Hub or DocBook document of course):

<informaltable>
    <tgroup cols="2">
      <colspec colnum="1" colwidth="0.5cm"/>
      <colspec colnum="2" colwidth="0.5cm"/>
      <tbody><row><entry><para>Data1</para></entry><entry><para>Data2</para></entry></row></tbody>
    </tgroup>
  </informaltable>

and obtained this output:

\begin{table}

\begin{tabular}{
p{\dimexpr 0.5\linewidth-2\tabcolsep}
p{\dimexpr 0.5\linewidth-2\tabcolsep}}
Data1 & Data2 \\
\end{tabular}

\end{table}

There is so little demand for comprehensive documentation for this library that, apart from p:documentation in the XProc, comments in the XSLT, and the configuration schema with its embedded documentation, we didn’t bother to create any. People who get into the business of configuring this usually read the code in the aforementioned places, or look at docx2tex for a larger sample configuration/pipeline. Most of the time, my colleagues deal with configuring this, so there hasn’t been much demand for something more approachable. But when my colleague @mkraetke returns from vacation, he might produce more elaborate documentation. I can’t promise, though, because it’s likely that there are more urgent priorities. Remember that we put all these packages here as they are, with no implicit or explicit assertion of aptness, comprehensibility, correctness, etc. Of course, better documentation might increase adoption by external users like you, but this is not a priority.

mabromuela88 commented 3 years ago

Is there a way to shorten <entry><para>...</para></entry> to simply something like <cell>...</cell>?

gimsieke commented 3 years ago

You can use arbitrary markup in your source XML if you either provide an xml2tex mapping for it or (the easier way) if you transform it into the expected markup in a preprocessing XSLT pass.

The question is which target element will receive the attributes. If you use, for example, @css:border-left-color, @morerows, or @css:background-color, they should go to entry. If you use @css:font-size, it should go to para (in principle it can go to entry, too, but I’m not sure whether contained paras will inherit it when generating TeX). So this needs to be sorted out.

Another issue is that you can’t use a default DocBook schema or our Hub schema for validating the input XML. You can’t use a standard DocBook schema anyway if you want to use @css:* attributes.

That said, such an XSLT could look like this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns:css="http://www.w3.org/1996/css"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns="http://docbook.org/ns/docbook"
  xpath-default-namespace="http://docbook.org/ns/docbook"
  exclude-result-prefixes="css xs"
  version="3.0">

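  <!-- Identity transform: everything that is not matched below is copied unchanged -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Attributes whose name contains one of these substrings belong on entry;
       all other attributes go to para -->
  <xsl:variable name="cell-att-name-regex" as="xs:string" select="'(border|padding|row|col)'"/>

  <!-- Expand the shorthand <cell> into <entry><para>…</para></entry> -->
  <xsl:template match="cell">
    <entry>
      <xsl:apply-templates select="@*[matches(name(), $cell-att-name-regex)]"/>
      <para>
        <xsl:apply-templates select="@*[not(matches(name(), $cell-att-name-regex))], node()"/>
      </para>
    </entry>
  </xsl:template>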
</xsl:stylesheet>

Applying it to this input:

<informaltable xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:css="http://www.w3.org/1996/css" version="5.1">
  <tgroup cols="2">
    <tbody>
      <row>
        <entry>
          <para>ff</para>
        </entry>
        <entry>
          <para>ff</para>
        </entry>
      </row>
      <row>
        <cell css:border-left-color="#F40" css:font-size="8pt" morerows="0">gg</cell>
        <cell>gg</cell>
      </row>
    </tbody>
  </tgroup>
</informaltable>

yields this output:

<informaltable xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:css="http://www.w3.org/1996/css" version="5.1">
  <tgroup cols="2">
    <tbody>
      <row>
        <entry>
          <para>ff</para>
        </entry>
        <entry>
          <para>ff</para>
        </entry>
      </row>
      <row>
        <entry css:border-left-color="#F40" morerows="0"><para css:font-size="8pt">gg</para></entry>
        <entry><para>gg</para></entry>
      </row>
    </tbody>
  </tgroup>
</informaltable>

mabromuela88 commented 3 years ago

So do you suggest that I create an XSL file with the stylesheet you gave and then import it from my config file?

gimsieke commented 3 years ago

No, this won’t work. The “correct” (expected) markup needs to be in place before xml2tex starts to do its thing. You can create an XProc pipeline that features a p:xslt step followed by the actual xml2tex:convert step.
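As a rough illustration of such a wrapper pipeline, here is a minimal XProc 1.0 sketch. The source and conf ports and the table-model and table-grid options are taken from the command line and the p:option shown earlier in this thread. Everything else is an assumption that should be checked against xml2tex/xpl/xml2tex.xpl: the xml2tex namespace URI, the import href, the name of the primary output port, and the hypothetical file name cell2entry.xsl for the preprocessing stylesheet. The real step may also need further inputs or options.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumed, not verified: the xml2tex namespace URI, the import
     href, and that xml2tex:convert has a primary output port. -->
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
  xmlns:xml2tex="http://transpect.io/xml2tex"
  version="1.0" name="cell2tex">

  <!-- Source document that may still contain the shorthand <cell> markup -->
  <p:input port="source" primary="true"/>
  <!-- xml2tex configuration, handed on unchanged to the conversion step -->
  <p:input port="conf"/>
  <p:output port="result" primary="true"/>

  <!-- Assumption: adjust href to wherever xml2tex.xpl lives in your checkout -->
  <p:import href="xml2tex/xpl/xml2tex.xpl"/>

  <!-- Pass 1: rewrite <cell> to <entry><para>…</para></entry>.
       cell2entry.xsl is a hypothetical name for the stylesheet given above. -->
  <p:xslt name="normalize-cells">
    <p:input port="stylesheet">
      <p:document href="cell2entry.xsl"/>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>

  <!-- Pass 2: the actual conversion; its source port is implicitly connected
       to the result of the p:xslt step above -->
  <xml2tex:convert table-model="tabular" table-grid="no">
    <p:input port="conf">
      <p:pipe port="conf" step="cell2tex"/>
    </p:input>
  </xml2tex:convert>

</p:declare-step>
```

If this were saved as, say, cell2tex.xpl, it could presumably be invoked the same way as the calabash.sh command earlier in the thread, with cell2tex.xpl in place of xml2tex/xpl/xml2tex.xpl.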

mabromuela88 commented 3 years ago

Is it possible to use the <xsl:value-of> element in the preamble? I can only get the values of attributes of <hub> in input.xml, e.g. @xml:lang, like this:

<xsl:value-of select="//@xml:lang"/>

But I can't seem to extract values from elements that are children of <hub>. Example:

<hub xmlns="http://docbook.org/ns/docbook" version="5.0" xml:lang="de">
  <coverimage>apple.jpg</coverimage>
....
</hub>

I expected to obtain the value with <xsl:value-of select="coverimage"/> in the config file, but nothing shows up.

gimsieke commented 3 years ago

coverimage is in the DocBook namespace. You can use the prefix dbk in the configuration file: <xsl:value-of select="dbk:coverimage"/>

mabromuela88 commented 3 years ago

Just a slight modification: I managed to extract the value using <xsl:value-of select="//dbk:coverimage"/>. Had you not mentioned the dbk namespace, I wouldn't have realized that my entire input document is in the dbk namespace!
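The namespace behaviour discussed above can be reproduced outside xml2tex with a small stand-alone stylesheet. This is only a sketch in plain XSLT, not the xml2tex configuration format. The dbk prefix is bound here by hand to the DocBook namespace URI from the input above, and the // is used because the context node is the document root rather than <hub>, which is presumably also why the relative path dbk:coverimage found nothing in the preamble.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Stand-alone illustration; the dbk prefix is bound here explicitly -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:dbk="http://docbook.org/ns/docbook"
  version="2.0">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Selects nothing: without xpath-default-namespace, an unprefixed name
         only matches elements in no namespace -->
    <xsl:value-of select="//coverimage"/>
    <!-- Selects the element: the prefix maps to the DocBook namespace that
         <hub> declares as its default namespace -->
    <xsl:value-of select="//dbk:coverimage"/>
  </xsl:template>

</xsl:stylesheet>
```

An attribute such as xml:lang is unaffected by the default namespace declaration on <hub>, which is why //@xml:lang worked from the start.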