lvjr / tabularray

Typeset tabulars and arrays with LaTeX3
https://ctan.org/pkg/tabularray

tagpdf support #27

Open u-fischer opened 3 years ago

u-fischer commented 3 years ago

We are working on a project to enhance LaTeX so that it can produce tagged PDF: https://www.latex-project.org/news/2020/11/30/tagged-pdf-FS-study/

For a tabular this means that one needs to add commands quite similar to HTML table commands to cells and rows.

So to successfully tag a tabular, one needs at least

The code for the cells and rows should ideally have access to data such as the current row/column number.

It would be nice if tabularray would add suitable hooks for this.

lvjr commented 3 years ago

Sorry, I know little about this at the moment. I have given you write access to this repository. Please feel free to add anything you want.

u-fischer commented 3 years ago

Thanks for the invitation. I'm sorry, I don't have the time to think about it right now; in the project, handling tabulars is scheduled for a later phase for good reason, as this is not trivial.

But I think it is important that your code considers not only whether you get the right visual appearance but also how the structure of the table is encoded. This matters if one wants to copy and paste a table or export it to HTML, or if people want to define layouts in a CSS-like manner, e.g. "make all header cells bolder".
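
To illustrate the CSS-like idea, here is a minimal sketch using tabularray's key-value interface, assuming a current tabularray release. The table contents are made up for illustration. Note that `row{1}` selects the first row by position only; it does not by itself record that the row is a header in the structural sense discussed here.

```latex
\documentclass{article}
\usepackage{tabularray}

\begin{document}

% Styling is declared once in the table specification,
% not repeated inside every header cell.
\begin{tblr}{
  colspec = {ll},
  row{1}  = {font=\bfseries}, % "make all cells of the first row bold"
}
Name  & Value \\
Alpha & 1     \\
Beta  & 2     \\
\end{tblr}

\end{document}
```

A structural encoding would go one step further and let such a rule target "header rows" as a role, so that the same information could also drive tagging or HTML export.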

lvjr commented 3 years ago

Yes, it is useful. I will leave this issue open and hope to come back to it one day.

u-fischer commented 3 years ago

Here is a very simple example (it needs a current tagpdf 0.9). It marks up a one-column table with a header and two rows. I think it gives an impression of the code we need to inject (in practice it is even more, as I left out a few details like attributes).

If you compile this and then upload the PDF at https://ngpdf.com/loadFile, you can check the HTML; it will give something like this:

<!DOCTYPE html>
<html><head>
<title>test-utf8</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
</head>
<body lang="en-US">
 <div data-pdf-se-type="Document">
  <table data-pdf-se-type="Table">
   <thead data-pdf-se-type="THead">
    <tr data-pdf-se-type="TR">
     <th data-pdf-se-type="TH">Header</th>
    </tr>
   </thead>
   <tr data-pdf-se-type="TR">
    <td data-pdf-se-type="TD">row1</td>
   </tr>
   <tr data-pdf-se-type="TR">
    <td data-pdf-se-type="TD">row2</td>
   </tr>
  </table>
 </div>
</body></html>

\RequirePackage{pdfmanagement-testphase}
\DeclareDocumentMetadata{uncompress}
\documentclass{article}
\usepackage{tagpdf,array}
\tagpdfsetup{activate}

\begin{document}

\tagstructbegin{tag=Table}
\begin{tabular}{l}
\tagstructbegin{tag=THead}%
\tagstructbegin{tag=TR}%
\tagstructbegin{tag=TH}%
\tagmcbegin{tag=TH}%
Header
\tagmcend
\tagstructend
\tagstructend
\tagstructend
\\
\tagstructbegin{tag=TR}%
\tagstructbegin{tag=TD}%
\tagmcbegin{tag=TD}%
row1
\tagmcend
\tagstructend
\tagstructend
\\
\tagstructbegin{tag=TR}%
\tagstructbegin{tag=TD}%
\tagmcbegin{tag=TD}%
row2
\tagmcend
\tagstructend
\tagstructend
\end{tabular}
\tagstructend

\end{document}
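
The cell and row markup above is highly repetitive, which is what makes hooks attractive. As a rough sketch of what injected code might look like, the same sequence can be folded into two helper macros. `\TaggedCell` and `\TaggedRow` are hypothetical names invented for this illustration; they are not part of tagpdf or tabularray, and the sketch assumes the same tagpdf setup as the example above.

```latex
\RequirePackage{pdfmanagement-testphase}
\DeclareDocumentMetadata{uncompress}
\documentclass{article}
\usepackage{tagpdf,array}
\tagpdfsetup{activate}

% Hypothetical helper: #1 = structure tag (TH or TD), #2 = cell content.
% Opens the structure element and the marked content, then closes both.
\newcommand\TaggedCell[2]{%
  \tagstructbegin{tag=#1}%
  \tagmcbegin{tag=#1}%
  #2%
  \tagmcend
  \tagstructend}

% Hypothetical helper for a row of a ONE-column table:
% wraps a single tagged cell in a TR structure element.
\newcommand\TaggedRow[2]{%
  \tagstructbegin{tag=TR}%
  \TaggedCell{#1}{#2}%
  \tagstructend}

\begin{document}

\tagstructbegin{tag=Table}
\begin{tabular}{l}
\tagstructbegin{tag=THead}%
\TaggedRow{TH}{Header}%
\tagstructend
\\
\TaggedRow{TD}{row1}\\
\TaggedRow{TD}{row2}
\end{tabular}
\tagstructend

\end{document}
```

This only hides the boilerplate; the author still has to call the helpers in every cell. With hooks in the table package, the same begin/end pairs could be emitted automatically around each row and cell.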

lvjr commented 3 years ago

Yes, it is very interesting.

lvjr commented 1 year ago

I will close this issue; further comments can be left in issue #197.

lvjr commented 1 year ago

I have decided to reopen this issue to record experiments with tagpdf here.

lvjr commented 1 year ago

With the newly added public hooks and variables (#197) in trial/tabularray.sty, we can now correctly tag <table>, <tr> and <td> in the above commit.

<!DOCTYPE html>
<html><head>
<title>test-tagpdf-01</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body lang="en-US">
 <div data-pdf-se-type="Document" id="ID.001">
  <p data-pdf-se-type="P" id="ID.002"><span id="page-0" role="doc-pagebreak"></span>Some text.</p>
  <table data-pdf-se-type="Table" id="ID.003">
   <tbody><tr data-pdf-se-type="TR" id="ID.004">
    <td data-pdf-se-type="TD" id="ID.005"><p data-pdf-se-type="P" id="ID.006">Alpha</p></td>
    <td data-pdf-se-type="TD" id="ID.007"><p data-pdf-se-type="P" id="ID.008">Beta</p></td>
    <td data-pdf-se-type="TD" id="ID.009"><p data-pdf-se-type="P" id="ID.010">Gamma</p></td>
    <td data-pdf-se-type="TD" id="ID.011"><p data-pdf-se-type="P" id="ID.012">Delta</p></td>
   </tr>
   <tr data-pdf-se-type="TR" id="ID.013">
    <td data-pdf-se-type="TD" id="ID.014"><p data-pdf-se-type="P" id="ID.015">Epsilon</p></td>
    <td data-pdf-se-type="TD" id="ID.016"><p data-pdf-se-type="P" id="ID.017">Zeta</p></td>
    <td data-pdf-se-type="TD" id="ID.018"><p data-pdf-se-type="P" id="ID.019">Eta</p></td>
    <td data-pdf-se-type="TD" id="ID.020"><p data-pdf-se-type="P" id="ID.021">Theta</p></td>
   </tr>
   <tr data-pdf-se-type="TR" id="ID.022">
    <td data-pdf-se-type="TD" id="ID.023"><p data-pdf-se-type="P" id="ID.024">Iota</p></td>
    <td data-pdf-se-type="TD" id="ID.025"><p data-pdf-se-type="P" id="ID.026">Kappa</p></td>
    <td data-pdf-se-type="TD" id="ID.027"><p data-pdf-se-type="P" id="ID.028">Lambda</p></td>
    <td data-pdf-se-type="TD" id="ID.029"><p data-pdf-se-type="P" id="ID.030">Mu</p></td>
   </tr>
  </tbody></table>
  <p data-pdf-se-type="P" id="ID.031">More text.</p>
 </div>
</body></html>