Open u-fischer opened 3 years ago
Sorry, I know little about this at the moment. I have given you write access to this repository. Please feel free to add anything you want.
Thanks for the invitation. I'm sorry, I don't have the time to think about it now; in our project, handling tabulars is scheduled for a later phase for good reason, as this is not trivial.
But I think it is important that your code considers not only whether you get the right visual appearance but also how the structure of the table is encoded. This matters if one wants to copy and paste a table or export it to HTML, or if people want to define layouts in a CSS-like manner, e.g. "make all header cells bolder".
Yes, it is useful. I will leave this issue open and hope to come back to it one day.
Here is a very simple example (it needs a current tagpdf, 0.9). It marks up a table with one column, which has a header and two rows. I think it gives an impression of the code we need to inject (in practice it is even more, as I left out a few details such as attributes).
If you compile this and then upload the PDF at https://ngpdf.com/loadFile, you can check the HTML, which will look something like this:
<!DOCTYPE html>
<html><head>
<title>test-utf8</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
</head>
<body lang="en-US">
<div data-pdf-se-type="Document">
<table data-pdf-se-type="Table">
<thead data-pdf-se-type="THead">
<tr data-pdf-se-type="TR">
<th data-pdf-se-type="TH">Header</th>
</tr>
</thead>
<tr data-pdf-se-type="TR">
<td data-pdf-se-type="TD">row1</td>
</tr>
<tr data-pdf-se-type="TR">
<td data-pdf-se-type="TD">row2</td>
</tr>
</table>
</div>
</body></html>
\RequirePackage{pdfmanagement-testphase}
\DeclareDocumentMetadata{uncompress}
\documentclass{article}
\usepackage{tagpdf,array}
\tagpdfsetup{activate}
\begin{document}
\tagstructbegin{tag=Table}
\begin{tabular}{l}
\tagstructbegin{tag=THead}%
\tagstructbegin{tag=TR}%
\tagstructbegin{tag=TH}%
\tagmcbegin{tag=TH}%
Header
\tagmcend
\tagstructend
\tagstructend
\tagstructend
\\
\tagstructbegin{tag=TR}%
\tagstructbegin{tag=TD}%
\tagmcbegin{tag=TD}%
row1
\tagmcend
\tagstructend
\tagstructend
\\
\tagstructbegin{tag=TR}%
\tagstructbegin{tag=TD}%
\tagmcbegin{tag=TD}%
row2
\tagmcend
\tagstructend
\tagstructend
\end{tabular}
\tagstructend
\end{document}
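The repeated begin/end pairs in the example above could be wrapped in small helper macros. What follows is only a sketch under assumptions: `\TaggedCell` and `\TaggedRow` are hypothetical names that are not part of tagpdf or tabularray, and the helpers are only meant for the one-column case shown above, where a whole row lives inside a single cell. It uses the same preamble and the same tagpdf commands as the example.

```latex
\RequirePackage{pdfmanagement-testphase}
\DeclareDocumentMetadata{uncompress}
\documentclass{article}
\usepackage{tagpdf,array}
\tagpdfsetup{activate}
% Hypothetical helper (not a tagpdf command):
% #1 = structure tag (TH or TD), #2 = cell content.
\newcommand\TaggedCell[2]{%
  \tagstructbegin{tag=#1}%
  \tagmcbegin{tag=#1}%
  #2%
  \tagmcend
  \tagstructend}
% Hypothetical helper: wraps its argument in a TR structure.
% Only suitable for the one-column case used here.
\newcommand\TaggedRow[1]{\tagstructbegin{tag=TR}#1\tagstructend}
\begin{document}
\tagstructbegin{tag=Table}
\begin{tabular}{l}
\tagstructbegin{tag=THead}%
\TaggedRow{\TaggedCell{TH}{Header}}%
\tagstructend
\\
\TaggedRow{\TaggedCell{TD}{row1}}\\
\TaggedRow{\TaggedCell{TD}{row2}}
\end{tabular}
\tagstructend
\end{document}
```

This emits the same sequence of tagging commands as the hand-written version, so the resulting structure should be the same. It does not solve the real problem for multi-column tables, where the row structure has to be opened in the first cell and closed in the last one.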
Yes, it is very interesting.
I will close this issue; further comments can be left in issue #197.
I have decided to reopen this issue to record experiments with tagpdf here.
With the newly added public hooks and variables (#197) in trial/tabularray.sty, we can now correctly tag <table>, <tr> and <td> in the above commit.
<!DOCTYPE html>
<html><head>
<title>test-tagpdf-01</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body lang="en-US">
<div data-pdf-se-type="Document" id="ID.001">
<p data-pdf-se-type="P" id="ID.002"><span id="page-0" role="doc-pagebreak"></span>Some text.</p>
<table data-pdf-se-type="Table" id="ID.003">
<tbody><tr data-pdf-se-type="TR" id="ID.004">
<td data-pdf-se-type="TD" id="ID.005"><p data-pdf-se-type="P" id="ID.006">Alpha</p></td>
<td data-pdf-se-type="TD" id="ID.007"><p data-pdf-se-type="P" id="ID.008">Beta</p></td>
<td data-pdf-se-type="TD" id="ID.009"><p data-pdf-se-type="P" id="ID.010">Gamma</p></td>
<td data-pdf-se-type="TD" id="ID.011"><p data-pdf-se-type="P" id="ID.012">Delta</p></td>
</tr>
<tr data-pdf-se-type="TR" id="ID.013">
<td data-pdf-se-type="TD" id="ID.014"><p data-pdf-se-type="P" id="ID.015">Epsilon</p></td>
<td data-pdf-se-type="TD" id="ID.016"><p data-pdf-se-type="P" id="ID.017">Zeta</p></td>
<td data-pdf-se-type="TD" id="ID.018"><p data-pdf-se-type="P" id="ID.019">Eta</p></td>
<td data-pdf-se-type="TD" id="ID.020"><p data-pdf-se-type="P" id="ID.021">Theta</p></td>
</tr>
<tr data-pdf-se-type="TR" id="ID.022">
<td data-pdf-se-type="TD" id="ID.023"><p data-pdf-se-type="P" id="ID.024">Iota</p></td>
<td data-pdf-se-type="TD" id="ID.025"><p data-pdf-se-type="P" id="ID.026">Kappa</p></td>
<td data-pdf-se-type="TD" id="ID.027"><p data-pdf-se-type="P" id="ID.028">Lambda</p></td>
<td data-pdf-se-type="TD" id="ID.029"><p data-pdf-se-type="P" id="ID.030">Mu</p></td>
</tr>
</tbody></table>
<p data-pdf-se-type="P" id="ID.031">More text.</p>
</div>
</body></html>
We are working on a project to enhance LaTeX so that it can produce tagged PDF: https://www.latex-project.org/news/2020/11/30/tagged-pdf-FS-study/
For a tabular this means that one needs to add commands quite similar to HTML table commands to cells and rows.
So to successfully tag a tabular, one needs at least
The code for the cells and rows should ideally have access to data like the current row/column number.
It would be nice if tabularray added suitable hooks for this.
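To illustrate the kind of interface meant here, the following is a minimal sketch built on the LaTeX kernel's hook management (`\NewHook`, `\UseHook`, `\AddToHook`, available since the 2020-10-01 release). Everything named `mytab...`, as well as `\myrow` and `\mycell`, is hypothetical and only stands in for what a table package might do internally; it is not tabularray's actual interface. The hook code only prints a marker so that the access to row and column numbers is visible; real tagging code would put the tagpdf commands there instead.

```latex
\documentclass{article}
% Hypothetical hooks a table package could declare and run
% around every row and every cell.
\NewHook{mytab/row/before}
\NewHook{mytab/row/after}
\NewHook{mytab/cell/before}
\NewHook{mytab/cell/after}
% Hypothetical public counters for the current row and column.
\newcounter{mytabrow}
\newcounter{mytabcol}
% Toy stand-ins for the package internals: step the counters,
% then run the hooks around the content.
\newcommand\mycell[1]{%
  \stepcounter{mytabcol}%
  \UseHook{mytab/cell/before}#1\UseHook{mytab/cell/after}}
\newcommand\myrow[1]{%
  \stepcounter{mytabrow}\setcounter{mytabcol}{0}%
  \UseHook{mytab/row/before}#1\UseHook{mytab/row/after}\par}
% What a tagging layer would add from the outside. Here it only
% prints the position; tagging code would open and close the
% cell structure at these two points.
\AddToHook{mytab/cell/before}{[r\arabic{mytabrow}c\arabic{mytabcol}:}
\AddToHook{mytab/cell/after}{]}
\begin{document}
\myrow{\mycell{Alpha}\mycell{Beta}}
\myrow{\mycell{Gamma}\mycell{Delta}}
\end{document}
```

The first row is typeset as `[r1c1:Alpha][r1c2:Beta]`. The point of the design is that the tagging code never has to patch the table package: it only needs documented hook names and counters it can read at the moment the hook runs.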