latex3 / tagpdf

Tagging support code for LaTeX

Create an attribute class with attributes in different owners #63

Open lvjr opened 1 year ago

lvjr commented 1 year ago

I am not sure whether it is possible to create an attribute class with attributes in different owners with the current interface.

\tagpdfsetup{
  newattribute = {TH-row}{/O /Table /Scope /Row}
}

In PDF, an attribute class can have attributes in different owners.

lvjr commented 1 year ago

Maybe the following interface would be more in the LaTeX style:

newattribute = {TH-row}{Table = {Scope=/Row}, Layout = {BorderStyle=/Dashed, TextAlign=/Center}}

Or maybe it would be better to use a \NewTagAttributeClass command instead of key-value pairs.

u-fischer commented 1 year ago

An attribute object has one owner, and an attribute class refers to it. If you want more than one attribute for a structure, they must be added as an array. tagpdf does this if you use a comma list of attributes. So, for example, this

\DocumentMetadata{uncompress,testphase=phase-II}
\documentclass{article}
\tagpdfsetup
{
  newattribute = {TH-row}{/O /Table /Scope /Row},
  newattribute = {layout-dashed}{/O /Layout /BorderStyle/Dashed /TextAlign/Center}
}
\begin{document}

\tagstructbegin{tag=Div, 
attribute-class={TH-row,layout-dashed}}
aaaa
\tagstructend

\end{document}

would give such an array in the structure:

/C [/TH-row /layout-dashed]

lvjr commented 1 year ago

But the PDF standard says that ClassMap is

A dictionary that maps name objects designating attribute classes to the corresponding attribute objects or arrays of attribute objects.

which means an attribute class can correspond to multiple attribute objects.

u-fischer commented 1 year ago

Hm, yes, on rereading, having the array in the ClassMap is (probably) an option. Actually, I think I remember considering that when I implemented the attributes, but I wasn't convinced that it would be a good idea. It complicates the user interface for defining attributes, and you would end up with lots of combinations in the ClassMap. I will perhaps reconsider when we use more attributes, but for now please use a comma list.

lvjr commented 1 year ago

The following example, from Deriving HTML from PDF, maps an array of attribute objects to an attribute class. By using arrays of attribute objects we would need fewer attribute classes, and the derived CSS files would be cleaner.

PDF specifying class map:

1  0 obj 
<<  
/Type /StructTreeRoot 
/K [ ... ]        % PDF structure element Kids 
/IDTree ...       % ID tree mapping element IDs to PDF structure elements 
/RoleMap ...      % RoleMap for the default namespace 
/ParentTree ...   % Mapping for page content to parent PDF structure elements 
/ClassMap 2 0 R   % ClassMap for all elements 
>>  

2  0 obj          % ClassMap dictionary 
<<  
/HeadingStyle 
<<  
/O /CSS-2.00 
/text-align /center 
/color /red 
/font-family (Arial, Helvetica, sans-serif) 
/font-size (40px) 
>>  

/ParaStyle  
[ 
<<  
/O /Layout 
/Color [0 0 1] %blue 
/BorderColor [0 1 0] %green 
/TextAlign /Justify  
>>  

<<
/O /CSS-2.00 
/color /red 
/font-family ("Times New Roman", Times, serif) 
/font-size (12px) 
>>  
] 
>> 

CSS output:

.HeadingStyle { 
  text-align: center; color: red; 
  font-family: Arial, Helvetica, sans-serif;  
  font-size: 40px; 
} 

.ParaStyle { 
  font-family: "Times New Roman", Times, serif;  
  font-size: 12px; 
  color: red; /*coming from the CSS-2.00 attribute object dictionary
              and overrides the Color attribute defined in the 
              Layout attribute object dictionary*/ 
  border-color: green; /*coming from the Layout attribute object dictionary*/ 
}

u-fischer commented 1 year ago

I don't find the example really clean. One attribute class there contains only a /CSS-2.00 attribute object, while the other has both /Layout and /CSS-2.00.

Also, you shouldn't assume that the contents of the /Layout attribute and the /CSS-2.00 attribute run in parallel. For example, your CSS sets a font and declares a font size, but the layout does not. Imho it would be much more flexible to define a class /ParaStyle-Layout, a class /ParaStyle-Css-serif-large and a class /ParaStyle-Css-sansserif-tiny, and then combine them as wanted in the structure.
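
A minimal sketch of that split, reusing the newattribute and attribute-class keys shown earlier in this thread. The class names ParaStyle-Layout and ParaStyle-Css-serif are only illustrative, and the attribute values are copied from the ParaStyle entry of the ClassMap example above; whether a given PDF consumer honours them is not checked here.

```latex
\DocumentMetadata{uncompress,testphase=phase-II}
\documentclass{article}
\tagpdfsetup
{
  % One attribute object per owner, each under its own (hypothetical) class name.
  newattribute = {ParaStyle-Layout}{/O /Layout /Color [0 0 1] /BorderColor [0 1 0] /TextAlign /Justify},
  newattribute = {ParaStyle-Css-serif}{/O /CSS-2.00 /color /red /font-family ("Times New Roman", Times, serif) /font-size (12px)}
}
\begin{document}

% Combine the two classes on the structure element with a comma list.
\tagstructbegin{tag=Div,
attribute-class={ParaStyle-Layout,ParaStyle-Css-serif}}
aaaa
\tagstructend

\end{document}
```

As with the TH-row example, the structure element should then carry both class names in its /C array, and a different CSS class can be paired with the same layout class without defining a new combination.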

lvjr commented 1 year ago

There is a possible problem with the current interface. xcolor can write the correct color value to the PDF for \color{abc}, but to do the same thing tagpdf would need to parse the raw input in newattribute:

newattribute = {layout}{/O /Layout /Color (abc)}

u-fischer commented 1 year ago

You can use the export functions from l3color to get the color values. But I would worry about attributes later; they are mostly decoration. Get the structure right first.
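
A rough sketch of how the export route could look, assuming the color abc is defined through l3color (here with \color_set:nnn) rather than through xcolor. The variable \l_my_color_tl and the class name layout-abc are hypothetical, and the attribute text is expanded before it is handed to \tagpdfsetup so that the exported values, not the variable name, are stored.

```latex
\DocumentMetadata{uncompress,testphase=phase-II}
\documentclass{article}
\ExplSyntaxOn
% Define the color in l3color (assumption: it is not only known to xcolor).
\color_set:nnn { abc } { rgb } { 0 , 0 , 1 }
% Export it as space-separated rgb values into a (hypothetical) variable.
\tl_new:N \l_my_color_tl
\color_export:nnN { abc } { space-sep-rgb } \l_my_color_tl
% Expand the variable first, then pass the finished text to tagpdf.
% ~ is an explicit space while expl3 syntax is active.
\use:e
  {
    \exp_not:N \tagpdfsetup
      { newattribute = {layout-abc}{/O~/Layout~/Color~[\l_my_color_tl]} }
  }
\ExplSyntaxOff
\begin{document}

\tagstructbegin{tag=Div,
attribute-class={layout-abc}}
aaaa
\tagstructend

\end{document}
```

This keeps the color conversion out of tagpdf: newattribute still receives plain PDF syntax, only generated instead of typed by hand.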