latex3 / tagpdf

Tagging support code for LaTeX

new option for associating embedded files with the structure elements #18

Closed bdoubrov closed 3 years ago

bdoubrov commented 5 years ago

In scope of PDF 2.0 it would be very useful to add the following new options to the \tagstructbegin command:

For example,

\tagstructbegin{tag=Formula,af=x^2+y^2=z^2,afmime=application/x-tex,afrel=Source}
\[  
   x^2+y^2=z^2 
\]
\tagstructend

u-fischer commented 5 years ago

Yes, I plan to add support for associated files in the near future.

u-fischer commented 3 years ago

I just uploaded a new version that allows adding associated files to structures. It requires the new PDF management for LaTeX (the pdfmanagement-testphase package), which I also just uploaded.

AlexandrKozlovskiy commented 3 years ago

@u-fischer I will test it later, but do you plan to add this feature for mc kids too in the future?

AlexandrKozlovskiy commented 3 years ago

@u-fischer Can you please provide an example that contains a formula inside the AF stream, for example \[1+1=2\], including the \[ and \]? Separate keys for Subtype and AFRelationship should also be provided; see the example at https://github.com/AlexandrKozlovskiy/tagpdf/tree/associated_files. Yes, I understand that my branch is very outdated, but it shows how it should work. The Subtype key lets us know what content type the object has, for example whether it is MathML or TeX. Also, IMHO (I didn't do it in my branch), it would be nice to have a key for the full name of the embedded associated file, or at least for its extension, because instead of .txt we could have, for example, .tex. In my branch you will find the file af.pdf in the experiments folder. Almost all of my changes are in the file tagpdf-struct.dtx. If you already have at least some of these keys, please excuse me.

u-fischer commented 3 years ago

do you plan to add this feature for mc kids too in the future?

I don't know, it is not difficult, but I would have to see some sensible use case first.

that contains a formula inside the AF stream,

\tagstructbegin{tag=P,AFinline={[1+1=\sum_{i=1}^{i=2} 1]}}

creates a stream

  stream
[1+1=\sum _{i=1}^{i=2} 1]
endstream
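
For context, here is a minimal sketch of where such a line could sit in a document body. It assumes that tagpdf is already loaded and activated in the preamble and that paragraphs are tagged by hand; the sentence inside the marked content is a placeholder, and only the `AFinline` value is taken from the example above.

```latex
% Sketch only: assumes tagpdf is loaded and activated in the preamble
% and that paragraph tagging is done manually.
\tagstructbegin{tag=P,AFinline={[1+1=\sum_{i=1}^{i=2} 1]}}% structure element with an inline associated file
  \tagmcbegin{tag=P}% marked content that belongs to this P structure
    One plus one is two.% placeholder text
  \tagmcend
\tagstructend
```

The associated file hangs on the structure element opened by `\tagstructbegin`, not on the marked content chunk inside it.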

Separate keys for Subtype and AFRelationship should also be provided

Subtype and AFRelationship can be adapted for now by changing the relevant file dictionaries; see the documentation of l3pdffile.

\pdfdict_put:nnn {l_pdffile/FileSpec} {AFRelationship}{/Supplement}
\pdfdict_put:nnn {l_pdffile/FileSpec} {Subtype}{/mathml\c_hash_str2Fxml}
\pdfdict_put:nnn {l_pdffile}               {Subtype}{/mathml\c_hash_str2Fxml}

a key for the full name of the embedded associated file, or at least for its extension, because instead of .txt we could have, for example, .tex.

For now, AFinline uses an automatically created file name with the extension .txt. If you create the stream yourself, you are free to use whatever extension or file name you want. See l3pdffile and the example ex-AF-file.tex.

dbitouze commented 3 years ago

@u-fischer I will test it later, but do you plan to add this feature for mc kids too in the future?

Sorry to intervene in this thread, but could you explain what "mc kids" are?

u-fischer commented 3 years ago

@dbitouze When you tag a PDF, you have to mark and number all the text in the page stream so that you can reference it in the structure. This is done by surrounding each piece of text (basically) with a tag containing an "MCID" number, and these pieces are called the mc kids of the structure element.

/P <</MCID 1>> BDC
 text
EMC

car222222 commented 3 years ago

@dbitouze Following on from what Ulrike wrote, the 'structure object' would then 'contain' as one of its 'kids', a reference to this page and the MCID of this marked content (mc) section of the page stream.
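
On the LaTeX side, this could look like the following sketch. It assumes that tagpdf is loaded and activated and that tagging is done by hand; the text is a placeholder. Each `\tagmcbegin` ... `\tagmcend` pair should end up as one BDC/EMC pair with its own MCID, and both chunks become kids of the same P structure element.

```latex
% Sketch only: one structure element with two marked content chunks (mc kids).
\tagstructbegin{tag=P}%
  \tagmcbegin{tag=P}% first chunk, gets its own MCID
    First part of the paragraph,
  \tagmcend
  \tagmcbegin{tag=P}% second chunk, gets the next MCID
    second part of the same paragraph.%
  \tagmcend
\tagstructend
```

Where a paragraph is split into chunks (for example at a page break) does not change the structure element itself.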

AlexandrKozlovskiy commented 3 years ago

I don't know, it is not difficult

@u-fischer For example, if I want to create several kids with formulas inside one structure element. But I am not sure that it's really not difficult, because it should work in both generic and lua mode, and there can be problems with breaks across multiple pages.

[1+1=\sum_{i=1}^{i=2} 1]

@u-fischer If we use a subtype like application/x-tex, we should use LaTeX notation, so instead of [ and ] we should use \[ and \], or $$...$$ or $...$, i.e. there should be a \ character before [ and ].

Subtype and AFRelationship can be adapted for now by changing the relevant file dictionaries.

But if you add these as separate keys, it would mean less code for the user and would improve the usability of this feature. In that case, if the user forgets to add the subtype or afrelationship key to the object, a default value should be used. Also, as you or someone else wrote in the last discussion, the names of variables can change because these are still experimental packages, but the names of these parameters in PDF will, I hope, never change. So please, @u-fischer, add keys for afrelationship and subtype too, if it's not difficult for you.

u-fischer commented 3 years ago

For example, if I want to create several kids with formulas inside one structure element.

But why would you want to give every kid its own associated file? mc-kids are rather arbitrary splits of a structure, and I don't see what you gain by splitting the file content over them, instead of using a clean file attached to a structure.

there should be a \ character before [ and ].

Well, add it; my example above was only an example, change it to your needs.

it would mean less code for the user and would improve the usability of this feature [...] will, I hope, never change

This package is not a user package. It is for developers to test tagging. So improving user usability is currently not on the agenda. Also, I explicitly do not promise not to change command names or key names. The code will hopefully move into the LaTeX kernel at some point; then we will also discuss interfaces, and quite certainly some things will change.

AlexandrKozlovskiy commented 3 years ago

But why would you want to give every kid its own associated file?

Because, for example, I want to add several formulas to one structure without using additional structures for that. For this we need support for associated files on mc. Also, IMHO, if PDF allows it and this package tries to use all PDF features, it should be done too. But I understand that it's very difficult to do this for mc, according to the PDF documentation.

Well, add it; my example above was only an example

OK, but I simply want to clarify how to insert these formulas correctly, so that LaTeX does not execute the commands. For example, how do I correctly put the formula \[1+1=2\] or \begin{equation} x+1=1 \end{equation} into the PDF stream?

We could change catcodes, or use the \detokenize command for this, or something else. So, @u-fischer, which way is the most correct for putting LaTeX formulas into a PDF stream?
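
To illustrate one of the options named here, the following sketch shows what `\detokenize` does with such a formula. It does not settle which approach is the most correct for the PDF stream; the macro name `\formulasource` is made up for this example.

```latex
\documentclass{article}
\begin{document}
% \detokenize turns the tokens into plain characters without executing them;
% a space is added after each control word such as \begin or \end.
\edef\formulasource{\detokenize{\begin{equation} x+1=1 \end{equation}}}
\typeout{\formulasource}% writes the string to the log
\texttt{\formulasource}% prints the same string in the document
\end{document}
```

The extra space after control words matches the `\sum _{i=1}` seen in the AFinline stream earlier in the thread.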

This package is not a user package. It is for developers to test tagging.

@u-fischer No, in my opinion your package can be used to make some documents more accessible, for example for blind readers. So any scientist who writes an article in LaTeX can make it more accessible for blind readers. We develop the axessibility package, which does automatic tagging and labeling of formulas. It uses your package, and we will probably add support for associated files using the features of your package.

So improving user usability is currently not on the agenda.

But, for example, alttext and actualtext have special keys, so, @u-fischer, please add such keys for subtype and afrelationship as well. This feature is more important to me than associated files for mc, and I hope it is easier to do. If you want, I can open an issue or even try to create a pull request with these keys, because this feature is very important, I hope not only for me. Thanks.

AlexandrKozlovskiy commented 3 years ago

@u-fischer When I tried to compile your example ex-AF-file.tex, LaTeX did not find the package l3bitset.sty. I have latex 2020-10-01, lualatex 1.13.0 and expl3 version 2021-02-18. Thanks.

u-fischer commented 3 years ago

l3bitset is part of the current l3experimental.

car222222 commented 3 years ago

. . . and is still, therefore, very much experimental !! :-)

As are many of the details, and some of the bigger ideas, in tagpdf etc.

So beware of many possible changes over the next year or so!

The good news is that at the end of this experimental phase, we (The LaTeX Team and its Accessible PDF project) will offer you a solid, stable platform for your very interesting project on math tagging.

AlexandrKozlovskiy commented 3 years ago

@u-fischer During compilation of your example I get an error: ! Package pdfdict Error: The dictionary 'l_pdffile/Filespec' is unknown.

l.40 ...file/Filespec} {AFRelationship}{/Supplement}

@car222222 The goal of our project, the axessibility package, is not only to tag math but to tag the whole document where possible. But we don't have enough LaTeX skills to tag all parts of a document. Currently we support tagging of formulas, paragraphs, sections, sub...sections, tables and lists. But we have some technical problems with tagging them correctly. For example, I have not found a way to tag each item produced by the \tableofcontents command for any document class and to insert links to each section and sub...section in the contents. Tagging of links is, IMHO, not a solved problem, because with a screen reader nothing happens when clicking on a link. See issue #35. I hope @u-fischer will fix this issue too.

u-fischer commented 3 years ago

Please don't add new issues to old issues. This one is closed. But I know the error and it is resolved in the development versions. I will do an update in the next few days.

AlexandrKozlovskiy commented 3 years ago

@u-fischer I didn't want to create a new issue, because I was not sure that it is not just my own problem. But now I understand, and I will wait for the commit that fixes it, whether it is a pdfresources issue or an issue in your example. Maybe if I change the name of the PDF dictionary in your example myself, everything will work.

u-fischer commented 3 years ago

Yes, but it isn't completely correct internally; I mixed up FileSpec and Filespec in a few places (the second is correct and will be used in future versions).
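
With the spelling `Filespec`, the dictionary settings from earlier in the thread would read roughly as follows. This is a sketch under the assumption that the installed versions of tagpdf and pdfmanagement-testphase already use that name; the values are copied unchanged from the earlier example and are not a recommendation. Since `\pdfdict_put:nnn` assigns locally, the lines presumably belong before the structure that embeds the file.

```latex
\ExplSyntaxOn
% Same settings as earlier in the thread, only with the spelling Filespec.
\pdfdict_put:nnn {l_pdffile/Filespec} {AFRelationship} {/Supplement}
\pdfdict_put:nnn {l_pdffile/Filespec} {Subtype}        {/mathml\c_hash_str 2Fxml}
\pdfdict_put:nnn {l_pdffile}          {Subtype}        {/mathml\c_hash_str 2Fxml}
\ExplSyntaxOff
```

If the "dictionary is unknown" error still appears, the installed version most likely still uses the other spelling.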