linuxmint / xreader

A generic Document Reader
GNU General Public License v2.0

Add support for XMP metadata #309

Open homocomputeris opened 5 years ago

homocomputeris commented 5 years ago
 * xreader 2.0.2
 * Arch Linux

Issue: Fields with multiple values separated by \sep are empty in Xreader, while Evince 3.32.0 and ExifTool 11.30 show them:

exiftool document.pdf
ExifTool Version Number         : 11.30
File Name                       : document.pdf
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.5
Linearized                      : No
Page Count                      : 1
XMP Toolkit                     : Adobe XMP Core 4.0-c316 44.253921, Sun Oct 01 2006 17:14:39
Schemas Namespace URI           : http://ns.adobe.com/pdfx/1.3/
Schemas Prefix                  : pdfx
Schemas Schema                  : PDF/X Schema
Schemas Property Category       : external
Schemas Property Description    : URL to an online version or preprint
Schemas Property Name           : AuthoritativeDomain
Schemas Property Value Type     : Text
Producer                        : pdfTeX
Format                          : application/pdf
Title                           : Title
Creator                         : Author, Author2
Language                        : en-GB
Subject                         : keyword, keywoard
Part                            : 1
Conformance                     : B
Creator Tool                    : LaTeX with hyperref

Steps to reproduce: Compile with pdfLaTeX and open in Xreader:

\begin{filecontents*}{\jobname.xmpdata}
\Author{Author\sep Author2}
\Title{Title}
\Language{en-GB}
\Keywords{keyword\sep keywoard}
\Subject{}
\Publisher{}
\end{filecontents*}

\documentclass[a4paper]{article}

\usepackage[
pdfa,
pdfencoding=unicode,
pdfusetitle,
]{hyperref}

\usepackage[a-1b]{pdfx}

\begin{document}
$\alpha\beta$
ab
\end{document}

Expected behaviour: Show the author and keywords in the PDF's properties.

Other information: From the pdfx docs:

Some of the metadata, such as the author, title, and keywords, can be stored both in the XMP packet and in the /Info dictionary. For the resulting file to be standards-compliant, the two copies of the data must be identical. This is taken care of automatically by the pdfx package, except when \sep is used to handle multiple entries, as discussed above in §2.4.1. In such cases the string is not included within the /Info dictionary. Note that this is in accordance with the PDF 2.0 specification [21], which deprecates use of the /Info dictionary for such metadata.

2.4.1. PDF Info strings

When \sep is not used within its argument, the metadata from \Title, \Author and \Keywords is also included in the PDF /Info dictionary. When this is the case, validation for the declared standard will occur only if the corresponding /Info item and XMP metadata field convert to exactly the same Unicode string. This cannot happen when \sep is used, so the /Info items are then not populated.

Unfortunately not all PDF browsers (in particular, older ones and much Apple software) give ready access to the XMP metadata packet. Some authors want to see everything using, e.g., the Unix/Linux command: pdfinfo -enc UTF-8. In fact there is the -meta option to get the complete metadata packet (in UTF-8 encoding). This can give more than what one wants, so use it as follows: pdfinfo -meta <filename>.pdf | grep 'dc:' to extract just the Dublin Core metadata fields.

Another possibility is to not use \sep with multiple authors and/or keywords. Instead replace it with simply ','. We do not recommend doing this, as more sophisticated metadata tools will see the result as a single value, rather than multiple authors, say. Different language codes cannot be applied when done this way. However, some authors may find this a satisfactory solution that suits their own tools.
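To illustrate what a viewer would have to read when the /Info entries are missing, here is a minimal Python sketch that pulls multi-valued Dublin Core fields out of an XMP packet. It is not how Xreader or Evince do it. `SAMPLE_XMP` is a hypothetical fragment modelled on the ExifTool output above, not the real packet from document.pdf, and `dublin_core_values` is a made-up helper name. The sketch assumes the usual XMP layout, where each value sits in an rdf:li element inside an rdf:Seq, rdf:Bag or rdf:Alt container.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by RDF and Dublin Core inside XMP packets.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical XMP fragment (assumption): modelled on the ExifTool output
# in this issue, not extracted from the actual PDF.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Title</rdf:li></rdf:Alt></dc:title>
   <dc:creator><rdf:Seq><rdf:li>Author</rdf:li><rdf:li>Author2</rdf:li></rdf:Seq></dc:creator>
   <dc:subject><rdf:Bag><rdf:li>keyword</rdf:li><rdf:li>keywoard</rdf:li></rdf:Bag></dc:subject>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def dublin_core_values(xmp_xml, field):
    """Return every rdf:li value stored under dc:<field>, in document order."""
    root = ET.fromstring(xmp_xml)
    values = []
    # Find each dc:<field> element, then collect its rdf:li children
    # regardless of the container type (Seq, Bag or Alt).
    for prop in root.iter("{%s}%s" % (DC_NS, field)):
        for item in prop.iter("{%s}li" % RDF_NS):
            if item.text and item.text.strip():
                values.append(item.text.strip())
    return values


if __name__ == "__main__":
    print(dublin_core_values(SAMPLE_XMP, "creator"))  # ['Author', 'Author2']
    print(dublin_core_values(SAMPLE_XMP, "subject"))  # ['keyword', 'keywoard']
```

Each \sep-separated entry comes back as its own list item, which is the distinction the pdfx documentation says is lost when the values are merged with a plain comma.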

Piiit commented 5 years ago

Great. This is a very detailed request! Thx