khaledhosny / harf

A HarfBuzz-based font loader and shaper for HarfTeX
GNU General Public License v2.0

PDF/A validation fails with harflatex + emoji #47

Closed emojifreak closed 5 years ago

emojifreak commented 5 years ago

I am not sure whether the following is a problem in harflatex. When I compile this LaTeX file with harflatex:

\begin{filecontents*}{\jobname.xmpdata}
  \Title{test}
  \Author{author}
\end{filecontents*}

\RequirePackage{harfload}
\documentclass[luatex,unicode,a4paper,12pt]{article}
\usepackage[a-2u]{pdfx}
\usepackage{fontspec}
\setmainfont{Segoe UI Emoji}[
  RawFeature={mode=harf;+dist;+ccmp},
  BoldFont={Segoe UI Bold},
  ItalicFont={Segoe UI Italic},
  BoldItalicFont={Segoe UI Bold Italic}]

\begin{document}
\section{Test}
Test 😃.
\end{document}

the generated PDF fails PDF/A validation at https://www.pdf-online.com/osa/validate.aspx with the following error:


File | pdfa-test4.pdf
-- | --
Compliance | pdfa-2u
Result | Document does not conform to PDF/A.
Details | Validating file "pdfa-test4.pdf" for conformance level pdfa-2u
The Unicode for cid 9687 is unknown.
The Unicode for cid 9688 is unknown.
The Unicode for cid 9689 is unknown.
The Unicode for cid 9690 is unknown.
The Unicode for cid 9691 is unknown.
The document does not conform to the requested standard.
The document contains fonts without appropriate character to unicode mapping information (ToUnicode maps).
The document does not conform to the PDF/A-2u standard.
Done.

On the other hand, compiling the following LaTeX file with lualatex generates a PDF without validation errors.

\begin{filecontents*}{\jobname.xmpdata}
  \Title{test}
  \Author{author}
\end{filecontents*}

\documentclass[luatex,unicode,a4paper,12pt]{article}
\usepackage[a-2u]{pdfx}
\usepackage{fontspec}
\setmainfont{Segoe UI Emoji}[
  RawFeature={+dist;+ccmp},
  BoldFont={Segoe UI Bold},
  ItalicFont={Segoe UI Italic},
  BoldItalicFont={Segoe UI Bold Italic}]

\begin{document}
\section{Test \textnormal{😃}}
Test 😃.
\end{document}

Two generated PDF files are attached below: pdfa-test5.pdf pdfa-test4.pdf

khaledhosny commented 5 years ago

I have no access to PDF/A specs (last I checked they were not freely available), but the error seems bogus since not all glyphs have Unicode characters associated with them and that is why /ActualText spans are used here.

Such a check is likely to fail with many Arabic or Indic fonts as well, not just fancy colored emoji.
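
To illustrate what an /ActualText span is, here is a minimal sketch using the accsupp package. It is not part of the setup in this thread and is not how harf emits its spans; it only shows the mechanism of attaching a Unicode string to arbitrary page content, so that text extraction does not depend on a glyph-to-Unicode map.

```latex
% Illustrative only: the accsupp package is an assumption, not used in this thread.
\documentclass{article}
\usepackage{accsupp}
\begin{document}
% D83DDE03 is U+1F603 written as UTF-16BE surrogates in hexadecimal.
\BeginAccSupp{method=hex,unicode,ActualText=D83DDE03}%
:-)%
\EndAccSupp{}
\end{document}
```

A PDF reader that honours /ActualText should copy the wrapped content as U+1F603, whatever glyphs are actually drawn inside the span.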

You can enable support for COLR/CPAL tables in the lualatex example by using the colr=true option.

BTW, why do you need to manually enable the ccmp feature? It should be on by default.

emojifreak commented 5 years ago

Thanks for the quick reply.

> You can enable support for COLR/CPAL tables in the lualatex example by using the colr=true option.

Thanks again for the info. With LuaLaTeX, the file below generated a PDF with color emoji that passes the PDF/A validator at https://www.pdf-online.com/osa/validate.aspx. There might be some problem in harftex...

\begin{filecontents*}{\jobname.xmpdata}
  \Title{test}
  \Author{author}
\end{filecontents*}

\documentclass[luatex,unicode,a4paper,12pt]{article}
\usepackage[a-2u]{pdfx}
\usepackage{fontspec}
\setmainfont{Segoe UI Emoji}[
  RawFeature={colr=true;+dist;+ccmp},
  BoldFont={Segoe UI Bold},
  ItalicFont={Segoe UI Italic},
  BoldItalicFont={Segoe UI Bold Italic}]

\begin{document}
\section{Test \textnormal{😃}}
Test 😃.
\end{document}

pdfa-test7.pdf

At least I can say that harflatex and lualatex differ in this respect.

> BTW, why do you need to manually enable the ccmp feature? It should be on by default.

Because I am not sure whether ccmp is enabled by default in all recent versions of lualatex and harflatex...

khaledhosny commented 5 years ago

LuaTeX is adding useless entries to the font's /ToUnicode CMap:

<25D7> <FFFD>                                                                   
<25D8> <FFFD>                                                                   
<25D9> <FFFD>                                                                   
<25DA> <FFFD>                                                                   
<25DB> <FFFD> 

The right side is the (emoji) glyphs and the left side is the Unicode replacement character (οΏ½). That does not seem to be terribly useful, but I can probably do this to silence the validation warning (the result would still be bogus for any application ignoring /ActualText and using the CMap for these glyphs).
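
As a cross-check, the glyph codes in the CMap entries listed above appear to be the same CIDs the validator complained about, only written in hexadecimal. Converting the first one:

```latex
\[
  \mathtt{25D7}_{16}
  = 2\cdot 16^{3} + 5\cdot 16^{2} + 13\cdot 16 + 7
  = 8192 + 1280 + 208 + 7
  = 9687
\]
```

So <25D7> through <25DB> correspond to CIDs 9687 through 9691 in the validation report.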