pts / pdfsizeopt

PDF file size optimizer
GNU General Public License v2.0

PDF/A compliance #148

Open: Keks-Dose opened this issue 3 years ago

Keks-Dose commented 3 years ago

I have been using pdfsizeopt for as long as it has existed. Great software: it shrinks my PDFs (produced by pdftex) by a factor of 10.

Times are changing. It seems that I'll have to provide PDFs that conform to standards such as PDF/A.

As stated here, a »PDF/A-1b is a PDF with an OutputIntent and certain metadata.«

So this is my MWE (minimal working example):

    \documentclass{article}
    \usepackage[T1]{fontenc}

    \usepackage{libertine}

    \usepackage[pdfa]{hyperref}
    \usepackage{hyperxmp}

    \hypersetup{%
      pdftitle={Irgendeine Angabe},
      pdfauthor={Der Name},
      pdflang={de-DE},
      pdfapart=3, %set to 1 for PDF/A-1
      pdfaconformance=B
    }

    % Create an OutputIntent in order to correctly specify colours
    \immediate\pdfobj stream attr{/N 3} file{sRGB.icc}
    \pdfcatalog{%
      /OutputIntents [
      <<
      /Type /OutputIntent
      /S /GTS_PDFA1
      /DestOutputProfile \the\pdflastobj\space 0 R
      /OutputConditionIdentifier (sRGB)
      /Info (sRGB)
      >>
      ]
      /ViewerPreferences
      <</PrintScaling/None>>
    }

    \begin{document}

    And some text.

    \end{document}

According to veraPDF, the PDF is compliant with PDF/A-3B.

But it is no longer compliant after running pdfsizeopt on it, even with font and image optimization disabled:

    pdfsizeopt --do-unify-fonts=no --do-regenerate-all-fonts=no --do-optimize-images=no MWE-pdfsizeopt-pdfA.pdf MWE-pdfsizeopt-pdfA.pdf

veraPDF then reports these errors:

The stream keyword shall be followed either by a CARRIAGE RETURN (0Dh) and LINE FEED (0Ah) character sequence or by a single LINE FEED (0Ah) character. The endstream keyword shall be preceded by an EOL marker.

The object number and generation number shall be separated by a single white-space character. The generation number and obj keyword shall be separated by a single white-space character. The object number and endobj keyword shall each be preceded by an EOL marker. The obj and endobj keywords shall each be followed by an EOL marker.

The file trailer dictionary shall contain the ID keyword whose value shall be File Identifiers as defined in ISO 32000-1:2008, 14.4

Usually only the first error is important.
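
To see which of these rules a given file breaks before running veraPDF, a rough byte-level scan can help. The sketch below is a heuristic, not a PDF parser: it applies regular expressions to the raw file, so it can report false positives inside binary stream data and cannot see objects packed into compressed object streams. The function name `find_violations` is hypothetical, and veraPDF remains the authoritative check.

```python
import re
import sys
from typing import List, Tuple

# PDF white-space characters (ISO 32000-1, 7.2.2): NUL, HT, LF, FF, CR, SP.
_WS = rb"[\x00\t\n\x0c\r ]"

# (pattern, message) pairs; each match is a suspected violation.
_CHECKS = [
    # "stream" (not the tail of "endstream") must be followed by CRLF or LF.
    (re.compile(rb"(?<![A-Za-z])stream(?!\r\n|\n)"),
     "stream keyword not followed by CRLF or LF"),
    # "endstream" must be preceded by an EOL marker.
    (re.compile(rb"(?<![\r\n])endstream"),
     "endstream keyword not preceded by EOL"),
    # "obj" and "endobj" must each be followed by an EOL marker.
    (re.compile(rb"(?<![A-Za-z])(?:end)?obj(?![\r\n]|[A-Za-z])"),
     "obj/endobj keyword not followed by EOL"),
    # "endobj" must be preceded by an EOL marker.
    (re.compile(rb"(?<![\r\n])endobj"),
     "endobj keyword not preceded by EOL"),
]

# Object header "N G obj"; checked separately for single-space separators.
_OBJ_HEADER = re.compile(
    rb"(?<![0-9])[0-9]+(" + _WS + rb"+)[0-9]+(" + _WS + rb"+)obj(?![A-Za-z])")


def find_violations(data: bytes) -> List[Tuple[int, str]]:
    """Return sorted (byte offset, message) pairs for suspicious layouts."""
    found = []
    for pattern, message in _CHECKS:
        found.extend((m.start(), message) for m in pattern.finditer(data))
    for m in _OBJ_HEADER.finditer(data):
        if m.group(1) != b" " or m.group(2) != b" ":
            found.append((m.start(), "object header separators are not single spaces"))
    return sorted(found)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        # Print only the first 20 hits; the earliest one is usually the cause.
        for offset, message in find_violations(f.read())[:20]:
            print(f"{offset}: {message}")
```

Running it on the file before and after pdfsizeopt shows at which byte offsets the layout changed.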

If anybody has an idea how to deal with these errors, I'd be glad to hear it. Otherwise I'll have to provide larger PDFs, which is obviously not the end of the world.

This isn't a complaint; I doubt that it is even a pdfsizeopt issue. But maybe there is an easy remedy, or somebody else will stumble upon this issue.

zvezdochiot commented 3 years ago

@Keks-Dose said:

But maybe there is an easy remedy, or somebody else stumbles upon this issue.

See #119.

Keks-Dose commented 3 years ago

Could you elaborate a bit, please? cpdf doesn't compress much, even with the option --squeeze: the output is still about 90% of the original file size, not about 10%.

zvezdochiot commented 3 years ago

@Keks-Dose said:

Could you elaborate a bit, please?

You are using cpdf incorrectly, not for its intended purpose. Take a closer look at your own question.

Keks-Dose commented 3 years ago

Sorry, I have no idea what you mean.

zvezdochiot commented 3 years ago

@Keks-Dose said:

No idea what you mean.

cpdf is needed to fix the file after it has been processed by pdfsizeopt.

Keks-Dose commented 3 years ago

So you basically suggest:

  1. compile *.tex to in.pdf
  2. pdfsizeopt in.pdf
  3. cpdf -create-objstm -no-preserve-objstm in.pdf -o out.pdf

I tried all that, but in vain: the resulting PDF file is still not PDF/A-compliant.
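
The three steps above, followed by a veraPDF run, could be scripted roughly as below. This is a sketch under assumptions: the helper names and the `.pso.pdf`/`.out.pdf` file names are hypothetical, each step writes a new file instead of overwriting its input, and the veraPDF option `--flavour 3b` is assumed, so check `verapdf --help` for your version.

```python
import subprocess
import sys
from typing import List


def pipeline(stem: str) -> List[List[str]]:
    """Hypothetical helper: compile, optimize, post-process, then validate."""
    tex, pdf = f"{stem}.tex", f"{stem}.pdf"
    optimized, final = f"{stem}.pso.pdf", f"{stem}.out.pdf"
    return [
        # 1. compile *.tex to a PDF (the MWE uses pdfTeX primitives)
        ["pdflatex", "-interaction=nonstopmode", tex],
        # 2. pdfsizeopt with the flags from the original report, new output file
        ["pdfsizeopt", "--do-unify-fonts=no", "--do-regenerate-all-fonts=no",
         "--do-optimize-images=no", pdf, optimized],
        # 3. cpdf post-processing as suggested in the thread
        ["cpdf", "-create-objstm", "-no-preserve-objstm", optimized, "-o", final],
        # 4. validate; "--flavour 3b" is an assumption, see `verapdf --help`
        ["verapdf", "--flavour", "3b", final],
    ]


def main(stem: str) -> None:
    *build, validate = pipeline(stem)
    for cmd in build:
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)  # stop at the first failing step
    print("+", " ".join(validate))
    subprocess.run(validate)  # veraPDF prints its own validation report


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as `python pipeline.py MWE` from the directory containing `MWE.tex` and `sRGB.icc`; keeping every intermediate file makes it possible to validate each stage separately.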

zvezdochiot commented 3 years ago

https://github.com/qpdf/qpdf

Keks-Dose commented 3 years ago

https://github.com/qpdf/qpdf

Well, not cpdf, but qpdf now.

You haven't got a clue what you are talking about, do you? The qpdf manual offers thousands of commands, many of them really low-level. Sorry, your comments are not helpful at all.

zvezdochiot commented 3 years ago

@Keks-Dose said:

You haven't got a clue what you are talking about, do you?

It's enough for me to understand the essence. You want to "fix" the processed PDF. Find the right tool.

pts commented 1 year ago

@Keks-Dose: Thank you for reporting the issue and suggesting veraPDF to verify PDF/A compliance. It would be easy to add the command-line flag pdfsizeopt --write-pdfa=yes, which would fix the compliance issues above.

I'm keeping this issue open in case anyone volunteers to implement it.
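
For whoever picks this up: the three veraPDF errors concern only how objects and the trailer are serialized, so a writer option like the proposed `--write-pdfa=yes` would mainly have to emit something like the layout below. This is a hypothetical standalone sketch, not pdfsizeopt's code: `serialize_object` and `write_pdf` are illustrative names, it writes a classic xref table without object streams, and it does not address other PDF/A requirements (XMP metadata, OutputIntent, embedded fonts), which the input PDF must already satisfy. The `/ID` follows ISO 32000-1, 14.4: two 16-byte strings, identical when a file is first written, here derived from an MD5 digest of a caller-supplied seed.

```python
import hashlib
from typing import List, Optional, Tuple


def serialize_object(num: int, dict_src: bytes, stream: Optional[bytes] = None) -> bytes:
    """Write one indirect object with the EOL layout PDF/A asks for."""
    out = b"%d 0 obj\n" % num + dict_src  # single spaces, EOL after "obj"
    if stream is not None:
        # LF after "stream"; the EOL before "endstream" is not part of /Length,
        # so dict_src must already carry /Length == len(stream).
        out += b"\nstream\n" + stream + b"\nendstream"
    return out + b"\nendobj\n"  # EOL before and after "endobj"


def write_pdf(objects: List[Tuple[bytes, Optional[bytes]]], id_seed: bytes) -> bytes:
    """Hypothetical writer: objects are numbered 1..n, object 1 is the catalog."""
    # Header plus a comment of four bytes > 127 (a PDF/A requirement).
    out = bytearray(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n")
    offsets = []
    for num, (dict_src, stream) in enumerate(objects, start=1):
        offsets.append(len(out))
        out += serialize_object(num, dict_src, stream)
    xref_pos = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    # /ID: two 16-byte strings, both equal because the file is newly written.
    digest = hashlib.md5(id_seed).hexdigest().encode("ascii")
    out += (b"trailer\n<< /Size %d /Root 1 0 R /ID [<%s> <%s>] >>\n"
            % (len(objects) + 1, digest, digest))
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

For example, `write_pdf([(b"<< /Type /Catalog /Pages 2 0 R >>", None), (b"<< /Type /Pages /Kids [] /Count 0 >>", None)], b"some seed")` returns the bytes of a minimal skeleton document. In a real implementation the seed would typically combine things like the current time, the file path and size, as 14.4 suggests.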

pts commented 1 year ago

@Keks-Dose: Could you please upload your input PDF to this issue?