ouspg / cse-dtyo

Template generated PDF is not PDF/A-1B compatible #6

Open joniumGit opened 1 year ago

joniumGit commented 1 year ago

Major issues with PDF/A-1B compatibility preventing thesis submission

According to the thesis instructions of the University of Oulu, the thesis PDF must be PDF/A-1B compatible:

There are many different versions of PDF/A format. Which one should I choose, so that Laturi will accept my thesis? Laturi requires that theses conform to the PDF/A-1b format. This means that files in PDF/A-1a format are also accepted, as its requirements are even more extensive. Laturi will not accept files in any versions of the PDF/A-2 and PDF/A-3 formats.

However, this template does not produce PDF/A-1B compatible PDFs if compiled in Overleaf using pdfLaTeX (or any other compiler).

Because this is a template for thesis documents, its output should pass PDF/A-1B validation so that it can be submitted to the thesis system.

Note: As this template is not in sync with the Overleaf template linked in the university thesis guides, I have not put the changes in a pull request. If this repo is updated with the latest version, I can submit a pull request for these changes. Also, Overleaf now allows syncing directly from GitHub: https://www.overleaf.com/learn/how-to/Git_Integration_and_GitHub_Synchronization

Issues in Validation

Here are the results of validating the generated PDF with veraPDF against the PDF/A-1B profile:

Version: 1.22.3
Parser: GreenField
Build Date: 2022-09-14T14:50:00+03:00
Processing time: 00:00:01.006
Total rules in Profile: 101
Passed Checks: 86437
Failed Checks: 2312

The following checks fail:

  1. Specification: ISO 19005-1:2005, Clause: 6.2.3, Test number: 2
  2. Specification: ISO 19005-1:2005, Clause: 6.2.3, Test number: 3
  3. Specification: ISO 19005-1:2005, Clause: 6.2.3, Test number: 4
  4. Specification: ISO 19005-1:2005, Clause: 6.4, Test number: 2
  5. Specification: ISO 19005-1:2005, Clause: 6.5.3, Test number: 2
  6. Specification: ISO 19005-1:2005, Clause: 6.5.3, Test number: 3
  7. Specification: ISO 19005-1:2005, Clause: 6.7.2, Test number: 1
  8. Specification: ISO 19005-1:2005, Clause: 6.7.3, Test number: 1

Using trial and error and the guide at https://webpages.tuni.fi/latex/pdfa-guide.pdf, I made the changes described in the following sections. Overriding the xmpdata is left out of this issue, but it should be useful for defining PDF metadata for thesis documents (see the guide linked above).
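For reference, the xmpdata override works through a metadata file that the pdfx package (added in step 1 below) reads at compile time. The file shares its base name with the main document. A minimal sketch, assuming the main file is main.tex; every value is a placeholder and nothing here is taken from the template itself:

```latex
% main.xmpdata (placed next to main.tex; read by pdfx during compilation)
% All values below are illustrative placeholders.
\Title{Example Thesis Title}
\Author{Firstname Lastname}
\Language{en-GB}
\Keywords{first keyword\sep second keyword}
\Subject{One-sentence description of the thesis}
```

Multiple keywords (or authors) are separated with \sep rather than commas.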

Fixing the Template

1. Add \usepackage[a-1b,mathxmp]{pdfx} to main.tex

This alone makes the situation much better:

Version: 1.22.3
Parser: GreenField
Build Date: 2022-09-14T14:50:00+03:00
Processing time: 00:00:00.374
Total rules in Profile: 101
Passed Checks: 89911
Failed Checks: 144

The following still fail:

  1. Specification: ISO 19005-1:2005, Clause: 6.2.3, Test number: 3
  2. Specification: ISO 19005-1:2005, Clause: 6.4, Test number: 2
  3. Specification: ISO 19005-1:2005, Clause: 6.5.3, Test number: 2
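Step 1 could be sketched as the following preamble. Only the pdfx line comes from this issue; the document class, the way di.sty is loaded, and the placement of the pdfx line relative to other packages are assumptions about main.tex and may not match the real template:

```latex
% main.tex (sketch; only the pdfx line is taken from this issue)
\documentclass{article}   % assumption: the template's actual class may differ

% Request PDF/A-1b output; options copied verbatim from step 1.
\usepackage[a-1b,mathxmp]{pdfx}

\usepackage{di}           % assumption: this is how the template's di.sty is loaded

\begin{document}
Thesis content.
\end{document}
```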

2. Fix the hyperref package by changing its import in di.sty to \usepackage[pdfa]{hyperref}

Now only a few errors remain:

Version: 1.22.3
Parser: GreenField
Build Date: 2022-09-14T14:50:00+03:00
Processing time: 00:00:00.270
Total rules in Profile: 101
Passed Checks: 90274
Failed Checks: 23

The following still fail:

  1. Specification: ISO 19005-1:2005, Clause: 6.2.3, Test number: 3
  2. Specification: ISO 19005-1:2005, Clause: 6.4, Test number: 2

3. Replacing the university logo

As the logo is in CMYK format, it does not pass PDF/A-1B validation. It was replaced with the logos from https://www.sttinfo.fi/uutishuone/oulun-yliopisto/m?publisherId=57858920, namely the Finnish logo in RGB PNG and the English logo in RGB PNG.

This also requires minor adjustments in di.sty to include the correct logo for each language, and the transparency has to be removed from the logos. The transparency was removed with ImageMagick using:

convert unilogo.png -background white -alpha remove -alpha off unilogo_nobg.png

Leading to the following:

Version: 1.22.3
Parser: GreenField
Build Date: 2022-09-14T14:50:00+03:00
Processing time: 00:00:00.246
Total rules in Profile: 101
Passed Checks: 89473
Failed Checks: 1

This only leaves us with one last error:

  1. Specification: ISO 19005-1:2005, Clause: 6.4, Test number: 2
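The di.sty adjustment for the language-specific logo in step 3 might look roughly like this. It is a hypothetical sketch, not the actual change: it assumes babel is loaded with the finnish and english languages, and the macro name \unilogo, the width, and the file names unilogo_fi_nobg.png and unilogo_en_nobg.png are invented for illustration:

```latex
% di.sty (hypothetical excerpt)
\RequirePackage{graphicx}

% Pick the logo that matches the active babel language.
% \iflanguage{<name>}{<true>}{<false>} is provided by babel.
\newcommand{\unilogo}{%
  \iflanguage{finnish}%
    {\includegraphics[width=4cm]{unilogo_fi_nobg}}% flattened Finnish RGB logo
    {\includegraphics[width=4cm]{unilogo_en_nobg}}% flattened English RGB logo
}
```

The title page would then call \unilogo where the original logo was included.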

4. Removing the background from WP.png

This image has a transparent background, which was removed with the same ImageMagick command as for the logo.

Resulting PDF Passes PDF/A-1B validation

Compliance: Passed
Version: 1.22.3
Parser: GreenField
Build Date: 2022-09-14T14:50:00+03:00
Processing time: 00:00:00.239
Total rules in Profile: 101
Passed Checks: 89445
Failed Checks: 0

joniumGit commented 1 year ago

For the template to be PDF/A-1A compatible as well, the following also need to be solved:

Version: 1.22.3
Parser: GreenField
Build Date: 2022-09-14T14:50:00+03:00
Processing time: 00:00:01.164
Total rules in Profile: 106
Passed Checks: 89785
Failed Checks: 3
  1. Specification: ISO 19005-1:2005, Clause: 6.7.11, Test number: 3 (fixed by setting the pdfx option to a-1a)
  2. Specification: ISO 19005-1:2005, Clause: 6.8.2, Test number: 1
  3. Specification: ISO 19005-1:2005, Clause: 6.8.3, Test number: 1

However, I do not think it is required or possible at the moment.