veraPDF / veraPDF-library

Industry supported, open source PDF/A validation library
http://verapdf.org/software
GNU General Public License v3.0

PDF passes validation with type 3 fonts that aren't embedded #1458

Open Corvwyn opened 5 months ago

Corvwyn commented 5 months ago

I'm converting some PDF files to PDF/A-1b. After the conversion we use veraPDF to verify that these are valid PDF/A-1b.

In one instance we have a PDF file that veraPDF validates as valid PDF/A-1b. This file contains two fonts, T3Font_0 and T3Font_1, that aren't embedded.

Is this the correct behaviour, or is there something I'm missing? Is there something special about type 3 fonts that doesn't require them to be embedded?

I can provide an example pdf if needed, I just need to ask if it's ok to share first.

THausherr commented 5 months ago

Type 3 fonts are not really fonts in the usual sense: each one is a collection of PDF content streams, one per glyph, stored in the PDF itself. So the claim that they aren't embedded might be a misunderstanding.
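To illustrate the point, here is a minimal sketch of why an "is it embedded?" check does not really apply to Type 3 fonts. It models font dictionaries as plain Python dicts; the function name `embedding_status`, the sample dictionaries, and the returned labels are assumptions made for illustration and are not taken from veraPDF, Acrobat, or any other tool. It only covers simple fonts, not composite (Type 0) fonts.

```python
# Illustrative model only: PDF dictionaries as plain Python dicts,
# with PDF names written as strings that start with "/".

# Keys in a font descriptor that point to an embedded font program.
FONT_FILE_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")


def embedding_status(font: dict) -> str:
    """Classify a simple font dictionary for embedding purposes (hypothetical helper)."""
    if font.get("/Subtype") == "/Type3":
        # Type 3 glyphs are content streams stored under /CharProcs,
        # so there is no separate font program that could be embedded.
        if font.get("/CharProcs"):
            return "glyphs defined inline"
        return "broken Type 3 font"
    descriptor = font.get("/FontDescriptor", {})
    if any(key in descriptor for key in FONT_FILE_KEYS):
        return "embedded"
    return "not embedded"


# Hypothetical sample dictionaries.
t3 = {
    "/Type": "/Font",
    "/Subtype": "/Type3",
    "/Name": "/T3Font_0",
    "/CharProcs": {"/a": "<content stream>", "/b": "<content stream>"},
}
tt = {
    "/Type": "/Font",
    "/Subtype": "/TrueType",
    "/BaseFont": "/Arial",
    "/FontDescriptor": {"/FontName": "/Arial"},
}

print(embedding_status(t3))
print(embedding_status(tt))
```

A tool that only looks for a font file entry would label the first dictionary as unembedded, which is probably what happens when a viewer or library reports Type 3 fonts that way.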

Corvwyn commented 5 months ago

@THausherr Thanks for the info. In that case, it makes sense that veraPDF validates the PDF this way.

Adobe Acrobat lists them as type 3 fonts that aren't embedded.

The main problem is that the library we use to concatenate these files sees them as unembedded fonts. I guess we might have to create a support ticket, so they load type 3 fonts in a different way.

Thanks for the quick reply!

bdoubrov commented 5 months ago

@Corvwyn feel free to upload the files to this issue. I would double check if indeed these fonts are Type3 ones. Font names might be misleading sometimes.

Corvwyn commented 5 months ago

@THausherr Great. I will upload the pdf soon, I just need to ask if it's ok.

Corvwyn commented 5 months ago

@THausherr Here you go. pdfa1b_with_type3fonts.pdf

THausherr commented 5 months ago

Yeah there are type 3 fonts on page 60. And it's like I described.

Btw I found a different problem. veraPDF claims it is a PDF/A file, PDFBox Preflight claims it isn't. The reason is that the file has /SMask (None), with "None" as a string instead of as a name. One of us is wrong 😂

Corvwyn commented 5 months ago

Hmm. What a predicament 😛

petervwyatt commented 5 months ago

I will also highlight PDF Errata 118 and PDF Errata 6 - words such as "absent" or "present" in all PDF specs are ambiguous and very likely not what is desired. This may be why...

bdoubrov commented 5 months ago

Well, /SMask (None) is a violation of ISO 32000-1. And the Arlington PDF checker does find this, as well as a number of other deviations. So, strictly speaking, the behaviour of a PDF/A validator is undefined here.

Currently veraPDF accepts both /None and (None) as permitted values of the /SMask entry in the ExtGState dictionary. More correct behaviour would be to report the violation of ISO 32000-1 in the logs (which can also be optionally included in the report) and ignore this entry. The result of PDF/A-1b validation would not change though: the /SMask entry would be treated as not present and would thus comply with the PDF/A-1b requirements.
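The "more correct behaviour" described above can be sketched roughly as follows. This is a toy model, not veraPDF's implementation: the classes `PdfName` and `PdfString`, the function `check_smask`, and the log message are all hypothetical names introduced here to show the difference between a name and a string with the same text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PdfName:
    value: str  # written as /None in the file


@dataclass(frozen=True)
class PdfString:
    value: str  # written as (None) in the file


def check_smask(ext_gstate: dict, log: list) -> bool:
    """Return True if the SMask entry is acceptable for PDF/A-1b (illustrative only)."""
    if "SMask" not in ext_gstate:
        return True
    value = ext_gstate["SMask"]
    if value == PdfName("None"):
        # The name None is the only value PDF/A-1 permits for this key.
        return True
    if isinstance(value, PdfString):
        # Wrong object type: record the ISO 32000-1 violation,
        # then treat the entry as if it were not present.
        log.append(f"SMask has string value ({value.value}); expected a name or dictionary")
        return True
    # Anything else (for example a soft-mask dictionary) is not acceptable.
    return False


log = []
print(check_smask({"SMask": PdfName("None")}, log))    # name: accepted silently
print(check_smask({"SMask": PdfString("None")}, log))  # string: accepted, but logged
print(len(log))
```

In this model the PDF/A-1b result is the same for /None and (None), but only the string form leaves a trace in the log.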

bdoubrov commented 3 months ago

We've implemented additional object type checks, so that /SMask (None) is reported as a validation error. This fix is available in the latest dev builds.