veraPDF / veraPDF-library

Industry supported, open source PDF/A validation library
http://verapdf.org/software
GNU General Public License v3.0

Question regarding "unknown decode filter" #1481

Open Lolf1010 opened 2 weeks ago

Lolf1010 commented 2 weeks ago

Hi everyone,

At work we use veraPDF to validate PDF files uploaded by users of our platform. We have recently seen the log-message "Unknown decode filter" a couple of times. The log comes from org.verapdf.cos.filters.COSFilterRegistry#getDecodeFilter, see here.

Questions:

Is it possible to add the name of the filter to the log-message so that we can see which unsupported filter was used?

Are there any known filters that are currently not supported by veraPDF? I have found a couple online that are not in the registry (RunLengthDecode, JBIG2Decode, DCTDecode), but they did not cause the log-message. I probably did not use them correctly or, even more likely, don't completely understand that part of PDF files ;)

And, since the log-level is SEVERE: should we report unsupported filters to you so they can be added, or can they be ignored when validating a PDF file?

Thanks.

bdoubrov commented 1 week ago

Hi @Lolf1010

Is it possible to add the name of the filter to the log-message so that we can see which unsupported filter was used?

Thanks for the suggestion. We've added this extra info in the latest dev build.
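
For readers who want a picture of what such a change amounts to, here is a minimal sketch of a filter lookup that includes the requested name in the log message. It is not the veraPDF source: `FilterRegistrySketch`, `register`, `unknownFilterMessage`, the placeholder decoder type, and the exact message wording are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for a decode-filter registry; NOT the veraPDF class.
class FilterRegistrySketch {
    private static final Logger LOGGER =
            Logger.getLogger(FilterRegistrySketch.class.getName());

    // Maps a PDF filter name to a decoder. The decoder type is a placeholder.
    private final Map<String, UnaryOperator<byte[]>> decoders = new HashMap<>();

    void register(String filterName, UnaryOperator<byte[]> decoder) {
        decoders.put(filterName, decoder);
    }

    // Built separately so the message can be checked without a log handler.
    // The wording is an assumption, not the message veraPDF emits.
    static String unknownFilterMessage(String filterName) {
        return "Unknown decode filter: " + filterName;
    }

    // Returns the registered decoder, or null after logging the unknown name.
    UnaryOperator<byte[]> getDecodeFilter(String filterName) {
        UnaryOperator<byte[]> decoder = decoders.get(filterName);
        if (decoder == null) {
            LOGGER.log(Level.SEVERE, unknownFilterMessage(filterName));
        }
        return decoder;
    }
}
```

With the name in the message, a single log line is enough to tell which filter a document asked for, without needing the document itself.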

veraPDF does not decode images and as such does not support image-specific filters: RunLengthDecode, CCITTFaxDecode, JBIG2Decode, DCTDecode, JPXDecode.
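
Once the filter name appears in the log, a small helper like the sketch below could sort logged names into the image-specific group listed above and everything else. The class and enum names are hypothetical. The "general" set is simply the remaining standard filter names from ISO 32000-1; whether veraPDF handles each of them is not stated in this thread.

```java
import java.util.Set;

// Hypothetical helper for triaging filter names seen in logs.
class FilterKinds {
    enum Kind { IMAGE_SPECIFIC, GENERAL, NOT_STANDARD }

    // Image-specific filters, as listed in the comment above.
    static final Set<String> IMAGE_FILTERS = Set.of(
            "RunLengthDecode", "CCITTFaxDecode", "JBIG2Decode",
            "DCTDecode", "JPXDecode");

    // Remaining standard filter names from ISO 32000-1 (assumption: listed
    // here for triage only, not as a statement about veraPDF support).
    static final Set<String> GENERAL_FILTERS = Set.of(
            "ASCIIHexDecode", "ASCII85Decode", "LZWDecode",
            "FlateDecode", "Crypt");

    static Kind classify(String filterName) {
        if (IMAGE_FILTERS.contains(filterName)) {
            return Kind.IMAGE_SPECIFIC;
        }
        if (GENERAL_FILTERS.contains(filterName)) {
            return Kind.GENERAL;
        }
        // Misspelled, abbreviated, or otherwise non-standard name.
        return Kind.NOT_STANDARD;
    }
}
```

A name that classifies as `NOT_STANDARD` would point to a malformed or unusual document rather than to a missing image filter.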

Normally, such filters are not used for any data parsed by veraPDF. So this "Unknown decode filter" log message looks somewhat suspicious. It would be great if you could provide a sample document in which you've seen this message.

Lolf1010 commented 1 week ago

Thanks for the feedback and the adjustment. Sadly, I cannot provide such a file as of today. The files that are uploaded by our users are private. I will continue to try to find a file that causes the log message. It may require some weird data - who knows what kind of stuff people put in their PDFs^^
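
One rough way to narrow down candidate files without sharing them is to list the filter names that appear in their uncompressed dictionaries. The sketch below is a naive byte scan, not a PDF parser, and `FilterNameScan` is a hypothetical helper. It assumes `/Filter` entries are written in plain form, so it misses dictionaries stored inside object streams and names that use `#xx` escapes, and it can report false hits from binary stream data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: naive scan for /Filter entries in raw PDF bytes.
class FilterNameScan {
    // Matches "/Filter /Name" as well as "/Filter [ /A /B ]".
    private static final Pattern FILTER_ENTRY =
            Pattern.compile("/Filter\\s*(/[A-Za-z0-9]+|\\[[^\\]]*\\])");
    private static final Pattern NAME = Pattern.compile("/([A-Za-z0-9]+)");

    static Set<String> filterNames(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to one char, so no bytes are lost.
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Set<String> names = new TreeSet<>();
        Matcher entry = FILTER_ENTRY.matcher(text);
        while (entry.find()) {
            // Collect every name in the single value or the array.
            Matcher name = NAME.matcher(entry.group(1));
            while (name.find()) {
                names.add(name.group(1));
            }
        }
        return names;
    }
}
```

Running this over the uploaded files and keeping only the resulting names, rather than the files, could show which filters occur in documents that trigger the message.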

I think the issue can be closed for now. When I have found such a file, I may create a new issue specifically for that filter.