veraPDF / veraPDF-library

Industry supported, open source PDF/A validation library
http://verapdf.org/software
GNU General Public License v3.0

Crash #1400

Closed: 10b14224cc closed this issue 3 months ago

10b14224cc commented 7 months ago

paper_pdfa.pdf

The above PDF makes verapdf crash:

❯ verapdf --version
veraPDF 1.24.1
Built: Thu Jun 22 14:19:00 CEST 2023
Developed and released by the veraPDF Consortium.
Funded by the PREFORMA project.
Released under the GNU General Public License v3
and the Mozilla Public License v2 or later.
❯ verapdf --format text paper_pdfa.pdf
Jan 24, 2024 3:38:57 PM org.verapdf.pd.font.cmap.CMapFactory getCMap
WARNING: Can't parse CMap Adobe-Identity-UCS2, using default
java.io.IOException: Stream in NotSeekableBaseParser can't be null.
    at org.verapdf.parser.NotSeekableBaseParser.<init>(NotSeekableBaseParser.java:67)
    at org.verapdf.parser.NotSeekableBaseParser.<init>(NotSeekableBaseParser.java:81)
    at org.verapdf.parser.NotSeekableCOSParser.<init>(NotSeekableCOSParser.java:59)
    at org.verapdf.parser.postscript.PSParser.<init>(PSParser.java:46)
    at org.verapdf.pd.font.cmap.CMapParser.<init>(CMapParser.java:59)
    at org.verapdf.pd.font.cmap.CMapFactory.getCMap(CMapFactory.java:59)
    at org.verapdf.pd.font.cmap.PDCMap.getCMapFile(PDCMap.java:126)
    at org.verapdf.pd.font.cmap.PDCMap.getCMapFile(PDCMap.java:110)
    at org.verapdf.pd.font.PDType0Font.toUnicode(PDType0Font.java:153)
    at org.verapdf.gf.model.impl.operator.textshow.GFGlyph.getToUnicodePDFA1(GFGlyph.java:155)
    at org.verapdf.gf.model.impl.operator.textshow.GFGlyph.<init>(GFGlyph.java:111)
    at org.verapdf.gf.model.impl.operator.textshow.GFCIDGlyph.<init>(GFCIDGlyph.java:42)
    at org.verapdf.gf.model.impl.operator.textshow.GFGlyph.getGlyph(GFGlyph.java:132)
    at org.verapdf.gf.model.impl.operator.textshow.GFOpTextShow.getUsedGlyphs(GFOpTextShow.java:162)
    at org.verapdf.gf.model.impl.operator.textshow.GFOpTextShow.getLinkedObjects(GFOpTextShow.java:118)
    at org.verapdf.gf.model.impl.operator.textshow.GFOpStringTextShow.getLinkedObjects(GFOpStringTextShow.java:58)
    at org.verapdf.pdfa.validation.validators.BaseValidator.addAllLinkedObjects(BaseValidator.java:232)
    at org.verapdf.pdfa.validation.validators.BaseValidator.checkNext(BaseValidator.java:199)
    at org.verapdf.pdfa.validation.validators.BaseValidator.validate(BaseValidator.java:144)
    at org.verapdf.pdfa.validation.validators.BaseValidator.validate(BaseValidator.java:108)
    at org.verapdf.processor.ProcessorImpl.validate(ProcessorImpl.java:248)
    at org.verapdf.processor.ProcessorImpl.process(ProcessorImpl.java:124)
    at org.verapdf.processor.BatchFileProcessor.processItem(BatchFileProcessor.java:152)
    at org.verapdf.processor.BatchFileProcessor.processList(BatchFileProcessor.java:85)
    at org.verapdf.processor.AbstractBatchProcessor.process(AbstractBatchProcessor.java:102)
    at org.verapdf.cli.VeraPdfCliProcessor.processFilePaths(VeraPdfCliProcessor.java:142)
    at org.verapdf.cli.VeraPdfCliProcessor.processPaths(VeraPdfCliProcessor.java:103)
    at org.verapdf.cli.VeraPdfCli.singleThreadProcess(VeraPdfCli.java:143)
    at org.verapdf.cli.VeraPdfCli.main(VeraPdfCli.java:111)
    at org.verapdf.apps.GreenfieldCliWrapper.main(GreenfieldCliWrapper.java:54)

bdoubrov commented 7 months ago

Thanks for reporting this issue. This log means that the Unicode mapping is not available for one of the fonts; this is handled internally, and the validation report is still generated correctly.

But I do agree that this looks a bit confusing. We have refactored this part to avoid the java.io.IOException and the uninformative log messages. The changes are available in the latest dev version and will be included in the next release.
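
For readers following along, the general shape of such a change can be sketched roughly as below. This is only an illustration under assumptions, not veraPDF's actual code: the class and method names (CMapLookupSketch, openPredefinedCMap, loadCMap, ParsedCMap) are hypothetical, and the stubbed lookup simply pretends that only Identity-H is bundled. The idea is to check for a missing stream before constructing a parser, log one short warning, and return an empty result so validation can continue.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; none of these names are veraPDF classes.
public class CMapLookupSketch {
    private static final Logger LOGGER =
            Logger.getLogger(CMapLookupSketch.class.getName());

    /** Stand-in for a parsed CMap; only the name is kept here. */
    public static final class ParsedCMap {
        public final String name;
        ParsedCMap(String name) { this.name = name; }
    }

    /** Assumed resource lookup: returns null when no such CMap is bundled. */
    public static InputStream openPredefinedCMap(String name) {
        if ("Identity-H".equals(name)) {
            return new ByteArrayInputStream("stub".getBytes(StandardCharsets.US_ASCII));
        }
        return null;
    }

    /** Returns the CMap if it can be loaded, otherwise an empty Optional. */
    public static Optional<ParsedCMap> loadCMap(String name) {
        InputStream stream = openPredefinedCMap(name);
        if (stream == null) {
            // Guard up front instead of letting a parser constructor throw.
            LOGGER.log(Level.WARNING,
                    "No predefined CMap named {0}; Unicode mapping unavailable", name);
            return Optional.empty();
        }
        try (InputStream in = stream) {
            // A real implementation would parse the stream here.
            return Optional.of(new ParsedCMap(name));
        } catch (IOException e) {
            // Log the message only, without a stack trace.
            LOGGER.log(Level.WARNING, "Cannot read CMap {0}: {1}",
                    new Object[] {name, e.getMessage()});
            return Optional.empty();
        }
    }
}
```

With a guard like this, a caller would see an empty result for a name such as Adobe-Identity-UCS2 and could treat the glyph as having no Unicode mapping, without a stack trace reaching the console.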

MaximPlusov commented 3 months ago

Included in release 1.26.