Closed: a20god closed this issue 7 years ago
This might be related to https://github.com/veraPDF/veraPDF-library/issues/834.
This one might be simpler to analyze as it uses a subsetted font:
Again, the document is compliant according to three other PDF/A validators.
I think COSStream.concatenateStreams() is broken: for short content streams it puts lots of NUL characters into the temporary file. writeStreamToFile() ignores the number of bytes returned by stream.read(tmp) (variable "read") and always writes the complete array, even if it has been filled only partially. This will become a real problem for content streams longer than 2048 bytes, because in the last iteration the unfilled part of the byte array will hold stale data from the previous read instead of NULs.
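To illustrate the point, here is a minimal sketch of the two copy loops. It is not the veraPDF source: the class and method names are hypothetical, and the 2048-byte buffer size is only inferred from the description above. The first method reproduces the pattern described (the whole array is written regardless of the read count), the second writes only the bytes actually read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper class, not part of veraPDF.
final class StreamCopySketch {
    // Assumed buffer size, inferred from the 2048-byte threshold mentioned above.
    static final int BUFFER_SIZE = 2048;

    // Pattern described in the report: the return value of read() is ignored,
    // so every iteration writes all BUFFER_SIZE bytes of the array.
    static void copyIgnoringReadCount(InputStream in, OutputStream out) throws IOException {
        byte[] tmp = new byte[BUFFER_SIZE];
        while (in.read(tmp) != -1) {
            out.write(tmp);
        }
    }

    // Corrected loop: only the bytes that read() actually filled are written.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] tmp = new byte[BUFFER_SIZE];
        int read;
        while ((read = in.read(tmp)) != -1) {
            out.write(tmp, 0, read);
        }
    }
}
```

With a 3-byte stream, the first loop emits the 3 bytes followed by NUL padding up to the buffer size. With a stream longer than the buffer, the tail of the final write repeats bytes left over from the previous read.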
Example: concat1.pdf
Also test with this one: concat2.pdf
Thank you, that is a severe error indeed.
Note that there is an implicit "token separator" between the streams of the array. concat2.pdf demonstrates that. Inserting a space between streams probably won't work in certain pathological cases.
My content stream parser treats the end of a stream in a Contents array as EOF as far as tokenization is concerned and then moves on to the next stream. That is, it does not really concatenate the streams.
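As a rough illustration of that approach (not the commenter's actual parser), the sketch below tokenizes each stream on its own and flushes the pending token when a stream ends. It is heavily simplified: it splits on the six PDF white-space bytes only and ignores delimiters, strings, comments and inline images. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified tokenizer: white-space splitting only.
final class PerStreamTokenizerSketch {
    // PDF white-space characters: NUL, HT, LF, FF, CR, SP.
    static boolean isPdfWhitespace(int b) {
        return b == 0x00 || b == 0x09 || b == 0x0A || b == 0x0C || b == 0x0D || b == 0x20;
    }

    // Tokenizes every stream separately; the end of a stream ends the
    // current token, so tokens never span two streams.
    static List<String> tokenize(List<byte[]> streams) {
        List<String> tokens = new ArrayList<>();
        for (byte[] data : streams) {
            int start = -1; // index where the current token began, or -1 if none
            for (int i = 0; i < data.length; i++) {
                if (isPdfWhitespace(data[i])) {
                    if (start >= 0) {
                        tokens.add(new String(data, start, i - start, StandardCharsets.ISO_8859_1));
                        start = -1;
                    }
                } else if (start < 0) {
                    start = i;
                }
            }
            // End of stream acts like EOF for tokenization: flush the pending token.
            if (start >= 0) {
                tokens.add(new String(data, start, data.length - start, StandardCharsets.ISO_8859_1));
            }
        }
        return tokens;
    }
}
```

For two streams containing "0.5 0" and "0 rg", this yields four tokens, whereas joining the raw bytes without a separator would fuse the two zeros into the single token "00".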
I think that simple concatenation of the streams is an appropriate solution, as this is exactly what the specification says.
Well, the specification says:
The division between streams may occur only at the boundaries between lexical tokens
Note that PDF 2.0 clarifies how to concatenate:
If the value is an array, the effect shall be as if all of the streams in the array were concatenated with at least one white-space character added between the streams’ data, in order, to form a single stream.
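A minimal sketch of concatenation following the PDF 2.0 wording quoted above might look like this. The class and method names are hypothetical, the streams are assumed to be already decoded into byte arrays, and the choice of a line feed as the white-space character is arbitrary. It makes no attempt to handle the pathological cases mentioned earlier in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

// Hypothetical helper, not part of veraPDF.
final class ContentsConcatSketch {
    // Joins the decoded stream data in order, inserting one white-space
    // byte (LF here, an arbitrary choice) between consecutive streams.
    static byte[] concatenate(List<byte[]> streams) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean first = true;
        for (byte[] data : streams) {
            if (!first) {
                out.write('\n');
            }
            out.write(data, 0, data.length);
            first = false;
        }
        return out.toByteArray();
    }
}
```

Because the separator is only written between streams, a single-element array comes back unchanged and no trailing white space is appended.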
veraPDF 1.7.63 (and older) claims that this document violates 6.2.11.5 of ISO 19005-2:2011:
tmp40.pdf
Three other PDF/A validators believe that the document is compliant.