Closed gmischler closed 5 months ago
Thank you for the detailed report @gmischler!
I made some tests this morning:
- `test/svg/svg_sources/cubic02.svg` (5.1KB) - KO
- `test/svg/svg_sources/SVG_logo.svg` (4.2KB) - KO
- `test/svg/svg_sources/arcs02.svg` (2.3KB) - KO
- `test/svg/svg_sources/Ghostscript_escher.svg` (297KB) - OK
- `test/svg/svg_sources/Ghostscript_colorcircle.svg` (139KB) - OK
- `test/svg/svg_sources/cubic01.svg` (1.8KB) - OK
- `test/svg/svg_sources/quad01.svg` (1.1KB) - OK
- `test/svg/svg_sources/arcs01.svg` (889B) - OK

Setting `pdf.compress = False` makes the problem disappear!

It's probably not something in the SVG data itself, but in how it interacts with compression. Adding the same SVG several times causes a lot of repetition in the text (the copies end up identical except for the placement/scaling transform), resulting in a very high compression ratio. Apparently we're not handling that situation in exactly the way the Acrobat reader expects.
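The repetition effect described above is easy to demonstrate directly with `zlib`. This is a hypothetical illustration (not fpdf2 code): the content stream fragment below is made up, but placing the same fragment on a page several times barely grows the compressed size, so the compression ratio drops sharply:

```python
import zlib

# Made-up path operators standing in for one imported SVG's content stream:
fragment = b"q 0.75 0 0 0.75 10 10 cm 1 1 m 9 9 l h S Q\n" * 50
tripled = fragment * 3  # the same SVG drawn three times on one page

for label, data in (("once", fragment), ("three times", tripled)):
    ratio = 100 * len(zlib.compress(data)) / len(data)
    print(f"{label}: {len(data)} bytes, compression ratio {ratio:.2f}%")
```

The tripled stream compresses to nearly the same size as the single one, so its ratio is roughly a third — the situation the paragraph above describes.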
I've found that some other software sometimes adds a "Length1" value to content streams. By the specs this is only meant (and mandatory) for compressed font data, where it gives the uncompressed size of the data. I experimented with adding that to the content stream of my example file, but didn't see any change in behaviour. Given that it is off-spec, that isn't really a surprise, but it was worth a shot.
Acrobat reader seems to issue (or not) those warnings depending on arbitrary criteria (including the Windows version, according to some reports). So it may well be that there's something in our use of compression it generally doesn't like, but only complains about when the compression rate is particularly high.
In `fpdf2`, PDF pages are compressed using `/FlateDecode`, implemented with `zlib.compress()`:
https://github.com/py-pdf/fpdf2/blob/2.7.6/fpdf/syntax.py#L200

Have you tried displaying `zlib.ZLIB_VERSION` & `zlib.ZLIB_RUNTIME_VERSION`? Maybe this issue could be related to the version of the underlying `zlib` library used?
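For reference, the two version constants can be printed directly, and the overall shape of a `/FlateDecode` content stream can be sketched in a few lines. The stream object below is a simplified assumption for illustration, not the exact code path in `fpdf/syntax.py`:

```python
import zlib

# Compile-time vs. runtime zlib versions (they can differ):
print(zlib.ZLIB_VERSION)          # version Python was compiled against
print(zlib.ZLIB_RUNTIME_VERSION)  # version actually loaded at runtime

# Simplified sketch of a /FlateDecode content stream built via zlib.compress():
content = b"1 1 m 9 9 l h S"  # a tiny PDF path-painting sequence
compressed = zlib.compress(content)
stream_obj = (
    b"<< /Filter /FlateDecode /Length %d >>\nstream\n" % len(compressed)
    + compressed
    + b"\nendstream"
)
```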
I'd be curious to know if this problem happens with other PDF readers... Adobe Acrobat Reader being closed-source, it won't be easy to figure out what the root problem is...
I have been digging a little deeper into the resulting zlib compressed streams, but could not find much...
```python
import zlib

from fpdf import FPDF
from pypdf import PdfReader

for svg_file in ("test/svg/svg_sources/arcs01.svg", "test/svg/svg_sources/arcs02.svg"):
    print(svg_file)
    pdf = FPDF()
    pdf.add_page()
    pdf.image(svg_file, w=30, h=30)
    pdf.image(svg_file, w=30, h=30)
    pdf.image(svg_file, w=30, h=30)
    pdf.output("issue_960.pdf")
    reader = PdfReader("issue_960.pdf")
    compressed_stream = reader.pages[0]["/Contents"]._data
    # cf. https://www.rfc-editor.org/rfc/rfc1950
    cmf, flg = compressed_stream[0], compressed_stream[1]
    print(f"* cmf=0x{cmf:X} flg=0x{flg:X}")  # 0x78 0x9C => zlib: Default Compression
    decompressor = zlib.decompressobj(wbits=zlib.MAX_WBITS)
    decompressed_data = decompressor.decompress(compressed_stream)
    print(f"* length of decompressed data: {len(decompressed_data)} bytes")
    print(f"* compression ratio: {100*len(compressed_stream)/len(decompressed_data):.2f}%")
    print(f"* end of the compressed data stream reached? {decompressor.eof=}")
    print(f"* {decompressor.unconsumed_tail=}")
    print(f"* {decompressor.unused_data=}")
    print()
```
Output:

```
test/svg/svg_sources/arcs01.svg
* cmf=0x78 flg=0x9C
* length of decompressed data: 2585 bytes
* compression ratio: 17.45%
* end of the compressed data stream reached? decompressor.eof=True
* decompressor.unconsumed_tail=b''
* decompressor.unused_data=b''

test/svg/svg_sources/arcs02.svg
* cmf=0x78 flg=0x9C
* length of decompressed data: 7808 bytes
* compression ratio: 4.85%
* end of the compressed data stream reached? decompressor.eof=True
* decompressor.unconsumed_tail=b''
* decompressor.unused_data=b''
```
The compression ratio of the smallest "problematic" SVG file (`test/svg/svg_sources/arcs02.svg`) is lower than that of `test/svg/svg_sources/arcs01.svg`, which does not cause any problem, so it's not simply a matter of this ratio being "too high".
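As a side note, the `cmf`/`flg` pair printed above can be decoded per RFC 1950 to confirm that both streams are ordinary zlib/deflate with default settings, so the header itself is unlikely to be what Acrobat objects to:

```python
# Decoding the RFC 1950 header bytes observed above (0x78 0x9C):
cmf, flg = 0x78, 0x9C
cm = cmf & 0x0F         # compression method: 8 means "deflate"
cinfo = cmf >> 4        # log2(LZ77 window size) - 8: 7 means a 32 KiB window
fdict = (flg >> 5) & 1  # 0: no preset dictionary
flevel = flg >> 6       # 2 means "default compression"
assert (cmf * 256 + flg) % 31 == 0  # FCHECK makes the header a multiple of 31
print(cm, cinfo, fdict, flevel)  # → 8 7 0 2
```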
You are right @gmischler, this problem really seems correlated with a high compression ratio being used:

- Compression ratio for `test/svg/svg_sources/Ghostscript_escher.svg` (OK): 29.12%
- Compression ratio for `test/svg/svg_sources/Ghostscript_colorcircle.svg` (OK): 33.11%
- Compression ratio for `test/svg/svg_sources/cubic01.svg` (OK): 9.62%
- Compression ratio for `test/svg/svg_sources/quad01.svg` (OK): 11.95%
- Compression ratio for `test/svg/svg_sources/arcs01.svg` (OK): 17.45%
- Compression ratio for `test/svg/svg_sources/cubic02.svg` (KO): 7.66%
- Compression ratio for `test/svg/svg_sources/SVG_logo.svg` (KO): 6.07%
- Compression ratio for `test/svg/svg_sources/arcs02.svg` (KO): 4.85%
I suspect that Adobe Acrobat Reader's decompression function is implemented a bit like this, for "safety" reasons:

```python
import zlib

def acrobat_decompress(compressed_data, growth_max=12):
    max_length = len(compressed_data) * growth_max
    decompressor = zlib.decompressobj()
    decompressed_data = decompressor.decompress(compressed_data, max_length)
    if not decompressor.eof:
        raise RuntimeError(f"Uncompressed content is at least {growth_max} times bigger than compressed data")
    return decompressed_data
```
Of course, `len(compressed_data) * 12` is just a guess; who knows what the actual implementation sets as the limit...
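To see how such a guard would behave in practice, here is the same hypothetical `acrobat_decompress()` (again, a guess, not Acrobat's actual code) applied to data that expands far beyond 12x when decompressed:

```python
import zlib

def acrobat_decompress(compressed_data, growth_max=12):
    # Hypothetical guess at Acrobat's guarded decompression, as sketched above.
    max_length = len(compressed_data) * growth_max
    decompressor = zlib.decompressobj()
    decompressed_data = decompressor.decompress(compressed_data, max_length)
    if not decompressor.eof:
        raise RuntimeError(
            f"Uncompressed content is at least {growth_max} times bigger than compressed data"
        )
    return decompressed_data

# Highly repetitive data expands ~250x, so the guard rejects it:
compressed = zlib.compress(b"0" * 10_000)
try:
    acrobat_decompress(compressed)
except RuntimeError as exc:
    print("rejected:", exc)
```

Ordinary, less compressible data stays under the 12x limit and round-trips normally.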
I made some extra tests with several source SVG files:
9.26%
, Adobe Acrobat Reader produced the error message9.28%
, Adobe Acrobat Reader did not produce any error message9.30%
, Adobe Acrobat Reader produced the error message9.35%
, Adobe Acrobat Reader produced the error message9.42%
, Adobe Acrobat Reader produced the error message9.46%
, Adobe Acrobat Reader produced the error message9.48%
, Adobe Acrobat Reader did not produce any error message9.51%
, Adobe Acrobat Reader did not produce any error message9.54%
, Adobe Acrobat Reader produced the error message9.55%
, Adobe Acrobat Reader did not produce any error message9.80%
, Adobe Acrobat Reader did not produce any error messageSo it's not just a maximum ratio that is taken in consideration by Acrobat...
Maybe `fpdf2` should produce a warning when a content stream is compressed with a compression ratio lower than 10%?
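Such a warning could look something like this. This is only a sketch of the idea: the function name, threshold, and wording are made up here, and this is not actual `fpdf2` code:

```python
import warnings
import zlib

def compress_stream_with_check(content: bytes, min_ratio: float = 0.10) -> bytes:
    """Compress a content stream, warning when the ratio falls below min_ratio."""
    compressed = zlib.compress(content)
    ratio = len(compressed) / len(content)
    if ratio < min_ratio:
        warnings.warn(
            f"Content stream compression ratio is {100 * ratio:.2f}% (< 10%); "
            "some Adobe Acrobat Reader versions are known to report errors "
            "on such streams (cf. fpdf2 issue #960)"
        )
    return compressed
```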
Zlib comes with Python. My 3.10 installation uses 1.2.11, but I doubt that this makes any difference in the output.
A warning from fpdf2 seems a bit pointless as long as we don't know what the problem is. What is the user supposed to do with it?
Do all the affected files contain SVG data? I've tried to reproduce the error with other repetitive content subject to high compression, with no success. So it could still be some subtlety in the graphics commands, which acrobat only complains about under certain arbitrary circumstances.
It would really be helpful if someone with Acrobat Pro could run those files through the preflight function. If the problem is real (and not just a viewer bug), that would give us the information directly from the horse's mouth.
When I use Acrobat, I get the same error when printing a PDF. The only requirement is that there is a "path" in the code.
Minimal test code:

```python
from fpdf import FPDF

pdf = FPDF()
pdf.add_page()
with pdf.new_path() as path:
    path.move_to(1, 1)
    path.line_to(9, 9)
    path.close()
pdf.output("test.pdf")
```

Then print `test.pdf` using Acrobat reader. The error should appear right after printing. (attachment: test.pdf)
The problem persists when `pdf.compress = False`.
> When I use Acrobat, I get the same error when printing a PDF. The only requirement is that there is a "path" in the code.
I think this is a different problem, so I moved your comment into a dedicated issue 🙂
Different problem, same workaround -> #1144 also fixes this one. Just comment out these lines: https://github.com/py-pdf/fpdf2/edit/master/fpdf/drawing.py#L1448-L1454
Results: acro-svg-workaround.pdf, acro-svg-err.pdf
I was having exactly the same issue when using SVGs. It was not 100% reproducible and happened only rarely. I tried the fix in #1145 locally, and so far in my testing I haven't seen the issue again.
Is there an ETA to land `2.7.9` on PyPI?
> Is there an ETA to land `2.7.9` on PyPI?
If @gmischler & @andersonhc agree, I think we could perform a new release this month! 🙂
While implementing "image paragraphs" for text regions, Acrobat reader suddenly started complaining about my test file. Of course they want you to buy their other software to create PDFs, so the message is deliberately unhelpful.
Error details
I could boil it down to sections containing imported SVG data. Strangely, it takes a certain amount of data until the error triggers. With the SVG logo, it either takes three of them on one page, or two plus a bunch of text (at least those are the combinations I found). None of the other viewers and validators that I have easy access to indicate any errors.
Processing the file with qpdf and "--normalize-content=y" (or "--qdf") fixes the problem. But I was unable to glean any useful information from a comparison. I've seen reports that Adobe Preflight gives useful and detailed error reports. So if anyone has that available, it might lead us somewhere.
Minimal code
(for some reason, GitHub doesn't want me to include PDF files here...)
Environment

- `fpdf2` version used: current HEAD