pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

SMask of Image is not detected #3604

Closed Rodrigodd closed 1 week ago

Rodrigodd commented 1 week ago

Description of the bug

I have a PDF with images that use an SMask for transparency, but when I call doc.get_page_images() or similar functions, the reported smask xref is 0. I believe this happens because of the way the SMask is used: it appears to reference an object containing drawing operations for the mask, and the mask has the size of the entire page, while the image itself is only 256x256.

The PDF was created by Inkscape.

How to reproduce the bug

Run the following script:

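import fitz  # PyMuPDF

doc = fitz.open('drawing-uncompressed.pdf')

# List every image referenced by page 0 and save its raw image data.
for image in doc.get_page_images(0, full=True):
    print(image)

    xref = image[0]
    pix1 = fitz.Pixmap(doc.extract_image(xref)["image"])
    pix1.save("image.png")

# Manually combine the image at xref 14 (hardcoded for this file)
# with the soft mask that extract_image() reports for it.
xref = 14
smask = doc.extract_image(xref)["smask"]
pix1 = fitz.Pixmap(doc.extract_image(xref)["image"])
mask = fitz.Pixmap(doc.extract_image(smask)["image"])
pix = fitz.Pixmap(pix1, mask)  # base pixmap plus mask gives a pixmap with alpha
pix.save("mask.png")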

With the following file:

drawing-uncompressed.pdf

(compressed (before decompressing with qpdf): drawing.pdf; the Inkscape file that generated it: drawing.svg)

The script prints a single image, with smask 0. Looking at the PDF code, the smask of the image should be 10, which refers to 12, which draws 14, which is the image I would expect to be the mask, although it itself has a smask 16.

PyMuPDF version

1.24.5

Operating system

Windows

Python version

3.11

JorjMcKie commented 1 week ago

You must be talking about a different file than the attached one - which is broken so that no viewer can display it. Please provide the actual problem file.

Rodrigodd commented 1 week ago

@JorjMcKie Sorry, I may have corrupted the file while inspecting it earlier.

I recreated and uploaded it again, and confirmed it now renders.

JorjMcKie commented 1 week ago

There is no mask in this file!

The /SMask entries in multiple extended graphics state dictionaries are all set to /None, which by the book means no transparency.

Rodrigodd commented 1 week ago

I am not a specialist in PDF, but looking at drawing-uncompressed.pdf, obj 8 0 has /SMask 10 0 R, which is used in the graphics state when drawing the image: /s7 gs /x8 Do.

obj 10 0 is of /Type /Mask with /G 12 0, where obj 12 0 is drawing the mask, using the image in obj 14 0.

My reproduction script extracts the mask in xref 14. Also note that my script only works with the decompressed PDF, so I updated my initial post.

6 0 obj            # draw the content of Page 1
<< /Length 47 >>
stream
1 0 0 -1 0 841.889764 cm
q
q /s7 gs /x8 Do Q  # draws the xref 9 which eventually draws the image (9->11->15), with graphic state `8`
Q
endstream
endobj
7 0 obj # declares /s7 as 8 0
<< /ExtGState << /s7 8 0 R >> /XObject << /x8 9 0 R >> >>
endobj
8 0 obj # sets the smask as 10 0
<< /AIS false /CA 1 /SMask 10 0 R /Type /ExtGState /ca 1 >>
endobj
# ...
10 0 obj # defines the mask as 12 0
<< /G 12 0 R /S /Alpha /Type /Mask >>
endobj
12 0 obj # draws the mask
<< /BBox [ 0 0 192 192 ] /Group << /CS /DeviceRGB /I true /S /Transparency /Type /Group >> /Resources << /ExtGState << /a0 << /CA 1 /ca 1 >> /gs0 << /BM /Normal /CA 1.0 /SMask /None /ca 1.0 >> >> /XObject << /x11 14 0 R >> >> /Subtype /Form /Type /XObject /Length 65 >>
stream
q
2250.708661 0 0 -3183.307087 0 3183.307087 cm
/a0 gs /x11 Do # draw xref 14, which is the mask image
Q
endstream
# ...
14 0 obj
<< /BitsPerComponent 8 /ColorSpace /DeviceRGB /Height 1123 /Interpolate true /SMask 16 0 R /Subtype /Image /Type /XObject /Width 794 /Length 2674986 >>
stream
# ...binary data...

JorjMcKie commented 1 week ago

You sent me a different file then ... again! All /SMask in the attached file drawing.pdf are /None.

JorjMcKie commented 1 week ago

Took another look at the uncompressed file. This time there are /SMask entries that are not /None but point to dictionaries. These mask definitions are not attached to images; they are defined inside extended graphics state objects. The scope of such transparency definitions is independent of other objects like images, so they will not show up when you extract images, whichever way you do it.
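
To illustrate the distinction, here is a minimal sketch (not part of PyMuPDF) that follows the reference chain from an extended graphics state to the objects its soft mask draws. It works on object dictionaries given as plain strings. The strings below are abridged copies of the objects quoted earlier in this thread, and the helper names `ref` and `soft_mask_chain` are made up for this example. In a real script the strings could presumably be obtained with `doc.xref_object(xref)`. The regular expressions only handle simple, uncompressed dictionaries like these.

```python
import re

# Abridged object dictionaries from drawing-uncompressed.pdf as quoted in
# this thread (streams and unrelated keys omitted). Keys are xref numbers.
OBJECTS = {
    8: "<< /AIS false /CA 1 /SMask 10 0 R /Type /ExtGState /ca 1 >>",
    10: "<< /G 12 0 R /S /Alpha /Type /Mask >>",
    12: "<< /BBox [ 0 0 192 192 ] /Resources << /XObject << /x11 14 0 R >> >> "
        "/Subtype /Form /Type /XObject >>",
    14: "<< /Height 1123 /SMask 16 0 R /Subtype /Image /Type /XObject /Width 794 >>",
}


def ref(source, key):
    """Return the xref that '/key N 0 R' points to, or None (e.g. for /None)."""
    m = re.search(rf"/{re.escape(key)}\s+(\d+)\s+\d+\s+R", source)
    return int(m.group(1)) if m else None


def soft_mask_chain(objects, gstate_xref):
    """Follow ExtGState -> /SMask dictionary -> /G form -> XObjects it uses."""
    mask_dict = ref(objects[gstate_xref], "SMask")
    if mask_dict is None:
        return None  # /SMask missing or set to /None: no soft mask
    form = ref(objects[mask_dict], "G")
    if form is None:
        return None
    # Collect the indirect references in the form's first /XObject dictionary.
    xobj = re.search(r"/XObject\s*<<(.*?)>>", objects[form])
    drawn = re.findall(r"/\w+\s+(\d+)\s+\d+\s+R", xobj.group(1)) if xobj else []
    return {
        "extgstate": gstate_xref,
        "mask_dict": mask_dict,
        "form": form,
        "xobjects": [int(n) for n in drawn],
    }


print(soft_mask_chain(OBJECTS, 8))
```

For the objects above, the chain runs 8 -> 10 -> 12 -> 14. This matches the reading in the earlier comment, and it shows that the mask is reached through the graphics state, not through an /SMask key on the masked image itself.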

Rodrigodd commented 1 week ago

Do you know of any workaround to identify that this mask is being applied to that image, using PyMuPDF or any other library? Even by traversing a raw representation of the PDF, for example.

JorjMcKie commented 1 week ago

You can use the MuPDF CLI tool to create an XML file representing the page appearance source: mutool trace input.pdf > input.xml.

In this case, the output is

<document filename="drawing-uncompressed.pdf">
<page number="1" mediabox="0 0 595.276 841.89">
<set_default_colorspaces gray="DeviceGray" rgb="DeviceRGB" cmyk="DeviceCMYK" oi="None"/>
<group bbox="0 0 595.2756 841.8898" isolated="1" knockout="0" blendmode="Normal" alpha="1">
    <clip_mask bbox="0 0 192 192" s="alpha" ri="1" bp="1" op="0" opm="0">
        <group bbox="0 0 192 192" isolated="1" knockout="0" blendmode="Normal" alpha="1">
            <clip_path winding="nonzero" transform="1 0 0 1 0 0">
                <moveto x="0" y="0"/>
                <lineto x="192" y="0"/>
                <lineto x="192" y="192"/>
                <lineto x="0" y="192"/>
                <closepath/>
            </clip_path>
                <clip_image_mask transform="2250.7088 0 -0 3183.3072 0 0" width="794" height="1123"/>
                    <fill_image alpha="1" colorspace="DeviceRGB" ri="1" bp="1" op="0" opm="0" transform="2250.7088 0 -0 3183.3072 0 0" width="794" height="1123"/>
                <pop_clip/>
            <pop_clip/>
        </group>
    </clip_mask>
        <group bbox="0 0 192 192" isolated="1" knockout="0" blendmode="Normal" alpha="1">
            <clip_path winding="nonzero" transform="1 0 0 1 0 0">
                <moveto x="0" y="0"/>
                <lineto x="192" y="0"/>
                <lineto x="192" y="192"/>
                <lineto x="0" y="192"/>
                <closepath/>
            </clip_path>
                <clip_path winding="nonzero" transform="1 0 0 1 0 0">
                    <moveto x="0" y="0"/>
                    <lineto x="596" y="0"/>
                    <lineto x="596" y="842"/>
                    <lineto x="0" y="842"/>
                    <closepath/>
                </clip_path>
                    <fill_image alpha="1" colorspace="DeviceRGB" ri="1" bp="1" op="0" opm="0" transform="191.99999 0 -0 191.99999 0 0" width="256" height="256"/>
                <pop_clip/>
            <pop_clip/>
        </group>
    <pop_clip/>
</group>
</page>
</document>

A bit crowded at first sight 🤷‍♂️. After a while, we see that the page is under control of a so-called "group". Note the alpha="1" parameter. Under that group follows a multi-level hierarchy specification of more groups, clips, etc. Maybe that helps ...
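
As a rough sketch of how that hierarchy could be read programmatically, the snippet below parses trace output with Python's standard `xml.etree.ElementTree` and pairs each `clip_mask` with the images drawn while it is active. The `TRACE` string is an abridged copy of the output above, and the function names are invented for this example. It assumes, based only on this one output, that every `clip_*` element is later closed by a `pop_clip` at the same nesting level.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the mutool trace output shown above.
TRACE = """
<document filename="drawing-uncompressed.pdf">
<page number="1" mediabox="0 0 595.276 841.89">
<group bbox="0 0 595.2756 841.8898" isolated="1" knockout="0" blendmode="Normal" alpha="1">
  <clip_mask bbox="0 0 192 192" s="alpha">
    <group bbox="0 0 192 192">
      <clip_image_mask width="794" height="1123"/>
      <fill_image width="794" height="1123"/>
      <pop_clip/>
    </group>
  </clip_mask>
  <group bbox="0 0 192 192">
    <clip_path winding="nonzero"/>
    <fill_image width="256" height="256"/>
    <pop_clip/>
  </group>
  <pop_clip/>
</group>
</page>
</document>
"""


def size(elem):
    """Pixel dimensions of a fill_image element as (width, height)."""
    return int(elem.get("width")), int(elem.get("height"))


def masked_images(trace_xml):
    """Pair each clip_mask with the images drawn while it is in effect."""
    root = ET.fromstring(trace_xml.strip())
    results = []
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            if child.tag != "clip_mask":
                continue
            # Images inside the clip_mask element define the mask itself.
            mask = [size(e) for e in child.iter("fill_image")]
            # Following siblings are masked until the matching pop_clip.
            content, depth = [], 1
            for sib in children[i + 1:]:
                if sib.tag.startswith("clip_"):
                    depth += 1
                elif sib.tag == "pop_clip":
                    depth -= 1
                    if depth == 0:
                        break
                content.extend(size(e) for e in sib.iter("fill_image"))
            results.append({"mask": mask, "content": content})
    return results


print(masked_images(TRACE))
```

The trace reports only image dimensions, not xrefs, so linking a result back to a specific image would need an extra step, for example comparing width and height with the entries returned by doc.get_page_images().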