jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License
6.57k stars, 659 forks

Extracting Z-Value of Rects/Items #700

Closed JosefJoubert closed 2 years ago

JosefJoubert commented 2 years ago

Hi,

First off, thanks for your work. This library has been greatly useful to me.

I am importing a PDF that represents a floorplan for a building and contains a lot of graphical objects. Some of these objects are filled rectangles/curves. When I view the PDF in other software, it consistently draws some rectangles over others, so I assume there is some data concerning the Z-value/draw order of the rectangles. However, I have not found a way to extract this data with pdfplumber.

Is this functionality available and I'm just missing it? If not, could it be added?

Thank you for your time.

jsvine commented 2 years ago

Hi @JosefJoubert, and glad to hear that this library has been helpful. To answer your question: Unlike some other layout systems (e.g., the one used by CSS), PDFs don't have a concept of z-index. Instead, graphical elements are painted in the order they are generated in the file, so later elements appear on top of earlier ones. For more information on this, and advice on how to determine the relative order, see this discussion: https://github.com/jsvine/pdfplumber/discussions/694
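To illustrate the idea, here is a minimal sketch in plain Python. It assumes (this is an assumption, not something confirmed in this thread) that you have a list of rectangle dicts in the same order they were drawn, using the `x0`, `x1`, `top`, and `bottom` keys that pdfplumber puts on its rect objects. The helper names `with_draw_order` and `topmost_at`, the `draw_index` key, and the `name` field in the sample data are hypothetical and exist only for this example.

```python
def with_draw_order(objects):
    """Return copies of the objects tagged with a 0-based `draw_index`.

    Assumes `objects` is already in drawing order (earliest first).
    `draw_index` is a hypothetical key, not a pdfplumber attribute.
    """
    return [{**obj, "draw_index": i} for i, obj in enumerate(objects)]


def topmost_at(objects, x, y):
    """Return the last-drawn object whose bounding box contains (x, y).

    Returns None if no object covers the point. Only bounding boxes are
    compared, so this ignores fill, transparency, and clipping.
    """
    hit = None
    for obj in objects:
        inside_x = obj["x0"] <= x <= obj["x1"]
        inside_y = obj["top"] <= y <= obj["bottom"]
        if inside_x and inside_y:
            hit = obj  # later objects paint over earlier ones
    return hit


# Made-up sample data standing in for a list of rects from a page.
rects = [
    {"x0": 0, "top": 0, "x1": 100, "bottom": 100, "name": "floor"},
    {"x0": 40, "top": 40, "x1": 60, "bottom": 60, "name": "desk"},
]
ordered = with_draw_order(rects)
winner = topmost_at(ordered, 50, 50)
```

With a real file, the list would come from something like `page.rects` on a page opened via `pdfplumber.open(...)`. Whether that list preserves drawing order, and how rects interleave with curves or lines held in separate lists, is worth verifying against the linked discussion before relying on it.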