innodatalabs / redstork-ui

Demo app: PDF viewer using redstork PDF backend
MIT License

Confusing crop API #3

Open YinlinHu opened 4 years ago

YinlinHu commented 4 years ago

I had only tested the crop API with rotation == 0. After checking in more detail, I found that the crop rect depends on the rotation, which means the user has to apply the corresponding transform, just as you do here: https://github.com/innodatalabs/redstork-ui/blob/2f5f23ac4c0f017cca35b69f686aab1eb6528ef3/ui/view/page_scene_view.py#L153 I recommend hiding this complexity from the user and defining the crop rect on the canvas after rotation.
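To make the transform in question concrete, here is a minimal geometric sketch of mapping a rect drawn on the rotated canvas back to unrotated page coordinates. It is not the code at the linked line and not part of the redstork API: the function name `canvas_rect_to_base` is hypothetical, and it assumes both coordinate systems have their origin at the bottom-left with y pointing up, with rotation given as a clockwise multiple of 90 degrees.

```python
def canvas_rect_to_base(rect, page_width, page_height, rotation):
    """Map a rect on the rotated canvas back to unrotated page coordinates.

    Assumptions (not taken from redstork): both coordinate systems have
    their origin at the bottom-left corner with y pointing up, `rotation`
    is the clockwise display rotation in degrees, and page_width and
    page_height describe the unrotated page.
    """
    def to_base(u, v):
        if rotation == 0:
            return u, v
        if rotation == 90:
            return page_width - v, u
        if rotation == 180:
            return page_width - u, page_height - v
        if rotation == 270:
            return v, page_height - u
        raise ValueError("rotation must be 0, 90, 180 or 270")

    x0, y0, x1, y1 = rect
    ax, ay = to_base(x0, y0)
    bx, by = to_base(x1, y1)
    # Rotation can swap which corner is the lower-left one, so re-normalize.
    return (min(ax, bx), min(ay, by), max(ax, bx), max(ay, by))
```

Under these assumptions, for a 100 x 200 page shown with rotation 90, `canvas_rect_to_base((0, 0, 10, 20), 100, 200, 90)` gives `(80, 0, 100, 10)`: the 10 x 20 rect on the canvas becomes a 20 x 10 rect on the unrotated page.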

YinlinHu commented 4 years ago

Maybe this is a bad suggestion, since I found that this is the way PDFium works...

mkroutikov commented 4 years ago

Agree that all this is pretty confusing.

My current take is that Page.rotation is a hint that tells the direction of the prevailing text. The page cropbox and rendering are always in the base coordinate system (unaffected by rotation). After rendering, the image can be trivially transformed if needed.

Not 100% sure this is the right approach though. Let's see how it plays out; we can always invent some other rules.
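As an illustration of the "transform after rendering" step, here is a minimal sketch that rotates an already rendered image by a multiple of 90 degrees. It assumes the image is a plain list of pixel rows (top row first) and that rotation is clockwise; the function name `rotate_image` is hypothetical, and a real viewer would more likely use its image library's own rotate call.

```python
def rotate_image(pixels, rotation):
    """Rotate a row-major pixel grid (list of rows, top row first) clockwise.

    `rotation` is assumed to be 0, 90, 180 or 270 degrees.
    """
    if rotation not in (0, 90, 180, 270):
        raise ValueError("rotation must be 0, 90, 180 or 270")
    for _ in range(rotation // 90):
        # One clockwise quarter turn: reverse the row order, then transpose.
        pixels = [list(col) for col in zip(*pixels[::-1])]
    # Return a copy so the caller's grid is never aliased.
    return [list(row) for row in pixels]
```

The page itself is rendered once in the base coordinate system, and only the resulting bitmap is turned, so the cropbox never has to be re-expressed per rotation.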

YinlinHu commented 4 years ago

I see, thanks for the clarification. In my understanding, page properties such as bbox, crop_box, media_box, and rotation can be hidden from users. All they need to know is the visible part of the PDF (defined by its width and height), and every coordinate system should be defined within this visible region. This is how Poppler and PyMuPDF work. It may make your project less general, but I think working within the visible region makes sense for most scenarios.

BTW, I also found a PDF that has rotation problems: https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf

YinlinHu commented 4 years ago

@mkroutikov Maybe the raw PDFium API is also acceptable: https://pypi.org/project/pypdfium/

mkroutikov commented 4 years ago

"visible" region of a page is not well-defined. Typically that would be a cropbox. But for printers that has to be a mediabox. Not sure why PDF specs also have Artbox and other weirdness. So, tentatively prefer to stick with native PDF coordinate system as per Adobe specs.

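To illustrate why the "visible" region depends on the use case, here is a minimal sketch of picking a box for screen display versus printing. The names `intersect` and `visible_box` are hypothetical and not part of redstork. The sketch follows the PDF rule that a missing crop box defaults to the media box and that a crop box is clipped to the media box; falling back to the media box when the two do not overlap is an assumption of this sketch.

```python
def intersect(a, b):
    """Intersection of two (x0, y0, x1, y1) rects, or None if they do not overlap."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None


def visible_box(media_box, crop_box=None, for_print=False):
    """Pick the page box to treat as the visible region.

    Printing uses the media box; screen display uses the crop box,
    which defaults to the media box when absent.
    """
    if for_print or crop_box is None:
        return media_box
    # Assumption: if the boxes do not overlap at all, fall back to the media box.
    return intersect(crop_box, media_box) or media_box
```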
pypdfium is awesome! Will definitely have a closer look.