Closed mara004 closed 6 months ago
I just updated this PR to include the newer preprocessor/pdf_renderer.py, but you should really change your API to de-duplicate the code and load the document only once. It doesn't make sense at all to re-load the document in a separate method just to get page count. You may also want to take a look at pypdfium2's documentation; it provides a multi-page renderer with concurrency that may be more suitable for your use case.
Pipfile and requirements.txt still need to be updated properly, but I'm not familiar with this form of dependency pinning. Maybe a project member can finalise this?
@cschenio @buddhawang
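To illustrate the load-once suggestion above, here is a rough sketch rather than the code in this PR. The function names (render_document, open_with_pypdfium2, render_with_pypdfium2) are hypothetical, and the two pypdfium2 helpers assume the v4-style API (PdfDocument, len(pdf), pdf[i], page.render(...).to_pil()); older releases expose different rendering calls. The opener and renderer are passed in as callables so the structure can be exercised without a PDF at hand.

```python
from typing import Any, Callable, List, Tuple


def render_document(
    path: str,
    open_document: Callable[[str], Any],
    render_page: Callable[[Any], Any],
) -> Tuple[int, List[Any]]:
    """Open the document once and return (page_count, rendered_pages)."""
    doc = open_document(path)  # the only place the file is loaded
    page_count = len(doc)      # taken from the handle that is already open
    images = [render_page(doc[i]) for i in range(page_count)]
    return page_count, images


def open_with_pypdfium2(path: str) -> Any:
    # Assumption: pypdfium2 v4 or later is installed.
    import pypdfium2 as pdfium

    return pdfium.PdfDocument(path)


def render_with_pypdfium2(page: Any) -> Any:
    # Assumption: v4-style rendering; to_pil() additionally requires Pillow.
    return page.render(scale=2).to_pil()
```

A caller would then use render_document(path, open_with_pypdfium2, render_with_pypdfium2) and get the page count and the images from a single load, instead of calling a separate page-count method that opens the file again.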
@cschenio can you take a look? thanks!
@mara004 thank you for revisiting this, let's see if I can de-dup the PDF loading logic.
Thanks for the response! I'll need to update this PR again. It's quite some time ago that I initially submitted this, and a few things seem outdated now.
I force-pushed a commit that, I hope, nicely restructures rendering. I ran the test suite, which seems to work. Note that I had to replace the expected result because pypdfium2 uses RGB rather than RGBA where possible.
However, it looks like preprocess_multi_page_bundle() is currently not covered by tests, and I'm not sure how to invoke that function. Could you please check it still works as expected?
I think this is ready for review again.
Good to know, I will take a look at it soon.
FYI, I am currently planning to release a new major version that will change the rendering API a bit. This will take some time. I plan to update the patch set when pypdfium2 v4 is released.
Coming back to this, I think the rewrite will still take quite some time, so you could also review/merge this before v4 is released and we can then update your code later in a following PR (which will be much smaller than this one).
Hello,
I'm a former maintainer of pypdfium and now co-author of pypdfium2. I noticed that this project is using pypdfium to rasterise PDFs, but pypdfium is now deprecated and succeeded by pypdfium2. We have applied several modernisations like platform-specific wheel builds, automatic pdfium init/deinit calls and a small, pythonic support model API to facilitate rendering PDFs. pypdfium2 will be updated on a regular basis, while no further releases are planned for pypdfium.
This patch modifies utils/pdf_renderer.py to use pypdfium2, with the new support model API. If you wish to keep using the raw PDFium API, this is still possible, too.
https://github.com/pypdfium2-team/pypdfium2
https://pypi.org/project/pypdfium2/