Open TarunChakitha opened 3 months ago
Hello @TarunChakitha,
It's because your image doesn't split neatly into pages. You have an image height of 1622346 and a page height of 1650, but 1622346 / 1650 is 983.24, not 925. I would guess that one of the pages in your document is a different size.
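As a quick sanity check, the same arithmetic can be done with integer division and a remainder. This is only a sketch; `page_split` is a name made up for illustration and is not part of pyvips.

```python
def page_split(total_height, page_height):
    """Return (whole_pages, leftover_rows) for a vertically stacked multi-page image."""
    return divmod(total_height, page_height)


whole_pages, leftover_rows = page_split(1622346, 1650)
# A non-zero remainder means the height is not a whole number of pages,
# so at least one page cannot have the assumed page height.
print(whole_pages, leftover_rows)
```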
You will probably have to process this one page at a time, perhaps (untested):
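import pyvips

# read the page count from the document metadata
doc = pyvips.Image.pdfload(file_path)
n_pages = doc.get("n-pages")
# load each page as its own image; page= selects which page to load
pages = [pyvips.Image.pdfload(file_path, page=i, dpi=DPI)
         for i in range(n_pages)]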
It's a little slower than loading once and then splitting, unfortunately.
Is there any workaround other than looping? The loop was actually my first approach, but for some reason the Azure Function that hosts this code exits with code 137 after 6 or 7 iterations. That also happens with PDFs whose pages are all the same size, although those are non-digital (scanned-image) PDFs.
You could open a page at a time and try to find which pages differ in size.
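One way to spot the odd pages, assuming you have already collected a `(width, height)` tuple per page, is to compare each one against the most common size. The helper name `odd_sized_pages` is hypothetical, and the pyvips lines in the comment are an untested sketch of how the sizes might be gathered.

```python
from collections import Counter


def odd_sized_pages(sizes):
    """Return indices of pages whose (width, height) differs from the most common size."""
    if not sizes:
        return []
    common_size, _ = Counter(sizes).most_common(1)[0]
    return [i for i, size in enumerate(sizes) if size != common_size]


# With pyvips the sizes could be collected a page at a time, e.g. (untested):
#   first = pyvips.Image.pdfload(file_path)
#   sizes = []
#   for i in range(first.get("n-pages")):
#       page = pyvips.Image.pdfload(file_path, page=i)
#       sizes.append((page.width, page.height))
print(odd_sized_pages([(1275, 1650), (1275, 1650), (1650, 1275)]))
```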
You could also try opening pages in sequential mode, and using a loop rather than a list comprehension. And it depends what you plan to do with the pages once you've loaded them.
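A rough sketch of the loop idea, not tested against a real PDF: a generator that opens one page at a time in sequential mode, so the caller can process and release each page before the next is loaded. `iter_pdf_pages` and its `load` parameter are names invented for this example; `page`, `dpi` and `access` are existing `pdfload` options.

```python
def iter_pdf_pages(file_path, dpi=72, load=None):
    """Yield the pages of a PDF one at a time, opened in sequential mode."""
    if load is None:
        import pyvips  # imported lazily so the helper can be tried with a stand-in loader
        load = pyvips.Image.pdfload
    # the first open is only used to read the page count from the metadata
    n_pages = load(file_path).get("n-pages")
    for i in range(n_pages):
        # page= picks the page to render; sequential access hints that
        # pixels will be read top to bottom, once
        yield load(file_path, page=i, dpi=dpi, access="sequential")
```

It could then be used as `for page in iter_pdf_pages(file_path, dpi=DPI): ...`, writing or processing each page inside the loop instead of keeping them all in a list. Whether this avoids the exit code 137 depends on what is done with each page.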
Hi @jcupitt,
I am trying to split a many-page image into a list of N separate images.
Code:
output:
Expected: individual_pages must contain a list of 925 individual pages.
Actual: individual_pages has only 1 element, which is the same as the multi_page_image but with a temp filename.
I noticed that this happens with PDFs that have the producer given in the output. The other PDFs I tested have a different producer, and it works for them.
OS details: only tested in Debian 11 and Ubuntu Docker containers.
lsb_release -a:
uname -a:
Python version: 3.10.14
pyvips version: 2.2.3
Could you please help?