Closed proofconstruction closed 2 years ago
poppler-utils
came in handy: for i in *.pdf; do pdftoppm "$i" "$i" -png; done
On a whim I ran ls | grep pdf | time parallel pdftoppm {} 300dpi/{} -png -r 300
and got the following:
parallel pdftoppm {} 300dpi/{} -png -r 300 7053.63s user 38.89s system 2121% cpu 5:34.29 total
I love GNU Parallel. 3.6GB of images generated in just a few minutes.
@kwcckw Can you add a bash script to the repo that performs this action given an input directory? We can note that pdftoppm can be installed in various ways, but apt-get update && apt-get install -y poppler-utils is how one would do it on Ubuntu. It won't work for everyone, but it would provide a starting point, and someone could commit a Windows way of doing this later if they wanted to.
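As a rough illustration of what a script that takes an input directory could look like, here is a minimal Python sketch that shells out to pdftoppm for every PDF it finds. It is not the script that ended up in the repo; the function names build_pdftoppm_cmd and convert_directory, the default of 4 workers, and the output naming are assumptions made for this example.

```python
import shutil
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_pdftoppm_cmd(pdf: Path, out_dir: Path, dpi: int = 300) -> list[str]:
    """Return the pdftoppm command that renders every page of `pdf` to PNG."""
    # pdftoppm appends the page number and ".png" to this output prefix.
    prefix = out_dir / pdf.stem
    return ["pdftoppm", "-png", "-r", str(dpi), str(pdf), str(prefix)]


def convert_directory(in_dir: Path, out_dir: Path, dpi: int = 300, workers: int = 4) -> int:
    """Convert every *.pdf in `in_dir`; return the number of PDFs processed."""
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm not found; install poppler-utils first")
    out_dir.mkdir(parents=True, exist_ok=True)
    pdfs = sorted(in_dir.glob("*.pdf"))

    def run_one(pdf: Path) -> None:
        # check=True raises CalledProcessError if pdftoppm exits non-zero.
        subprocess.run(build_pdftoppm_cmd(pdf, out_dir, dpi), check=True)

    # Threads are enough here: the heavy work happens in the pdftoppm processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run_one, pdfs))
    return len(pdfs)
```

Calling convert_directory(Path("pdfs"), Path("300dpi")) would roughly mirror the GNU Parallel one-liner above, with the worker count standing in for the number of parallel jobs.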
Alright, I will add a script to convert PDF to PNG on both Linux and Windows later. I can test on both Windows and Linux in Paperspace, but not on macOS.
I added the Python code in this pull request https://github.com/sparkfish/shabby-pages/pull/73 to convert PDFs into images. I tested it and it should work on both Windows and Linux, and it can be run from the terminal too.
We now have 600 "born-digital" source PDFs, many with multiple pages. We need to split these out into PDFs of each individual page, then convert all pages to PNG.
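One possible way to do the split-then-convert step is sketched below, assuming poppler-utils is installed so that both pdfseparate and pdftoppm are on the PATH. The helper names build_pdfseparate_cmd and split_and_render and the "name-page.pdf" naming scheme are hypothetical choices for this example, not something the thread specifies.

```python
import glob
import subprocess
from pathlib import Path


def build_pdfseparate_cmd(pdf: Path, pages_dir: Path) -> list[str]:
    """Return the pdfseparate command that writes one PDF per page."""
    # pdfseparate replaces %d in the pattern with the page number.
    pattern = pages_dir / f"{pdf.stem}-%d.pdf"
    return ["pdfseparate", str(pdf), str(pattern)]


def split_and_render(pdf: Path, pages_dir: Path, png_dir: Path, dpi: int = 300) -> list[Path]:
    """Split `pdf` into single-page PDFs, then render each page to PNG."""
    pages_dir.mkdir(parents=True, exist_ok=True)
    png_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_pdfseparate_cmd(pdf, pages_dir), check=True)

    # Collect the per-page PDFs just written (sorted by name, not page number).
    pages = sorted(pages_dir.glob(f"{glob.escape(pdf.stem)}-*.pdf"))
    for page in pages:
        # -singlefile stops pdftoppm from adding its own page-number suffix,
        # so "doc-3.pdf" becomes "doc-3.png".
        subprocess.run(
            ["pdftoppm", "-png", "-singlefile", "-r", str(dpi),
             str(page), str(png_dir / page.stem)],
            check=True,
        )
    return pages
```

If the per-page PDFs are not needed for anything else, pdftoppm alone already writes one PNG per page of a multi-page PDF, so the pdfseparate step could be skipped.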