Feature request: please, allow parallelization of external commands

lcorbasson / pdfsizeopt

Automatically exported from code.google.com/p/pdfsizeopt


Some PDF files, especially those created by scanning a document, take a long time for pdfsizeopt to process, because each file is compressed one after another by pngout etc. If pdfsizeopt could use a program like GNU parallel (already packaged in Debian and Ubuntu) to dispatch the external commands, we could get a large speedup on machines with more than one core or hardware thread. Using an existing tool like GNU parallel would avoid reinventing the wheel, at least in the short term; if depending on it is not desired, then a thread pool for the external commands could also prove useful. Thanks.
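To illustrate the thread-pool variant, here is a minimal sketch in Python. It is not how pdfsizeopt works today; the function name `run_commands_in_parallel` and the `max_workers` default are made up for this example, and it assumes the external commands are independent of each other.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_commands_in_parallel(commands, max_workers=4):
    """Run each argv list as a child process; return exit codes in input order.

    Hypothetical helper, not part of pdfsizeopt.
    """
    def run_one(argv):
        # Each worker thread just waits on its own child process, so the
        # real work happens in parallel in the external programs.
        result = subprocess.run(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return result.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input commands.
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Stand-in commands; a real run would pass one optimizer command
    # line per image instead.
    demo = [[sys.executable, "-c", "pass"] for _ in range(3)]
    print(run_commands_in_parallel(demo))
```

Threads are enough here because the CPU-heavy work is done by the child processes, not by the Python code that launches them.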

Thank you for coming up with this idea.

Implementing this feature is not as easy as it sounds. It would need a major
redesign of how pdfsizeopt processes the objects in the input PDF. The
reimplementation would be error-prone, and lots of concurrency bugs would need
to be diagnosed and fixed.

I think I'll postpone this because of the lack of free time and motivation on 
my part.

A related improvement: for each slow conversion step, compute the SHA-1 hash
of the input and save the output to a cache directory under that hash, so if
pdfsizeopt is run again on the same input, it will reuse the output from the
cache instead of redoing the conversion. This is much easier to implement, but
still a lot of work.
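The caching idea could look roughly like the sketch below. All names here (`cached_convert`, `step_name`, `cache_dir`, the `convert` callback) are hypothetical and not taken from pdfsizeopt. The step name is made part of the cache key as an extra assumption, so that two different conversion steps fed the same input do not share an entry.

```python
import hashlib
import os
import shutil


def cached_convert(step_name, input_path, output_path, convert, cache_dir):
    """Run convert(input_path, output_path) unless a cached output exists.

    Returns True on a cache hit, False on a miss. Hypothetical helper.
    """
    with open(input_path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    cached = os.path.join(cache_dir, "%s-%s" % (step_name, digest))

    if os.path.exists(cached):
        # Same step and same input bytes as before: reuse the old output.
        shutil.copyfile(cached, output_path)
        return True

    convert(input_path, output_path)  # the slow step
    os.makedirs(cache_dir, exist_ok=True)
    # Copy to a temporary name and rename it, so an interrupted run does
    # not leave a truncated cache entry behind.
    tmp = cached + ".tmp"
    shutil.copyfile(output_path, tmp)
    os.replace(tmp, cached)
    return False
```

A real implementation would probably also need to put the tool version and command-line flags into the key, since a different optimizer version can produce a different output for the same input.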

Original comment by pts...@gmail.com on 27 Feb 2013 at 9:46

Added labels: Priority-Medium, Type-Enhancement
Removed labels: Priority-High, Type-Defect


Feature request: please, allow parallelization of external commands #78