lcorbasson / pdfsizeopt

Automatically exported from code.google.com/p/pdfsizeopt

Feature request: please, allow parallelization of external commands #78

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Some PDF files, especially those created by scanning a document, take a long 
time to be processed by pdfsizeopt, because each image is compressed one after 
the other by pngout, etc.

If pdfsizeopt could use a program like GNU parallel (already packaged in Debian 
and Ubuntu) to dispatch the external commands, we could get a large speedup on 
computers with more than one core or hardware thread.

If depending on an external tool like GNU parallel is not desired (although it 
would avoid reinventing the wheel, at least in the short term), then running the 
external commands from a thread pool could prove useful.
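A minimal sketch of the thread-pool idea, assuming the external commands are already independent of each other (which, per the reply below, is not how pdfsizeopt is currently structured). The function name `run_commands_in_parallel` is hypothetical and not part of pdfsizeopt; it only uses the standard `concurrent.futures` and `subprocess` modules.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_commands_in_parallel(commands, max_workers=None):
    """Run each external command (a list of argv strings) in a thread pool.

    Hypothetical helper, not part of pdfsizeopt. Returns the exit codes
    in the same order as `commands`.
    """
    if max_workers is None:
        # One worker per CPU; os.cpu_count() may return None.
        max_workers = os.cpu_count() or 1

    def run_one(argv):
        # Each thread only waits on its child process, so the GIL is not a
        # bottleneck: the CPU-heavy work happens in the external program.
        result = subprocess.run(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return result.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, not completion order.
        return list(pool.map(run_one, commands))
```

For example, passing a list of pngout invocations (one per extracted image, with hypothetical file names) would keep up to one pngout process running per CPU core. Threads are enough here because the Python side does almost nothing but wait; a process pool would only add overhead.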

Thanks.

Original issue reported on code.google.com by rbr...@gmail.com on 26 Feb 2013 at 11:19

GoogleCodeExporter commented 8 years ago
Thank you for coming up with this idea.

Implementing this feature is not as easy as it sounds. It would need a major 
redesign of how pdfsizeopt processes the objects in the input PDF. The 
reimplementation would be error-prone, and lots of concurrency bugs would need 
to be diagnosed and fixed.

I think I'll postpone this because of a lack of free time and motivation on 
my part.

Another similar improvement: for each slow conversion step, compute the SHA-1 
hash of the input and save the output to a cache directory under that hash, so 
if pdfsizeopt is run again on the same input, it will reuse the output from the 
cache. This is much easier to implement, but still a lot of work.
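A minimal sketch of the SHA-1-keyed cache described above, assuming each slow step can be wrapped as a function from input bytes to output bytes. The names `cached_convert` and `step_name` are hypothetical, not pdfsizeopt APIs; only `hashlib` and `os` from the standard library are used.

```python
import hashlib
import os


def cached_convert(input_bytes, convert, cache_dir, step_name):
    """Return convert(input_bytes), reusing a cached result if one exists.

    Hypothetical helper. The cache key is the SHA-1 hex digest of the input,
    prefixed with the step name so different conversion steps don't collide.
    """
    digest = hashlib.sha1(input_bytes).hexdigest()
    path = os.path.join(cache_dir, f"{step_name}-{digest}")
    if os.path.exists(path):
        # Cache hit: skip the slow conversion entirely.
        with open(path, "rb") as f:
            return f.read()
    output = convert(input_bytes)
    os.makedirs(cache_dir, exist_ok=True)
    # Write to a temporary file, then rename it into place, so an
    # interrupted run does not leave a truncated cache entry behind.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(output)
    os.replace(tmp_path, path)
    return output
```

In practice the key would probably also need to cover the converter's version and command-line flags (for instance by hashing them into `step_name`), otherwise changing pngout settings would silently return stale cached output.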

Original comment by pts...@gmail.com on 27 Feb 2013 at 9:46