yob / pdf-reader

The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe.
MIT License
1.81k stars 271 forks source link

Improve tiff stream decoding performance #360

Closed KazkiMatz closed 3 years ago

KazkiMatz commented 3 years ago

Hi, thank you for the great gem.

I've found on some PDFs with large embedded TIFF assets,tiff_depredict gets really slow because of the cost of MRI's array appending operation on the buffer unfiltered. The profiler also shows the MRI's GC cost are also dominant here. This PR uses string instead of array for buffering. In my particular case, the processing duration improved 15x (from 30 secs to 2 secs).