monniert / docExtractor

(ICFHR 2020 oral) Code for "docExtractor: An off-the-shelf historical document element extraction" paper
https://www.tmonnier.com/docExtractor
MIT License

Parallel Prediction? #22

RealNicolasBourbaki closed 1 year ago

RealNicolasBourbaki commented 1 year ago

Hi @monniert,

Good day :)

I'm using docExtractor in my project to extract textlines. Problem is, I have a LOT of pages. Therefore I'm planning on using multiprocessing to do it in parallel. Is it already an option in this project? Or do you have any suggestions on improving the efficiency of such extraction for me?

Thanks a lot!

Best, Nicole

monniert commented 1 year ago

Hi there, sorry for the late reply and thanks for the interest in the project :)

Sadly no, there is no support for multiprocessing, but it should be easy to add using the standard torch/Python multiprocessing routines. Launching multiple processes on a single GPU is the most straightforward way to do it (see https://docs.python.org/3/library/multiprocessing.html), and you should be able to fit 3-4 forward passes simultaneously. Another (much cleaner) way is to perform the segmentation predictions on batches of images: this would require a bit more work, to rewrite the for loop over images into a for loop over batches and to adapt the segmentation prediction function.

Let me know if you have any trouble, thanks! Tom
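
For anyone landing here later, the first suggestion above (several workers sharing one GPU, each handling a chunk of pages) could look roughly like the sketch below. This is not code from docExtractor: `make_batches`, `extract_batch` and `extract_all` are hypothetical names, and `extract_batch` is only a placeholder where loading the model and running the segmentation prediction would go.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import islice


def make_batches(items, batch_size):
    """Yield successive lists holding at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def extract_batch(paths):
    """Hypothetical placeholder for the per-worker job.

    A real version would load the model once per worker and run the
    segmentation prediction on each page; here every page just gets an
    empty list of text-line regions.
    """
    return [(path, []) for path in paths]


def extract_all(paths, workers=3, batch_size=8,
                executor_cls=ProcessPoolExecutor, **executor_kwargs):
    """Run extract_batch over all pages, one chunk per task, keeping input order."""
    batches = list(make_batches(paths, batch_size))
    with executor_cls(max_workers=workers, **executor_kwargs) as pool:
        # Executor.map returns results in the order the batches were submitted.
        results = pool.map(extract_batch, batches)
        return [item for batch in results for item in batch]
```

With process workers, the call to `extract_all` should sit under an `if __name__ == "__main__":` guard. If the workers touch CUDA, the PyTorch notes on multiprocessing recommend the spawn or forkserver start method, which can be forwarded here as `mp_context=multiprocessing.get_context("spawn")`. Passing `executor_cls=ThreadPoolExecutor` runs the same loop with threads instead, which is handy for a quick check before committing to processes.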