qurator-spk / sbb_binarization

Document Image Binarization
Apache License 2.0

use predict_generator to better utilize GPU #32

Open bertsky opened 3 years ago

bertsky commented 3 years ago

When the model is applied in patch mode (the default), a loop over the windows runs on the CPU (in NumPy), and each window is passed to model.predict() as a single image (on the GPU, in Keras).

https://github.com/qurator-spk/sbb_binarization/blob/8dd05064b2dbdc7d4bdfb8896251302e8ec5ecb3/sbb_binarize/sbb_binarize.py#L152

This does not utilize the GPU well, for two reasons:

  1. the effective batch size of 1 might be too low for the number of shaders and the amount of GPU RAM
  2. the GPU kernel runs only briefly and then has to wait for the CPU each time (patch cropping and memory paging)
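
To illustrate the first point, here is a minimal sketch of collecting several windows into one batch per predict() call. It is not the code from sbb_binarize.py: `DummyModel`, `iter_patches` and `predict_batched` are hypothetical names, the stand-in model only thresholds pixel values so the example runs without a GPU, and the image dimensions are assumed to be multiples of the patch size.

```python
import numpy as np


class DummyModel:
    """Hypothetical stand-in for a Keras model; counts predict() calls."""

    def __init__(self):
        self.calls = 0

    def predict(self, batch):
        # batch has shape (n, height, width, channels)
        self.calls += 1
        # Pretend "binarization": threshold the channel mean at 0.5.
        return (batch.mean(axis=-1, keepdims=True) > 0.5).astype(np.float32)


def iter_patches(image, size):
    """Yield (y, x, patch) for non-overlapping windows.

    Assumes both image dimensions are multiples of `size`.
    """
    for y in range(0, image.shape[0], size):
        for x in range(0, image.shape[1], size):
            yield y, x, image[y:y + size, x:x + size]


def predict_batched(model, image, size=4, batch_size=8):
    """Collect patches into batches so each predict() call sees many windows."""
    out = np.zeros(image.shape[:2] + (1,), dtype=np.float32)
    coords, patches = [], []

    def flush():
        # Run one predict() call on all pending patches and paste the results.
        if not patches:
            return
        preds = model.predict(np.stack(patches))
        for (y, x), pred in zip(coords, preds):
            out[y:y + size, x:x + size] = pred
        coords.clear()
        patches.clear()

    for y, x, patch in iter_patches(image, size):
        coords.append((y, x))
        patches.append(patch)
        if len(patches) == batch_size:
            flush()
    flush()  # remaining patches that did not fill a whole batch
    return out
```

With a real Keras model the same batching could be fed from a Python generator instead of an explicit list. Note that in TensorFlow 2.1 and later, predict_generator is deprecated and model.predict() accepts generators directly. Whether this also addresses the second point depends on overlapping the CPU-side cropping with GPU work, which this sketch does not attempt.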

I suggest changing the following:

bertsky commented 3 years ago

Spoiler: I know how to do this. Would you care for a PR?

vahidrezanezhad commented 3 years ago

> Spoiler: I know how to do this. Would you care for a PR?

@bertsky I would appreciate it if you did that :)

apacha commented 1 year ago

@bertsky did you ever complete this improvement? Maybe on a fork? I would like to run this binarization on a large dataset and with the current procedure it is simply too slow (10-20 images per minute).