It looks like sbb-binarization uses a lot more memory than it needs, potentially leading the host machine to run out of memory depending on the available RAM and the size of the workflow.
Let k be the number of models used and n be the number of images in the workflow. By looking at the code we can see that:
1) The program instantiates n TensorFlow session objects despite needing only 1;
2) The program instantiates kn model objects despite needing only k.
These issues are solved by moving some routines called directly or indirectly by the `run` method to the `__init__` method. I profiled the memory usage of sbb-binarization (valgrind massif) both with and without the changes proposed in this PR, using the DIBCO11 assets provided by OCR-D, and plotted the data with matplotlib to demonstrate the difference (check massif.zip for the actual massif.out files).
The unnecessary memory allocations cause sbb-binarization's RAM usage to increase gradually over time, reaching over 7 GB by the end of the run. With this PR, memory consumption stabilizes at around 1 GB. The process also takes only 84% of the original time to finish, since the instantiation routines are no longer repeated for every image.
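To illustrate the kn versus k difference described above, here is a minimal, self-contained sketch. It does not use the actual sbb-binarization code or TensorFlow: `FakeModel`, `BinarizerBefore`, `BinarizerAfter` and `count_loads` are hypothetical names invented for this example, and the "model" is a stand-in that only counts how often it is constructed.

```python
class FakeModel:
    """Hypothetical stand-in for an expensive-to-load model object."""

    instances = 0  # counts every construction across the whole program

    def __init__(self, path):
        FakeModel.instances += 1
        self.path = path

    def predict(self, image):
        # Placeholder: a real model would return a binarized image here.
        return image


class BinarizerBefore:
    """Loads all models inside run(), so n images cost k * n loads."""

    def __init__(self, model_paths):
        self.model_paths = list(model_paths)

    def run(self, image):
        models = [FakeModel(path) for path in self.model_paths]
        for model in models:
            image = model.predict(image)
        return image


class BinarizerAfter:
    """Loads all models once in __init__(), so n images cost only k loads."""

    def __init__(self, model_paths):
        self.models = [FakeModel(path) for path in model_paths]

    def run(self, image):
        for model in self.models:
            image = model.predict(image)
        return image


def count_loads(binarizer_cls, model_paths, images):
    """Return how many FakeModel objects are created while processing images."""
    FakeModel.instances = 0
    binarizer = binarizer_cls(model_paths)
    for image in images:
        binarizer.run(image)
    return FakeModel.instances


if __name__ == "__main__":
    paths = ["model_a", "model_b", "model_c"]  # k = 3 (hypothetical names)
    images = list(range(5))                    # n = 5 dummy "images"
    print(count_loads(BinarizerBefore, paths, images))  # 15 = k * n
    print(count_loads(BinarizerAfter, paths, images))   # 3 = k
```

The same reasoning applies to the session object: creating it once in `__init__` and reusing it in `run` gives 1 instance instead of n.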