robertknight / ocrs-models

PyTorch models for the ocrs OCR engine

Add balanced selection of examples for loss computation and remove border masks #5

Closed. robertknight closed 1 year ago

robertknight commented 1 year ago

There is a class imbalance in the training images, which typically have many more non-text pixels than text pixels. Account for this by introducing a loss function which selects an equal number of text and non-text pixels to compute the loss from. In my experiments several months ago, this enabled the model to make better predictions near the edges of words without using a border mask to increase the weights of pixels in those areas. Unfortunately I wasn't rigorous enough in recording metrics to include them here, and I'm revisiting this project after a couple of months doing other things.
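For illustration, a minimal sketch of a balanced loss of this kind is shown below. It is not the code from this PR: the function name `balanced_bce_loss`, the use of binary cross-entropy, the random choice of which pixels to keep, and the fallback for images containing only one class are all assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def balanced_bce_loss(pred, target):
    """Binary cross-entropy over an equal number of text and non-text pixels.

    pred:   predicted text probabilities in [0, 1], any shape.
    target: binary mask of the same shape (1 = text, 0 = non-text).

    Hypothetical helper, not the implementation from this PR.
    """
    pred = pred.flatten()
    target = target.flatten().float()

    # Indices of text (positive) and non-text (negative) pixels.
    pos_idx = torch.nonzero(target > 0.5).squeeze(1)
    neg_idx = torch.nonzero(target <= 0.5).squeeze(1)

    # Take as many pixels from each class as the rarer class provides.
    n = min(pos_idx.numel(), neg_idx.numel())
    if n == 0:
        # Assumption: if only one class is present, fall back to plain BCE.
        return F.binary_cross_entropy(pred, target)

    # Assumption: pixels are chosen uniformly at random within each class.
    pos_perm = torch.randperm(pos_idx.numel(), device=pred.device)[:n]
    neg_perm = torch.randperm(neg_idx.numel(), device=pred.device)[:n]
    selected = torch.cat([pos_idx[pos_perm], neg_idx[neg_perm]])

    return F.binary_cross_entropy(pred[selected], target[selected])


# Usage: a 10x10 mask in which only 4 pixels are text.
target = torch.zeros(1, 1, 10, 10)
target[0, 0, 2, 3:7] = 1.0
pred = torch.rand(1, 1, 10, 10)
loss = balanced_bce_loss(pred, target)
```

Random selection is only one option. Choosing the non-text pixels with the highest loss would be another way to pick the same number of examples from each class.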

This PR also changes the binarization threshold used during evaluation to 0.5, since that is the threshold used in the loss function during training, and it is also the "natural default" for a binarization threshold when dealing with probabilities.
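As a rough sketch of what thresholding at 0.5 during evaluation could look like, the example below binarizes a probability map and computes pixel-level precision and recall. The names `binarize` and `pixel_precision_recall`, the strict `>` comparison, and the choice of metrics are assumptions for illustration and are not taken from this repository.

```python
import torch

# Threshold stated in this PR; everything else in this block is hypothetical.
THRESHOLD = 0.5


def binarize(probs, threshold=THRESHOLD):
    """Convert per-pixel text probabilities into a boolean text mask."""
    return probs > threshold


def pixel_precision_recall(probs, target, threshold=THRESHOLD):
    """Pixel-level precision and recall of the binarized prediction."""
    pred_mask = binarize(probs, threshold)
    target_mask = target > 0.5

    true_positives = (pred_mask & target_mask).sum().item()
    predicted_positives = pred_mask.sum().item()
    actual_positives = target_mask.sum().item()

    # Report 0.0 rather than dividing by zero when a mask is empty.
    precision = true_positives / predicted_positives if predicted_positives else 0.0
    recall = true_positives / actual_positives if actual_positives else 0.0
    return precision, recall


probs = torch.tensor([0.9, 0.6, 0.4, 0.2])
target = torch.tensor([1.0, 0.0, 1.0, 0.0])
precision, recall = pixel_precision_recall(probs, target)
```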