There is a class imbalance in the training images, which typically contain many more non-text than text pixels. Account for this by introducing a loss function that samples an equal number of text and non-text pixels to compute the loss from. In my experiments several months ago, this enabled the model to make better predictions near the edges of text words without needing a border mask to up-weight pixels near word boundaries. Unfortunately I wasn't rigorous enough in recording metrics to include them here, and I'm revisiting this project after a couple of months spent on other things.
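A minimal sketch of the idea, written in plain NumPy since the PR doesn't name a framework (the function name and signature here are illustrative, not the actual implementation): compute binary cross-entropy over a random subsample containing equally many text and non-text pixels.

```python
import numpy as np

def balanced_bce_loss(probs, targets, rng=None, eps=1e-7):
    """Binary cross-entropy over an equal number of text and non-text pixels.

    `probs` are predicted text probabilities in [0, 1]; `targets` is a binary
    mask (1 = text). Samples min(n_text, n_non_text) pixels from each class so
    both classes contribute equally to the loss, regardless of the imbalance.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = probs.ravel()
    targets = targets.ravel()

    text_idx = np.flatnonzero(targets == 1)
    bg_idx = np.flatnonzero(targets == 0)
    n = min(len(text_idx), len(bg_idx))
    if n == 0:
        # Degenerate batch: one class is entirely absent.
        return 0.0

    # Sample n pixels from each class without replacement.
    sel = np.concatenate([
        rng.choice(text_idx, size=n, replace=False),
        rng.choice(bg_idx, size=n, replace=False),
    ])
    p = np.clip(probs[sel], eps, 1 - eps)  # avoid log(0)
    t = targets[sel]
    return float(-np.mean(t * np.log(p) + (1 - t) * np.log(1 - p)))
```

With a maximally uncertain model (all probabilities 0.5), this returns -log(0.5) ≈ 0.693 however skewed the mask is, which is the property the balancing is after.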
This PR also adjusts the binarization threshold used during evaluation to 0.5, since that is what the loss function uses in training, and it is also the "natural default" for a binarization threshold when dealing with probabilities.