Open · martholomew opened 11 months ago
Dear @martholomew,
Yes, pseudo-labeling can be effective, and we have used this technique ourselves to improve our models. You can use https://github.com/qurator-spk/sbb_pixelwise_segmentation for training. First, binarize your dataset with our models, then select the documents with satisfactory results and train on them as a custom dataset. Sometimes a prediction is excellent only in parts of a page. In such cases, you can crop those regions to prepare your ground truth (GT).
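For the cropping step, here is a minimal sketch of how one might turn hand-picked "good" regions into fixed-size patch boxes. It is only an illustration, not part of sbb_binarize or sbb_pixelwise_segmentation: the function names, the 256-pixel patch size, and the idea of listing good regions as rectangles are all assumptions.

```python
from typing import List, Tuple

# (left, upper, right, lower) in pixels; hypothetical convention for this sketch
Box = Tuple[int, int, int, int]


def patches_inside(region: Box, patch: int, stride: int) -> List[Box]:
    """Return square boxes of side `patch` that lie fully inside `region`."""
    left, upper, right, lower = region
    boxes = []
    # Step through the region; the range bounds keep every patch inside it.
    for y in range(upper, lower - patch + 1, stride):
        for x in range(left, right - patch + 1, stride):
            boxes.append((x, y, x + patch, y + patch))
    return boxes


def collect_patches(good_regions: List[Box], patch: int = 256, stride: int = 256) -> List[Box]:
    """Gather patch boxes from every region marked as well binarized."""
    out: List[Box] = []
    for region in good_regions:
        out.extend(patches_inside(region, patch, stride))
    return out
```

Each resulting box would then be applied to both the original image and its binarized prediction, so that every input patch has a matching GT patch. Regions smaller than the patch size yield no boxes and are skipped.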
I would like to fine-tune the model on the kind of data that I will be feeding it. My pipeline would be to binarize the images using sbb_binarize, manually edit the results into high-quality ground truth, and then feed a large number of these images back into the model as training data.
A link to point me in the right direction would be a great help.