qurator-spk / sbb_binarization

Document Image Binarization
Apache License 2.0

Training (or Fine-Tuning) the Model #64

Open martholomew opened 11 months ago

martholomew commented 11 months ago

I would like to fine-tune the model on the data that I will be feeding it. My pipeline would be to binarize the images using sbb_binarize, manually edit the results into high-quality ground truth, and then feed a large number of these images back into the model.

  1. Would the end-result be better binarization on my dataset?
  2. How would this be accomplished?

A link to point me in the right direction would be a great help.

vahidrezanezhad commented 11 months ago

Dear @martholomew,

Of course. Pseudo-labeling can be effective, and we have used this technique to improve our own models. You can use https://github.com/qurator-spk/sbb_pixelwise_segmentation for training. First, binarize your dataset with our models, then select the documents with satisfactory results to build your custom training dataset. Sometimes a prediction is good only in parts of a page. In such cases, you can crop the well-binarized regions to prepare your ground truth (GT).
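
The selection step described above could be sketched roughly as follows. This is only an illustration, not part of sbb_binarization or sbb_pixelwise_segmentation: the function name `build_training_set`, the `approved_stems` list, and the `images/` and `labels/` output folders are all assumptions, so check the training repository's README for the layout it actually expects.

```python
import shutil
from pathlib import Path


def build_training_set(image_dir, prediction_dir, approved_stems, out_dir):
    """Copy approved (scan, binarized prediction) pairs into out_dir.

    Assumed, hypothetical layout (adjust to what your training tool expects):
        out_dir/images/  original scans
        out_dir/labels/  reviewed binarizations used as ground truth
    """
    image_dir = Path(image_dir)
    prediction_dir = Path(prediction_dir)
    images_out = Path(out_dir) / "images"
    labels_out = Path(out_dir) / "labels"
    images_out.mkdir(parents=True, exist_ok=True)
    labels_out.mkdir(parents=True, exist_ok=True)

    # Index files by stem so a scan and its prediction may differ in extension.
    images = {p.stem: p for p in image_dir.iterdir() if p.is_file()}
    predictions = {p.stem: p for p in prediction_dir.iterdir() if p.is_file()}

    copied, skipped = [], []
    for stem in approved_stems:
        if stem in images and stem in predictions:
            shutil.copy2(images[stem], images_out / images[stem].name)
            shutil.copy2(predictions[stem], labels_out / predictions[stem].name)
            copied.append(stem)
        else:
            # Either the scan or its prediction is missing; leave it out.
            skipped.append(stem)
    return copied, skipped
```

Here `approved_stems` would be the file names (without extension) of the pages whose binarization you judged good enough, or have corrected by hand. Pages that are only partly good would first need to be cropped, with the same box applied to both the scan and its prediction, before being added to the set.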