ShabbyPages is a state-of-the-art corpus of born-digital document images. It provides both a clean ground-truth version and a distorted version of each document, which makes it suitable for training models that reverse the distortions and recover the original, noise-free documents.
This table is a WIP, but it shows how ShabbyPages compares with other binarization and denoising datasets. Most of these prior datasets are either far too small (the DIBCOs) or too naive (NO), so this table will highlight the strengths of ShabbyPages.