Improve Performance on Letterpress and Other Augmentations Relying on Noise Generation

cs-mshah commented 1 year ago

Augmentations that rely on Perlin noise generation are particularly slow, including Letterpress and others.

It would be great if the slower augmentations could be made more efficient or could leverage the GPU. As it stands, the ones at the bottom of the list are too slow to be practical for training.

I tried to train a model using Letterpress and found that a single epoch took 12x as long as it did without the augmentation. I timed most of the augmentations on a set of 7 images, and here are the results:

[Screenshot from 2022-12-04: time taken by each augmentation to process the 7 images]

Here is the code for timing:

import time

from augraphy import *

# `imgs` is assumed to be a list of 7 images (NumPy arrays) loaded beforehand.

aug_list = [
    DirtyDrum(line_concentration=0.5, noise_intensity=1.0, direction=2),
    BleedThrough(intensity_range=(0.6, 1.0), offsets=(7, 7), alpha=0.5),
    DirtyRollers(),
    Dithering(),
    Faxify(),
    InkBleed(severity=(0.5, 0.8)),
    Letterpress(),
    LowInkRandomLines(count_range=(10, 15)),
    Markup(),
    PencilScribbles(size_range=(250, 400), count_range=(1, 10), stroke_count_range=(1, 6)),
    BrightnessTexturize(),
    ColorPaper(),
    Gamma(),
    Geometric(rotate_range=(-3, 3)),
    LightingGradient(),
    PageBorder(width_range=(5, 10)),
    SubtleNoise(subtle_range=25),
    BadPhotoCopy(),
    BindingsAndFasteners(ntimes=(2, 4)),
    Folding(fold_count=4),
    Jpeg(),
    NoiseTexturize(),
]

times = []

for aug in aug_list:
    # perf_counter() is a monotonic, high-resolution clock suited to timing.
    start_time = time.perf_counter()
    aug_imgs = []
    for img in imgs:
        aug_imgs.append(aug(img))
    end_time = time.perf_counter()
    # Total seconds needed to apply this augmentation to every image.
    times.append(end_time - start_time)

for aug, elapsed in zip(aug_list, times):
    print(f"{type(aug).__name__}: {elapsed:.2f} s")
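For anyone re-running these measurements, here is a minimal sketch of a slightly more robust timing helper. It uses `time.perf_counter()` and keeps the best of several repeats to reduce noise from other processes. `time_augmentations` is a hypothetical helper, not part of Augraphy; any callable that takes an image works as an augmentation, and the stand-in augmentations below are only placeholders.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

import numpy as np


def time_augmentations(
    augs: Dict[str, Callable], imgs: Sequence, repeats: int = 3
) -> List[Tuple[str, float]]:
    """Time each augmentation over all images; return (name, seconds), slowest first.

    Each augmentation is run `repeats` times over the full image list and the
    best (smallest) wall time is kept, which reduces noise from other processes.
    """
    results = []
    for name, aug in augs.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            for img in imgs:
                aug(img)
            best = min(best, time.perf_counter() - start)
        results.append((name, best))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder images and augmentations; swap in real images and Augraphy objects.
    rng = np.random.default_rng(0)
    imgs = [rng.integers(0, 256, size=(512, 512), dtype=np.uint8) for _ in range(7)]
    augs = {
        "identity": lambda im: im,
        "invert": lambda im: 255 - im,
        "gaussian_noise": lambda im: np.clip(im + rng.normal(0, 10, im.shape), 0, 255),
    }
    for name, seconds in time_augmentations(augs, imgs):
        print(f"{name:>15}: {seconds * 1000:.2f} ms")
```

For example, `time_augmentations({type(a).__name__: a for a in aug_list}, imgs)` would rank the augmentations from the snippet above, slowest first.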

kwcckw commented 1 year ago

Thanks for the feedback. Performance improvement is already on our roadmap and should be included in the next major update.

jboarman commented 1 year ago

The key issue with these slower augmentations is the noise generation process, so we will use this issue to focus on approaches that speed up noise generation while retaining an essential level of random variation in the distortions.

These augmentations should all be improved once we can improve the noise generation process:

Letterpress
BleedThrough
BadPhotoCopy
LightingGradient
PageBorder
NoiseTexturize
DirtyDrum
InkBleed
Faxify
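As one illustration of what faster noise generation could look like, here is a minimal sketch of 2D Perlin (gradient) noise computed with whole-array NumPy operations instead of per-pixel Python loops. It is not Augraphy's implementation; `perlin_noise_2d` and its `shape`/`res` parameters are hypothetical names chosen for this example. Random variation is preserved because every call draws a fresh set of lattice gradients from the supplied generator.

```python
import numpy as np


def fade(t):
    # Perlin's quintic smoothstep: 6t^5 - 15t^4 + 10t^3.
    return t * t * t * (t * (t * 6 - 15) + 10)


def perlin_noise_2d(shape, res, rng=None):
    """Vectorized 2D gradient (Perlin) noise.

    shape: (height, width) of the output map.
    res:   (rows, cols) number of lattice cells along each axis.
    Returns a float array of `shape` with values roughly in [-1, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = shape
    ry, rx = res

    # One random unit gradient vector per lattice corner.
    angles = rng.uniform(0.0, 2.0 * np.pi, size=(ry + 1, rx + 1))
    grads = np.stack((np.cos(angles), np.sin(angles)), axis=-1)  # (ry+1, rx+1, 2)

    # Continuous lattice coordinate of every row/column, in [0, res).
    ys = np.arange(h) * ry / h
    xs = np.arange(w) * rx / w
    yi, xi = ys.astype(int), xs.astype(int)  # lattice cell index
    yf, xf = ys - yi, xs - xi                # offset inside the cell, in [0, 1)

    # Broadcast row/column quantities to the full (h, w) grid.
    Yi, Xi = yi[:, None], xi[None, :]
    Yf = np.broadcast_to(yf[:, None], (h, w))
    Xf = np.broadcast_to(xf[None, :], (h, w))

    def corner(dy, dx):
        # Dot product of a corner's gradient with the offset from that corner.
        g = grads[Yi + dy, Xi + dx]  # (h, w, 2) via fancy indexing
        return g[..., 0] * (Yf - dy) + g[..., 1] * (Xf - dx)

    n00, n10 = corner(0, 0), corner(1, 0)
    n01, n11 = corner(0, 1), corner(1, 1)

    # Smoothly interpolate the four corner contributions.
    u, v = fade(Yf), fade(Xf)
    n0 = n00 * (1 - u) + n10 * u
    n1 = n01 * (1 - u) + n11 * u
    return np.sqrt(2) * (n0 * (1 - v) + n1 * v)


if __name__ == "__main__":
    noise = perlin_noise_2d((256, 256), (8, 8), np.random.default_rng(0))
    print(noise.shape, float(noise.min()), float(noise.max()))
```

The design choice here is that the work is a fixed number of array passes over the image, so the cost grows with the pixel count without any Python-level per-pixel loop; a larger `res` gives finer-grained noise.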

We recently released a performance improvement via #270, which included using Numba to optimize loops. However, we found that there remains a lot of opportunity to improve the noise generation processes, which have the heaviest impact on augmentation performance.
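The general Numba technique mentioned above can be sketched as follows. This is a hypothetical per-pixel kernel (`add_noise_to_dark_pixels`), not code taken from Augraphy or #270: a plain Python double loop is compiled with `numba.njit` when Numba is installed, and the sketch falls back to ordinary Python otherwise. A whole-array NumPy version is included for comparison, since that kind of rewrite is another option for noise-heavy steps.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs without Numba (no speed-up in that case).
    def njit(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func


@njit
def add_noise_to_dark_pixels(img, noise, threshold):
    """Add `noise` to pixels darker than `threshold`, clipping to [0, 255].

    Explicit loops like this are slow in CPython but fast once compiled by Numba.
    """
    out = img.copy()
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            if img[y, x] < threshold:
                out[y, x] = min(255.0, max(0.0, img[y, x] + noise[y, x]))
    return out


def add_noise_to_dark_pixels_vectorized(img, noise, threshold):
    """Same result using whole-array NumPy operations instead of loops."""
    return np.where(img < threshold, np.clip(img + noise, 0.0, 255.0), img)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.uniform(0, 255, size=(256, 256))
    noise = rng.normal(0, 20, size=img.shape)
    looped = add_noise_to_dark_pixels(img, noise, 128.0)
    vectorized = add_noise_to_dark_pixels_vectorized(img, noise, 128.0)
    print(np.allclose(looped, vectorized))
```

With Numba, the first call to a jitted function includes compilation time, so benchmarks should measure from the second call onward.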

Recent Augraphy updates show performance improvements of more than 100%: https://github.com/sparkfish/augraphy/issues/270#issuecomment-1502517272

sparkfish/augraphy, issue #214