kornia / kornia

Geometric Computer Vision Library for Spatial AI
https://kornia.readthedocs.io
Apache License 2.0

[Feature request] Bayer preserving augmentation techniques #1445

Open copaah opened 2 years ago

copaah commented 2 years ago

🚀 Feature

Learning directly from raw Bayer images is an exciting idea that would make the entire learning pipeline end-to-end. However, doing this presents its own challenges.

One of them is preserving the Bayer pattern when applying common image augmentation techniques such as image flipping.

This feature request is to add to Kornia a set of augmentation techniques that preserve Bayer patterns, so that deep learning models trained directly on raw Bayer images can continue to benefit from augmentation.
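A minimal sketch, assuming an RGGB layout, of why a plain vertical flip breaks the CFA phase and how dropping a border row restores it:

import torch

# Toy 4x4 RGGB mosaic; values encode the channel at each site (R=0, G=1, B=2).
tile = torch.tensor([[0, 1],
                     [1, 2]])
bayer = tile.repeat(2, 2)

flipped = torch.flip(bayer, dims=[0])  # plain vertical flip
print(flipped[:2, :2])                 # tensor([[1, 2], [0, 1]]) -> GBRG, phase broken

# One possible remedy: drop one row at each border after the flip so the
# top-left sample is an R site again (at the cost of two rows of content).
fixed = flipped[1:-1, :]
print(fixed[:2, :2])                   # tensor([[0, 1], [1, 2]]) -> RGGB restored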

Motivation

Research into learning from raw Bayer images is an exciting new field, and adding augmentation support to Kornia would be a good way to aid this research.

Pitch

In the kornia.augmentation module add two new classes:

RandomVerticalFlipBayerPreserving(return_transform=None, same_on_batch=False, p=0.5, p_batch=1.0, keepdim=False)

RandomHorizontalFlipBayerPreserving(return_transform=None, same_on_batch=False, p=0.5, p_batch=1.0, keepdim=False)

Alternatives

N/A

Additional context

See this GitHub repo that already implements it:

edgarriba commented 2 years ago

We recently added color conversions in that direction: https://kornia.readthedocs.io/en/latest/color.html#bayer-raw Please verify whether those could work with the proposed ideas. /cc @oskarflordal @shijianjian
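As a quick illustration, a round trip through those conversions might look like this (a sketch assuming the kornia.color raw API and its CFA enum):

import torch
import kornia

raw = torch.rand(1, 1, 4, 4)  # single-channel mosaic, (B, 1, H, W), H and W even

# Demosaic to RGB and mosaic back, picking one CFA layout explicitly.
rgb = kornia.color.raw_to_rgb(raw, cfa=kornia.color.CFA.BG)
raw_again = kornia.color.rgb_to_raw(rgb, cfa=kornia.color.CFA.BG)

print(rgb.shape)        # torch.Size([1, 3, 4, 4])
print(raw_again.shape)  # torch.Size([1, 1, 4, 4])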

oskarflordal commented 2 years ago

What is suggested is doing augmentations directly on the Bayer image, i.e. without conversions. Given the limitations of such augmentations (cropping needed for flips/translation, rotation not possible without breaking the CFA) and the nature of what an augmentation is: what would be the strong case for not doing e.g. (Bayer -> color) -> augment -> (color -> Bayer), ideally with a higher-quality debayer and a bit of noise at the end? My gut feeling is that this would give you better variations, and since these are augmentations it doesn't affect the main use case of going directly from sensor to network. If you are looking to mimic e.g. fixed-pattern noise or similar, that is going to be broken by the augmentation anyway. With such a pipeline in place it is also easier to reuse annotated RGB data.

I don't know enough about the augmentation pipeline itself to understand how invasive it would be. I guess you would want to allow only certain augmentations: noise (ideally color-aware), flipping/translation, random crops (on 2x2 borders), image-degrading augmentations like stuck-at-1/0 pixels, and other things that can be done per pixel/area. @edgarriba
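A sketch of that route, combining the existing raw conversions with standard kornia augmentations (the CFA choice and the noise level below are placeholder assumptions):

import torch
import kornia
from kornia import augmentation as K

cfa = kornia.color.CFA.BG  # assumed sensor layout

aug = K.AugmentationSequential(
    K.RandomHorizontalFlip(p=0.5),
    K.RandomVerticalFlip(p=0.5),
)

def augment_via_rgb(raw: torch.Tensor) -> torch.Tensor:
    # (Bayer -> color) -> augment -> (color -> Bayer), plus a bit of noise.
    rgb = kornia.color.raw_to_rgb(raw, cfa=cfa)
    rgb = aug(rgb)
    out = kornia.color.rgb_to_raw(rgb, cfa=cfa)
    return out + 0.001 * torch.randn_like(out)

raw = torch.rand(2, 1, 8, 8)       # (B, 1, H, W)
print(augment_via_rgb(raw).shape)  # torch.Size([2, 1, 8, 8])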

copaah commented 2 years ago

@oskarflordal

This is probably a stupid question, but how would you go from color back to Bayer without keeping track of the CFA throughout the augmentation pipeline?

oskarflordal commented 2 years ago

(Note that I am not really part of the core Kornia team; I am just interested in understanding the use case.) I'm not sure you need to keep track of it. You can go from an RGB representation to raw with any CFA; if you train your algorithm for a specific sensor, I suppose you convert using that fixed CFA no matter what data source you have. Obviously, starting from a raw image you would lose a lot of the sensor-specific defects on the way through the augmentation pipeline. Thinking more about it, I guess there are two main cases where you need raw augmentation.
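For example, the same annotated RGB image can be remosaiced under whichever CFA the target sensor uses (again a sketch assuming the kornia.color raw conversions):

import torch
import kornia

rgb = torch.rand(1, 3, 4, 4)  # e.g. an annotated RGB training image

# Pick the layout of the target sensor; nothing has to be tracked upstream.
raw_bg = kornia.color.rgb_to_raw(rgb, cfa=kornia.color.CFA.BG)
raw_gb = kornia.color.rgb_to_raw(rgb, cfa=kornia.color.CFA.GB)

print(raw_bg.shape)                 # torch.Size([1, 1, 4, 4])
print(torch.equal(raw_bg, raw_gb))  # almost surely False: different sampling of the same scene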

shijianjian commented 2 years ago

Just looked at the paper; my question is: would the Bayer-preserving augmentations still be helpful if combined with common non-Bayer-preserving methods (e.g. rotation, scaling)?

From the implementation perspective, I think it is better to keep it outside of the current augmentation pipeline, as:

class RandomVerticalFlipBayerPreserving

class RandomHorizontalFlipBayerPreserving

Then it can work like:

AugmentationSequential(
    RandomVerticalFlipBayerPreserving(),
    RandomHorizontalFlipBayerPreserving(),
    preprocessing=RgbToRaw(),
    postprocessing=RawToRGB(),
)
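For reference, a hypothetical sketch of what such a class could do internally, written as a plain torch.nn.Module (not wired into kornia's augmentation base classes) and assuming an RGGB-like layout with even height: flip vertically, then shift by one row so the CFA phase of the top-left pixel is unchanged.

import torch
from torch import nn
import torch.nn.functional as F

class RandomVerticalFlipBayerPreserving(nn.Module):
    # Hypothetical sketch; not the proposed kornia implementation.
    def __init__(self, p: float = 0.5) -> None:
        super().__init__()
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 1, H, W) raw mosaic with even H.
        if torch.rand(()) > self.p:
            return x
        flipped = torch.flip(x, dims=[-2])
        # Drop the top row and zero-pad one row at the bottom so the Bayer
        # phase of the top-left pixel is preserved; cropping or replication
        # padding would be reasonable alternatives at the border.
        return F.pad(flipped[..., 1:, :], (0, 0, 0, 1))

The horizontal variant would do the same along the last dimension.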