trougnouf / nind-denoise

Image denoising using the Natural Image Noise Dataset
GNU General Public License v3.0

How should the photos be processed for training? #5

Open hqhoang opened 1 year ago

hqhoang commented 1 year ago

The current dataset consists of photos fixed at a certain exposure/saturation/contrast. Different photographers have different tastes and will process the RAW differently: low vs. high contrast or saturation, and even the white balance will differ, since some prefer a warm tone while others might prefer leaving the original color cast (e.g. LED lighting, ...).

What would be a good strategy in terms of training data? I can do exposure bracketing in RAW, but then how should I process the RAW into sample data? Would a naive approach of exporting the RAW at different WB, contrast, saturation, etc. work? Is it even possible to train on the unprocessed RAW (perhaps 12-bit demosaiced)?

The step of taking bracketed shots is probably the same regardless of how the RAW is processed and trained, so maybe I can just start shooting samples. Have you found a place to host samples? For sharing, we can store only the original RAW files, 25-30MB each: about 30MB x 5EV = 150MB for each set, or 15GB for 100 samples. That's not too bad while still maintaining the flexibility should we want to change the training approach. Those who run the training can do the processing/exporting locally.

trougnouf commented 1 year ago

You are right; I tried to do minimal processing but there is always a subjective aspect to it.

I am actually in the process of training with raw and near-raw (debayered linear Rec. 2020, for better generalization) images to offer more flexibility (and ideally the ability to integrate with processing software). More samples would be very much appreciated. I do ISO bracketing, but exposure bracketing should offer the same results (or at least be equally useful) since modern cameras are largely ISO-invariant.
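
(For context only, and not the repo's actual preprocessing: a minimal sketch of producing that kind of near-raw input, i.e. demosaiced, linear, no tone curve, could look like the following; rawpy is an assumption on my side, and the real pipeline additionally converts to linear Rec. 2020.)

import numpy as np
import rawpy  # assumed dependency; not necessarily what nind-denoise uses

def raw_to_linear_rgb(path: str) -> np.ndarray:
    """Demosaic a raw file into a (h, w, 3) float32 linear RGB array in [0, 1]."""
    with rawpy.imread(path) as raw:
        rgb16 = raw.postprocess(
            gamma=(1, 1),         # keep the data linear (no tone curve)
            no_auto_bright=True,  # don't rescale exposure
            use_camera_wb=True,   # apply the as-shot white balance
            output_bps=16,        # 16-bit output to keep precision
        )
    return rgb16.astype(np.float32) / 65535.0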

I have been uploading raw samples to the UCLouvain (university) dataverse, under https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/DEQCIM , but the dataset (and associated source code) won't be publicly available there until I publish it with an associated paper (within a year). I would gladly add your samples there in the meantime if you have a way to share them (e.g. a Google Drive, or I can open up a temporary FTP server with 50 GB of storage).

In any case, I would be happy to copy the raw dataset (currently 67 GB of Bayer images and 21 GB of X-Trans images) to a public host if you or anyone else can offer such space before publication.

hqhoang commented 1 year ago

You're probably the only one who is actively working on this, so no need to share the dataset for now. I can host my RAW files on Google Drive and share them with you.

For each sample, how many RAWs do you need, and how many EV for each step (1/3EV, 2/3EV, 1EV, ...)? I'll shoot mainly compressed RAF with my X-T2. Any particular scene/subject that you want? If not, I can shoot a wide variety and you pick what you want to use.

I'll use a tripod, so alignment shouldn't be a problem. If needed, I can extract the ECC aligner code from my project into a standalone aligning tool. It's definitely more modern and flexible than align_image_stack (class OpenCV_Aligner of https://github.com/hqhoang/mftker/blob/dev/mftker.py).
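
(For reference, and only as a rough sketch rather than the actual OpenCV_Aligner class from mftker, ECC-based alignment with OpenCV boils down to something like the following.)

import cv2
import numpy as np

def ecc_align(ref_bgr: np.ndarray, tgt_bgr: np.ndarray) -> np.ndarray:
    """Warp tgt_bgr onto ref_bgr using a euclidean (shift + rotation) ECC fit."""
    ref = cv2.cvtColor(ref_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    tgt = cv2.cvtColor(tgt_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    warp = np.eye(2, 3, dtype=np.float32)  # start from the identity transform
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 500, 1e-7)
    _, warp = cv2.findTransformECC(ref, tgt, warp, cv2.MOTION_EUCLIDEAN, criteria)
    h, w = ref_bgr.shape[:2]
    return cv2.warpAffine(tgt_bgr, warp, (w, h),
                          flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP)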

trougnouf commented 1 year ago

Yes that would work perfectly :)

I usually aim for 5-10 noisy samples (with random ISO values) and 2 ground truths. (I take more and often end up discarding some because of slight variations in content and light and because my current camera is very sensitive to flickering lights.) As much variety as possible is best so long as nothing disturbs the scene.

I would be interested in your alignment tool. I made an alignment function which works well but only handles x/y shifts, not differences in zoom, rotation, and such.

Do you use an app or script to control the camera? I have been using a gphoto2 script to avoid touching the camera. I don't know how portable it is, but here it is in case it could be useful: https://gist.github.com/trougnouf/babdfa7e166bb8812adb76bc73768183 (I have some issues with it where the settings are not always applied, hence the added delays, but I normally end up with enough good images).
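
(Not the gist itself, but the general shape of such a tethered ISO-bracketing loop with the stock gphoto2 CLI; config names like "iso" and the delays are camera-dependent, so treat this as a sketch.)

import subprocess
import time

ISO_VALUES = ["200", "800", "3200", "12800"]  # arbitrary example values

for i, iso in enumerate(ISO_VALUES):
    subprocess.run(["gphoto2", "--set-config", f"iso={iso}"], check=True)
    time.sleep(2)  # give the camera time to actually apply the setting
    subprocess.run(["gphoto2", "--capture-image-and-download",
                    "--filename", f"shot_{i:02d}_iso{iso}.%C"], check=True)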

hqhoang commented 1 year ago

I used to use gphoto2, but it's not very portable, requiring my laptop. I usually use either the timer, or a release cable (basically a DIY 2.5mm stereo cable for the mic jack, shorting the ground to the middle ring will focus, then shorting ground to both middle and tip rings will take the photos).

I noticed that lens correction and rotation affect the noise pattern, causing artifacts when applying nind-denoise (duh, obviously!). That currently affects my darktable workflow, forcing me to split it into extra steps when dealing with very noisy shots. I guess it doesn't make sense to train on distorted/transformed noise patterns, so an intermediate RAW/HDR step in the darktable workflow (right after demosaic) is unavoidable. But nind-denoise doesn't have to happen during the darkroom workflow, as that would interfere with real-time user interactivity; it can happen during export. Have you discussed this possibility with the darktable devs?

Can you set the training to ignore clipped pixels? If I have to avoid clipping, that will limit the samples quite a lot, especially outdoor scenes.

hqhoang commented 1 year ago

With the reasoning that nind-denoise should be applied only during darktable export, I thought I could use darktable-cli to do the export while manipulating the sidecar XMP file to inject nind-denoise. Of course it's not possible at the moment as there's no module for nind-denoise, but I managed to split the XMP into two stages: pre-denoise (demosaic, exposure, color calibration, ...) and post-denoise (crop, rotate, lens correction, filmic, ...). It works.
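
(Roughly, the flow is something like this sketch; file names and settings are placeholders, and this is not the actual script from the branch linked below.)

import subprocess

raw = "DSCF0001.RAF"  # placeholder input

# Stage 1: apply only the pre-denoise modules from a stripped-down sidecar and
# export a high-bit-depth intermediate.
subprocess.run(["darktable-cli", raw, "stage1_predenoise.xmp", "intermediate.tif",
                "--core", "--conf", "plugins/imageio/format/tiff/bpp=32"], check=True)

# Stage 2: denoise the intermediate with nind-denoise.
subprocess.run(["python3", "denoise_image.py", "--network", "UtNet",
                "--model_path", "generator_650.pt",
                "--input", "intermediate.tif", "--output", "denoised.tif"], check=True)

# Stage 3: apply the post-denoise modules from the second sidecar and export the final image.
subprocess.run(["darktable-cli", "denoised.tif", "stage2_postdenoise.xmp",
                "final.jpg"], check=True)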

I tested the script on a set of around 250 RAFs and it works pretty well, except that "filmic rgb" is a little picky, as it behaves slightly differently when loading from RAW vs. from TIFF. I could also add a Lua script to call this Python script directly from darktable, but then the normal darktable export process would still run redundantly. At least this works without modifying darktable.

I'll test a little more before creating a PR: https://github.com/trougnouf/nind-denoise/compare/master...hqhoang:nind-denoise:darktable-cli

trougnouf commented 1 year ago

> I used to use gphoto2, but it's not very portable, requiring my laptop. I usually use either the timer, or a release cable (basically a DIY 2.5mm stereo cable for the mic jack, shorting the ground to the middle ring will focus, then shorting ground to both middle and tip rings will take the photos).

Indeed. I used the Steam Deck as a controller but any solution that produces steady results is a good one.

> Can you set the training to ignore clipped pixels? If I have to avoid clipping, that will limit the samples quite a lot, especially outdoor scenes.

Yes :) I still avoid clipped pixels, but I ignore them when they are part of the ground truth, so they shouldn't be an issue. I would still avoid the sky as it never seems to be static.
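
(As a rough illustration only, not the actual training code: ignoring clipped ground-truth pixels can be done by masking them out of the loss.)

import torch

def masked_mse(pred: torch.Tensor, gt: torch.Tensor, clip_val: float = 1.0) -> torch.Tensor:
    """MSE over pixels whose ground truth is below the clipping threshold."""
    mask = (gt < clip_val).float()  # 0 wherever the ground truth is clipped
    return ((pred - gt) ** 2 * mask).sum() / mask.sum().clamp(min=1.0)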

I like your approach :)

So far when working with raw/linear images I have been exporting the debayered image to OpenEXR (as that's the input/output of the denoising network during training, and I found it relatively simple to get the same floating-point image as in darktable, whereas I couldn't export a comparable floating-point TIFF using OpenCV). It has the advantage of showing me exactly what I'm working with, but it takes up a lot more storage space for the intermediary image. Your approach saves a lot of storage space and pre-processing time at the cost of not showing the image we are working with. I guess they are both perfectly valid use-cases.

I haven't discussed AI denoising enough with the darktable developers, just a bit of IRC talk (and I made a pixls post a long time ago). Do you think I should open a GitHub issue or a forum discussion on pixls.us?

As for pre-processing the images, I think it's pretty much editor-agnostic. If it's acceptable to spend up to a few seconds opening an image, then darktable could ensure that the image is stored in a buffer after it has been debayered. Otherwise it's an export-only filter where we have to accept slight changes in the final output. As per the above, I think all approaches are worth having.

Thank you for your script! It looks very thorough and it's great to have this darktable integration :) Did you have any issues with the color calibration module or with filmic rgb?

hqhoang commented 1 year ago

For the darktable discussion, I think GitHub is more appropriate, as this involves technical integration more than image processing.

My script is just a hacky temporary workaround for current darktable; it could use some help from the darktable devs. I agree that it's a valid workflow: some people like me prefer shifting the delay to the export stage, especially when my events are usually 600-800 photos each. I still see minor differences when applying filmic to the RAW vs. to the TIFF, and I found that the rotate/perspective module has a bug that affects the flow, too:

https://github.com/darktable-org/darktable/issues/13135

I tested exporting from darktable to OpenEXR, but nind-denoise has a problem opening the .exr file. I added OPENCV_IO_ENABLE_OPENEXR=1 to the environment and got past the warning, but ran into this error:

$ OPENCV_IO_ENABLE_OPENEXR=1 python3 /home/noname/tinker/git/nind-denoise/src/nind_denoise/denoise_image.py --network UtNet --model_path "/home/noname/tinker/git/nind-denoise/models/2021-06-14T20_27_nn_train.py_--config_configs-train_conf_utnet_std.yaml_--config2_configs-train_with_clean_data.yaml_--g_model_path_..-..-models-nind_denoise-2021-06-12T11_48_nn_train.py_--config_configs-train_conf_utnet_std.yaml_--config2_configs-train_w/generator_650.pt" --input DSCF5894.exr --output DSCF5894_denoised.exr 
cs and/or ucs not set, using defaults ...
cs=504, ucs=480
Traceback (most recent call last):
  File "/home/noname/tinker/git/nind-denoise/src/nind_denoise/denoise_image.py", line 229, in <module>
    ds = OneImageDS(args.input, args.cs, args.ucs, args.overlap, whole_image=args.whole_image, pad=args.pad)
  File "/home/noname/tinker/git/nind-denoise/src/nind_denoise/denoise_image.py", line 84, in __init__
    self.inimg = np_imgops.img_path_to_np_flt(inimg_fpath)
  File "/home/noname/tinker/git/nind-denoise/src/common/libs/np_imgops.py", line 28, in img_path_to_np_flt
    raise TypeError("img_path_to_np_flt: Error: fpath={fpath} has unknown format ({rgb_img.dtype})")
TypeError: img_path_to_np_flt: Error: fpath={fpath} has unknown format ({rgb_img.dtype})

hqhoang commented 1 year ago

I traced that error to rgb_img.dtype being "float32", but after playing around in the code and 3 hours on Stack Overflow, I gave up on the OpenEXR format. Instead, I modified the code to read/write 32-bit TIFF from and to darktable; it works pretty well and solved the problem I had with filmic and sigmoid. I also added the rest of the modules to the 1st- vs. 2nd-stage lists.

Because the existing code doesn't have a 32-bit output option, I made up a convention that .tif is 16-bit while .tiff is 32-bit. That should work as an ad-hoc solution for now; we should properly add a --bit-depth option. I'll keep testing, but it seems to work fine.

https://github.com/trougnouf/nind-denoise/commit/ac90b89b5301a9e87e4ac8427e878dc66c6ec4cd
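
(As a rough illustration of that extension convention, not the code from the commit above, assuming OpenCV as the writer:)

import cv2
import numpy as np

def save_tiff(bgr: np.ndarray, fpath: str) -> None:
    """bgr: (h, w, 3) float image in [0, 1]; .tiff -> 32-bit float, .tif -> 16-bit integer."""
    if fpath.endswith(".tiff"):
        cv2.imwrite(fpath, bgr.astype(np.float32))  # 32-bit float TIFF
    elif fpath.endswith(".tif"):
        cv2.imwrite(fpath, (bgr * 65535).clip(0, 65535).astype(np.uint16))  # 16-bit TIFF
    else:
        raise ValueError(f"save_tiff: unsupported extension in {fpath}")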

trougnouf commented 1 year ago

Sorry about the huge delay. I'm copying the code I use to load OpenEXR files (with OPENCV_IO_ENABLE_OPENEXR=1) below, but I suspect it's similar to what you are using, so I don't know what the issue is. Anyway, 32-bit TIFF is just as good if you have it behaving properly :)

import logging
import time

import torch

from common.libs import np_imgops  # repo-local helper (src/common/libs/np_imgops.py)

def fpath_to_tensor(img_fpath, device=torch.device(type="cpu"), batch=False):
    # totensor = torchvision.transforms.ToTensor()
    # pilimg = Image.open(imgpath).convert('RGB')
    # return totensor(pilimg)  # replaced w/ opencv to handle >8bits
    try:
        tensor = torch.tensor(np_imgops.img_fpath_to_np_flt(img_fpath), device=device)
    except ValueError as e:
        logging.error(f"fpath_to_tensor error {e} with {img_fpath=}. Trying again.")
        try:
            tensor = torch.tensor(
                np_imgops.img_fpath_to_np_flt(img_fpath), device=device
            )
        except ValueError as e:
            logging.error(
                f"fpath_to_tensor failed again ({e}). Trying one last time after 5 seconds."
            )
            time.sleep(5)
            tensor = torch.tensor(
                np_imgops.img_fpath_to_np_flt(img_fpath), device=device
            )
    if batch:
        tensor = tensor.unsqueeze(0)
    return tensor

import os

import cv2
import numpy as np

def img_fpath_to_np_flt(
    fpath: str  # , bit_depth: Optional[int] = None
) -> np.ndarray:
    """returns a numpy float32 array from RGB image path (8-16 bits per component)
    shape: c, h, w
    FROM common.libimgops"""
    if not os.path.isfile(fpath):
        raise ValueError(f"File not found {fpath}")
    if fpath.endswith(".npy"):
        return np.load(fpath)
    try:
        rgb_img = cv2.cvtColor(
            cv2.imread(fpath, flags=cv2.IMREAD_COLOR + cv2.IMREAD_ANYDEPTH),
            cv2.COLOR_BGR2RGB,
        ).transpose(2, 0, 1)
    except cv2.error as e:
        raise ValueError(f"img_fpath_to_np_flt: error {e} with {fpath}")
    if rgb_img.dtype == np.float32 or rgb_img.dtype == np.float16:
        res = rgb_img
    elif rgb_img.dtype == np.ubyte:
        res = rgb_img.astype(np.single) / 255
    elif rgb_img.dtype == np.ushort:
        res = rgb_img.astype(np.single) / 65535
    else:
        raise TypeError(
            f"img_fpath_to_np_flt: Error: fpath={fpath} has unknown format ({rgb_img.dtype})"
        )
    return res

hqhoang commented 1 year ago

I'm not sure what I've done, but nind-denoise can read OpenEXR now, it just can't write out to OpenEXR.

106/108
107/108
Traceback (most recent call last):
  File "/home/noname/tinker/git/nind-denoise/src/nind_denoise/denoise_image.py", line 268, in <module>
    pt_helpers.tensor_to_imgfile(newimg, args.output)
  File "/home/noname/tinker/git/nind-denoise/src/common/libs/pt_helpers.py", line 33, in tensor_to_imgfile
    raise NotImplementedError(f'Extension in {path}')

Anyway, it doesn't matter at the moment: upon importing back into darktable, the 32-bit TIFF gets close to the original RAF while the 32-bit OpenEXR looks washed out, even with minimal processing (just demosaic) and without involving nind-denoise. I'll stick with 32-bit TIFF for now when dealing with darktable.

[screenshot: tiff_vs_openexr_export_import]

My next question is: do you train on normalized or properly exposed images, or do you train on the demosaiced raw data but leave it under-exposed as shot?

For example, I bracketed at a fixed ISO, since the sensors are ISO-less. Can you train with these as-is, or do you bump up the brightness before using them for training? I shared a few sets on dpreview:

https://www.dpreview.com/forums/thread/4704150

Direct links to the sets: https://drive.google.com/drive/folders/1-KxU5_UqdBZHebqEkDXaDylyChTstaFs https://drive.google.com/drive/folders/1sIj6eKEBFyYeOq_J5Lf9JNKPVKHYACYT https://drive.google.com/drive/folders/1x9cEiwaxRTRLo-fnzs8BmODT1i-aQHaW

Should I keep shooting like the above, or should I bracket with the ISO bumped up accordingly?

trougnouf commented 1 year ago

Thank you for the samples!

The way you do "ISO"-bracketing is different from mine but I think it will work just as well. When training I feed the network with the image as-is but I match the exposure when computing the loss, so it should not matter (it will see different data but it should be trained to handle those low-exposure use-cases as well, which is very useful in high dynamic range situations).
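
(Just to illustrate the idea, not the actual implementation: the exposure matching can be as simple as rescaling the prediction to the ground truth's mean brightness before computing the loss.)

import torch
import torch.nn.functional as F

def exposure_matched_l1(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """L1 loss after scaling the prediction to match the ground truth's mean level."""
    gain = gt.mean() / pred.mean().clamp(min=1e-8)
    return F.l1_loss(pred * gain, gt)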

I haven't used OpenCV to save OpenEXR files, I think the options are too limited there. I'm pasting the code I use to save images as OpenEXR below. I just do debayering and convert from CameraRGB to Lin. Rec. 2020 then make sure to select Linear Rec. 2020 input profile in darktable. (Note that it doesn't matter if you get the same results in TIFF).

from typing import Optional

import cv2
import Imath
import numpy as np
import OpenEXR

def rgb_img_to_file(img: np.ndarray, fpath: str, color_profile: str, bit_depth: Optional[int] = None) -> None:
    """Save (c,h,w) RGB image to file."""
    if fpath.endswith("exr"):
        # Init OpenEXR header
        header = OpenEXR.Header(img.shape[-1], img.shape[-2])
        header["Compression"] = Imath.Compression(Imath.Compression.ZIPS_COMPRESSION)
        # Chromaticities
        assert color_profile is None or color_profile.startswith(
            "lin"
        ), f"{color_profile=}"
        if color_profile == "lin_rec2020":
            header["chromaticities"] = Imath.Chromaticities(
                Imath.chromaticity(0.708, 0.292),
                Imath.chromaticity(0.17, 0.797),
                Imath.chromaticity(0.131, 0.046),
                Imath.chromaticity(0.3127, 0.3290),
            )
        elif color_profile == "lin_sRGB":
            header["chromaticities"] = Imath.Chromaticities(
                Imath.chromaticity(0.64, 0.33),
                Imath.chromaticity(0.30, 0.60),
                Imath.chromaticity(0.15, 0.06),
                Imath.chromaticity(0.3127, 0.3290),
            )
        elif color_profile is None:
            pass
        else:
            raise NotImplementedError(f"rgb_img_to_file: OpenEXR with {color_profile=}")
        # Bit depth
        if not bit_depth:
            if img.dtype == np.float16:
                bit_depth = 16
            elif img.dtype == np.float32:
                bit_depth = 32
            else:
                raise NotImplementedError(f"rgb_img_to_file: {img.dtype=} with OpenEXR")
        if bit_depth == 16:
            header["channels"] = {
                "R": Imath.Channel(Imath.PixelType(Imath.PixelType.HALF)),
                "G": Imath.Channel(Imath.PixelType(Imath.PixelType.HALF)),
                "B": Imath.Channel(Imath.PixelType(Imath.PixelType.HALF)),
            }
            np_data_type = np.float16
        elif bit_depth == 32:
            # converting to np.float32 even though it's already the dtype, otherwise
            # *** TypeError: Unsupported buffer structure for channel 'B'
            # with negative values.
            header["channels"] = {
                "R": Imath.Channel(Imath.PixelType(Imath.PixelType.FLOAT)),
                "G": Imath.Channel(Imath.PixelType(Imath.PixelType.FLOAT)),
                "B": Imath.Channel(Imath.PixelType(Imath.PixelType.FLOAT)),
            }
            np_data_type = np.float32
        else:
            raise NotImplementedError(f"rgb_img_to_file: OpenEXR with {bit_depth=}")        
        # Save
        # TODO include EXIF metadata
        exr = OpenEXR.OutputFile(fpath, header)

        exr.writePixels(
            {
                "R": img[0].astype(np_data_type),
                "G": img[1].astype(np_data_type),
                "B": img[2].astype(np_data_type),
            }
        )
    else:
        if img.dtype == np.float32 and (img.min() <= 0 or img.max() >= 1):
            print(
                f"rgb_img_to_file warning: DATA LOSS: image range out of bounds "
                f"({img.min()=}, {img.max()=}). Consider saving {fpath=} to "
                "OpenEXR in order to maintain data integrity."
            )
        if color_profile != "gamma_sRGB":
            print(
                f"rgb_img_to_file warning: {color_profile=} not saved to "
                f"{fpath=}. Viewer will wrongly assume sRGB."
            )
        hwc_img = img.transpose(1, 2, 0)
        hwc_img = cv2.cvtColor(hwc_img, cv2.COLOR_RGB2BGR)
        hwc_img = (hwc_img * 65535).clip(0, 65535).astype(np.uint16)
        cv2.imwrite(fpath, hwc_img)

hqhoang commented 1 year ago

> Thank you for the samples!
>
> The way you do "ISO"-bracketing is different from mine but I think it will work just as well. When training I feed the network with the image as-is but I match the exposure when computing the loss, so it should not matter (it will see different data but it should be trained to handle those low-exposure use-cases as well, which is very useful in high dynamic range situations).
>
> I haven't used OpenCV to save OpenEXR files, I think the options are too limited there. I'm pasting the code I use to save images as OpenEXR below. I just do debayering and convert from CameraRGB to Lin. Rec. 2020 then make sure to select Linear Rec. 2020 input profile in darktable. (Note that it doesn't matter if you get the same results in TIFF).

I'll stick with 32-bit TIFF when dealing with darktable for now, as the .exr file somehow doesn't even get the EXIF copied over by darktable. I guess OpenEXR is not used enough in darktable for those bugs to have surfaced. Once things are smoothed out, we can switch between TIFF and OpenEXR anytime; the training process is independent of the usage process, so no worries.
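
(A possible workaround, untested and only an assumption: rather than trying to preserve metadata through the intermediate file, the tags could be copied from the original raw onto the final export with exiftool at the end of the pipeline; file names below are placeholders.)

import subprocess

# Copy all tags from the original raw onto the final export.
subprocess.run(["exiftool", "-overwrite_original",
                "-TagsFromFile", "DSCF0001.RAF", "-all:all", "final.tif"], check=True)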

I'll try to take more sample shots, perhaps mixing in different materials and scenes (outdoors at night, human skin/subjects, animals, ...). I'll try to shoot at wide apertures on different lenses for more variety of bokeh. I just don't know whether (or how) we need samples with motion blur. Let me know if you want samples in specific scenarios.

The demosaic algorithm seems to be important. I think the existing models were trained with Markesteijn on X-Trans; I get some artifacts when trying other algorithms (FDC, or 3-pass+VNG), so I guess the noise pattern does change depending on the demosaic process. Any plan on addressing the dependency on the demosaicer, or should we just stick to Markesteijn (for X-Trans) to simplify the problem for now?

hqhoang commented 1 year ago

I'm testing out this bracketing method on my X-T2; I just want your feedback on whether I should change anything.

The X-T2 has a few AE bracketing options: ±3, ±5, ±7, ±9. I'm using ±7, which takes 7 continuous shots at 1EV per step, while setting the shutter-speed dial to T and dialing exposure down to -3EV. That way, the first shot is -6EV, the middle shot is -3EV, and the last shot is 0EV. However, the camera tends to underexpose a bit, so I'm thinking of switching to ±9, with the shots going from -7EV to +1EV, in case 0EV is not exposed enough. Storage shouldn't be a problem for me (on GDrive).

For outdoor shots, I'm planning on shooting when there's plenty of light and the least amount of wind. With plenty of light, the shutter speed will be super fast and the whole 9-shot time frame will be as short as possible, reducing the chance of slight movement in between. For variety, I will also shoot some outdoor night scenes (it's getting warmer now) when it's not windy.

EDIT: even with shutter speeds of 1/32000 to 1/8000 and the whole bracketed burst finished within a second, there's still slight movement of leaves and branches between frames :-(

Is there a way/algorithm to detect local differences besides the noise? If there's a reliable way to detect them, we could perhaps auto-skip those local crops while still utilizing the set. Hugin uses autopano-sift-c with feature detection to generate control points; that might be useful for detecting local outliers?
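
(Just a sketch of the auto-skip idea, not an existing feature: exposure-match two frames, tile them, and flag tiles whose difference is much larger than the typical noise-only difference.)

import numpy as np

def moving_tiles(a: np.ndarray, b: np.ndarray, tile: int = 128, k: float = 4.0) -> np.ndarray:
    """a, b: (h, w) float luma images; returns a boolean grid marking suspect tiles."""
    b = b * (a.mean() / max(b.mean(), 1e-8))  # crude exposure matching
    diff = np.abs(a - b)
    noise_floor = np.median(diff)             # typical noise-only difference
    grid = np.zeros((a.shape[0] // tile, a.shape[1] // tile), dtype=bool)
    for i in range(grid.shape[0]):
        for j in range(grid.shape[1]):
            patch = diff[i * tile:(i + 1) * tile, j * tile:(j + 1) * tile]
            grid[i, j] = patch.mean() > k * noise_floor  # far above noise => motion
    return grid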

Another feature to consider is having a mask for the set (or the base/ground shot). Since we cannot modify the RAWs themselves, it's probably easier to inspect visually and manually mask out sections with movement/motion (traffic in the background, leaves blowing in the wind, ...). Masking is a one-time task for each set, and is likely more reliable than the auto-detection mentioned above.
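
(And for the manual-mask variant, again only a sketch of how it could plug into training: crops overlapping the hand-drawn mask would simply be rejected.)

import numpy as np

def crop_is_usable(mask: np.ndarray, x: int, y: int, cs: int,
                   max_masked_fraction: float = 0.0) -> bool:
    """mask: (h, w) uint8 image, non-zero where the scene moved between frames."""
    patch = mask[y:y + cs, x:x + cs]
    return float((patch > 0).mean()) <= max_masked_fraction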

With dpreview shutting down, I've saved some RAWs of various camera brands from their studio test scene. I'm not sure if it's OK to include them for training, but I'm saving them for possible future inclusion.