kundtx / lfd2022-comments


Learning from Data (Fall 2022) #5

Open kundtx opened 1 year ago

kundtx commented 1 year ago

http://8.129.175.102/lfd2022fall-poster-session/10.html

anullple commented 1 year ago

G28 Yifei Zhang: Nice work! Concise but very inspiring. I have a question about the training data. How do you acquire the input/ground-truth pairs? Are they based on synthetic data? If so, how do you transfer the model to real data like the example you present in the supplementary material? Also, I am curious about how you deal with different illumination levels and noise levels. I think it is hard to make a single end-to-end DNN robust to different parameter settings. Thanks!

ChernweiRen commented 1 year ago

G10 Chengwei Ren: Hi Yifei, thanks for your questions!

The dataset we used here is called See-in-the-Dark (SID), and its original purpose is low-light image enhancement. Each SID training pair consists of a short-exposure input image and a long-exposure output image, representing low-light and normal-light imaging respectively. The exposure times of the input images also differ, corresponding to images captured under different levels of low light. With this prior information in mind, let me answer your questions one by one.

1) About training pairs: The inputs and outputs of the SID dataset are both RAW images, so the original output is not edge information. To obtain edge labels, we first run Image Signal Processing (ISP) on the long-exposure output RAW image and then apply Canny edge extraction to the ISP'ed (normal-light) image. Finally, we feed that edge map and the short-exposure image to the network as a training pair. So, as you can see, we have collapsed the processes of extreme low-light enhancement, ISP, and normal-light edge extraction into a single end-to-end network operating under ultra-low-light conditions.
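In pseudocode, the pair construction looks roughly like the sketch below. The file paths, Canny thresholds, and the rawpy/OpenCV calls are illustrative assumptions, not our exact pipeline:

```python
# Hypothetical sketch: build one (short-exposure RAW, edge map) training pair.
import numpy as np
import rawpy
import cv2

def make_training_pair(short_raw_path, long_raw_path, canny_lo=100, canny_hi=200):
    # Short-exposure RAW stays in the RAW domain as the network input.
    with rawpy.imread(short_raw_path) as raw:
        short_raw = raw.raw_image_visible.astype(np.float32)

    # Long-exposure RAW goes through a simple ISP (demosaic, white balance,
    # gamma) to produce a normal-light RGB image.
    with rawpy.imread(long_raw_path) as raw:
        long_rgb = raw.postprocess(use_camera_wb=True, no_auto_bright=True,
                                   output_bps=8)

    # Canny edges of the ISP'ed image serve as the ground-truth label.
    gray = cv2.cvtColor(long_rgb, cv2.COLOR_RGB2GRAY)
    edges = cv2.Canny(gray, canny_lo, canny_hi)
    return short_raw, edges
```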

2) About model robustness: In fact, no additional modifications or improvements were made to the model before inferring on real data, and no special treatment was applied for different light levels or signal-to-noise ratios. That is NOT the purpose of our work. Our trick for improving the robustness of U-NeXt is to cover the distribution of the training and validation data as broadly as possible: for example, apply plenty of preprocessing and data augmentation, and divide the different short-exposure images into multiple patches to enrich the variety of light distributions. Big data is all you need. :)
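To make the patch idea concrete, here is a rough illustration of that augmentation step; the patch size and flip augmentations are assumptions for the sketch, not our exact settings:

```python
import numpy as np

def random_patches(raw_img, edge_map, patch_size=512, n_patches=8, rng=None):
    """Crop random aligned patches (with simple flips) from one short-exposure
    image and its edge label, so many light distributions end up in one batch.
    A real RAW pipeline would also keep crops aligned to the Bayer pattern."""
    rng = rng or np.random.default_rng()
    h, w = raw_img.shape[:2]
    pairs = []
    for _ in range(n_patches):
        y = int(rng.integers(0, h - patch_size + 1))
        x = int(rng.integers(0, w - patch_size + 1))
        patch = raw_img[y:y + patch_size, x:x + patch_size]
        label = edge_map[y:y + patch_size, x:x + patch_size]
        # Random horizontal/vertical flips, applied identically to input and label.
        if rng.random() < 0.5:
            patch, label = patch[:, ::-1], label[:, ::-1]
        if rng.random() < 0.5:
            patch, label = patch[::-1, :], label[::-1, :]
        pairs.append((patch.copy(), label.copy()))
    return pairs
```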

Hope the above information helps you understand our work better!

anullple commented 1 year ago

G28 Yifei Zhang: Hello Chengwei, thanks for the reply. I learned a lot from it. Combining short and long exposures as input and ground truth is a beautiful idea.

Again, I'd like to express how much I love the idea of your project. Though I am not an expert in low-light image processing, the thought of converting RAW directly to edges struck me. On the one hand, gradient information has long been exploited as guidance for image restoration problems like super-resolution and deblurring. On the other hand, edges already carry some high-level information themselves, which can be helpful for surveillance and autonomous driving. It is a beautiful idea to regress edges directly from RAW images.

Have you ever considered integrating your edge detection network into other neural ISPs, training the two tasks jointly to produce images that are both visually satisfying and geometrically accurate? Just as Dirty Pixels did (Steven Diamond et al., SIGGRAPH 2021), I would suggest a dual-branch structure, like this one: Dual Super-Resolution Learning for Semantic Segmentation, CVPR 2020.
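To make the suggestion concrete, here is a toy PyTorch sketch of what I mean by a dual-branch structure: a shared encoder with one head that restores an RGB image (the neural ISP branch) and one that predicts edges. All module names and layer sizes are made up for illustration and do not come from either paper:

```python
import torch.nn as nn

class DualBranchNet(nn.Module):
    """Toy dual-branch sketch: shared encoder, an ISP head for image
    restoration and an edge head for edge prediction (sizes illustrative)."""
    def __init__(self, in_ch=4, feat=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.isp_head = nn.Conv2d(feat, 3, 3, padding=1)    # restored RGB
        self.edge_head = nn.Conv2d(feat, 1, 3, padding=1)   # edge logits

    def forward(self, x):
        feats = self.encoder(x)
        return self.isp_head(feats), self.edge_head(feats)

# Joint training would simply weight the two losses, e.g.
# loss = l1(rgb_pred, rgb_gt) + lam * bce_with_logits(edge_pred, edge_gt)
```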

qingyuhai commented 1 year ago

G13 Qian Zhang: Hi Chengwei, your work is very interesting and enlightening. I have a question for you: in the experiment shown in Figure 3, what is the approximate light intensity you tested at? Did you run tests at different darkness levels?

ChernweiRen commented 1 year ago

Thank you for your acknowledgement of our work!

Yes, it's a brilliant idea to merge the edge net and the ISP net into a single one. It reminds me that there are some works that utilize extracted edges to compensate for the missing edges in low-light images and improve the enhancement results. Maybe we will work on this idea in the near future!

ChernweiRen commented 1 year ago

Hi Qian! Thanks for your questions!

It's hard to give an approximate light intensity for images from the See-in-the-Dark (SID) dataset, because this imaging parameter does not seem to be included. Maybe some algorithm could be applied to estimate it? However, SID uses the exposure time to represent the light level, that is, the number of photons entering the aperture during imaging. For the image in Figure 3, the exposure time is 0.04 s, and the corresponding long exposure time of its ground truth is 10 s. Hope these numbers help!

And we do test our network at different dark levels. The lowest environmental illuminance we have explored so far is about 0.35 lux, where each pixel in each frame receives about 80 photons. The demo video at the lower right corner of our poster shows the edge extraction results under that condition. Our network works properly for imaging with illuminance above this level; lower illuminance (< 0.35 lux) remains to be explored.

Caarinaaa commented 1 year ago

G10 Jinnan He:

Hi guys! ✨

We are G10 Dancing In the Dark.

Welcome to our poster page. 😘

We sincerely appreciate your comments and discussion!

Don't forget to check out the demo video we made for you: Dancing In the Dark