pwangcs / DeepOpticsSCI

[ICCV 2023] Source code and pre-trained models for "Deep Optics for Video Snapshot Compressive Imaging".
MIT License

About the Full-dynamic-range capacity of the structural mask #1

Closed dawnlh closed 1 year ago

dawnlh commented 1 year ago

Hi~ @pwangcs

Thanks for sharing your wonderful work! I have a question about the dynamic range issue in video CS.

As mentioned in the paper (Sec. 3.2, Full dynamic range (FDR)), the reduced dynamic range in video CS originates from the fact that multiple frames are integrated into a single measurement. So the equivalent dynamic range of each reconstructed frame would be several times lower than the sensor's physical dynamic range. I agree on this point.

However, I think this problem still exists even with the proposed 'structural mask': each measurement is still a composite of multiple original frames, so the equivalent dynamic range of the reconstructed frames will still be reduced. The structural mask's key property, that the temporal weights at each pixel sum to 1, can only guarantee that different pixels have the same exposure time, which makes the measurement look smoother and avoids local saturation. But I cannot figure out why it would increase the dynamic range.

The paper states: "It means that the brightness range of measurement is equal to that of each of video frames regardless of B. Therefore, the proposed structural mask keep the dynamic range of video SCI in line with that of the used sensors, i.e., full dynamic range (FDR)." Hmm, this conclusion does not seem logical to me, since what matters is the relation between the measurement's physical dynamic range and the reconstructed video's equivalent dynamic range. In fact, the dynamic range is physically determined by the full-well capacity of the pixel circuits. Overall, video CS decouples multiple frames from a single captured measurement, so the reduced dynamic range of the reconstructed video seems to be a physical limitation that can hardly be overcome by mask design.

Could you give some explanation on this issue? Thanks!

pwangcs commented 1 year ago

Hi dawnlh, a pixel's physical bit depth directly determines the dynamic range of the captured measurement, not that of the reconstructed video. Using a coded aperture/exposure (i.e., a mask) to improve dynamic range is a useful approach in computational imaging.

Consider a special case: video SCI of a static scene. Ten identical video frames need to be compressed into one measurement image. Without any optical modulation, the compressed measurement's pixel range is 10× larger than that of each video frame. With binary mask modulation, it is ~5× larger. With structural mask modulation, it is the same as that of each video frame. The compressed measurement's pixel range is determined by the sensor used, usually 0-255 for an 8-bit sensor. So different modulation strategies affect the reconstructed dynamic range.

I hope the above statement answers your question well. You can also refer to our OL paper [https://doi.org/10.1364/OL.499735].
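The static-scene comparison above can be checked numerically. The following is a minimal sketch, not the paper's implementation: it assumes a random Bernoulli(0.5) binary mask and stands in for the structural mask with arbitrary positive weights normalized so that they sum to 1 along time at each pixel (the only property of the structural mask stated in this thread). For a static scene the measurement equals the frame multiplied by the per-pixel temporal sum of the mask, so that sum is the range scale factor.

```python
import numpy as np

rng = np.random.default_rng(0)
B, H, W = 10, 64, 64  # B frames compressed into one H x W measurement

# Three per-pixel temporal weightings, each of shape (B, H, W).
no_mask = np.ones((B, H, W))
binary_mask = (rng.random((B, H, W)) < 0.5).astype(float)  # assumed Bernoulli(0.5)
raw = rng.random((B, H, W)) + 1e-6                         # assumed positive weights
sum_to_one_mask = raw / raw.sum(axis=0, keepdims=True)     # stand-in for the structural mask


def mean_range_scale(mask):
    """Average factor by which a static scene's measurement exceeds one frame."""
    return float(mask.sum(axis=0).mean())


scales = {
    "no mask": mean_range_scale(no_mask),
    "binary mask": mean_range_scale(binary_mask),
    "sum-to-one mask": mean_range_scale(sum_to_one_mask),
}

for name, scale in scales.items():
    print(f"{name}: measurement range is about {scale:.2f}x one frame")
```

The binary-mask figure is an average; individual pixels can see anywhere from 0 to B open frames.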

dawnlh commented 1 year ago

Thanks for your prompt answer!

"10 same video frames need to be compressed into a measurement image. Without any optical modulation, compressed measurement's pixel range is 10× larger than each video frame."

In this case, you assume that each video frame keeps its original intensity during integration, which gives a 10× larger value and may cause saturation/clipping and information loss (thus sacrificing dynamic range). But in physical capture, we generally avoid saturation by decreasing the light throughput. So the intensity of each integrated video frame is decreased as well, and the corresponding mask sequence for each pixel should be [1/10, 1/10, ..., 1/10] instead of [1, 1, ..., 1], which also satisfies the proposed structural mask's sum-to-one condition. (Perhaps the scale of the mask's sum doesn't matter so much, since the physical capture process will adapt/normalize it according to the scene brightness?)

So, from the perspective of overall information entropy: if the physical capture process doesn't introduce saturation (i.e., no information loss), and the sensor's physical full-well capacity sets the upper bound on the information captured, why can changing the mask increase the dynamic range? (I mean, if the reconstructed video frames have the same dynamic range as the sensor, there should be some 'extra' information detail.) I hope I have made my confusion clear, and thanks for your explanation!
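The uniform-mask observation in this comment can be written out as a small sketch. It assumes a hypothetical random dynamic scene and an ideal, noise-free sensor; the `measure` helper is illustrative and not taken from the repository. It shows that the [1/10, ..., 1/10] mask satisfies the sum-to-one condition and that its measurement is the all-ones measurement scaled by a global gain of 1/B.

```python
import numpy as np

rng = np.random.default_rng(1)
B, H, W = 10, 8, 8
frames = rng.uniform(0.0, 255.0, size=(B, H, W))  # hypothetical dynamic scene

ones_mask = np.ones((B, H, W))              # no modulation
uniform_mask = np.full((B, H, W), 1.0 / B)  # light throughput reduced by B


def measure(mask, video):
    """Ideal snapshot measurement: per-pixel weighted sum over time."""
    return (mask * video).sum(axis=0)


y_ones = measure(ones_mask, frames)
y_uniform = measure(uniform_mask, frames)

sum_to_one = bool(np.allclose(uniform_mask.sum(axis=0), 1.0))
same_up_to_gain = bool(np.allclose(y_uniform, y_ones / B))

print("uniform mask sums to one at every pixel:", sum_to_one)
print("uniform-mask measurement equals all-ones measurement / B:", same_up_to_gain)
print("uniform-mask measurement stays within the frame range:", float(y_uniform.max()) <= 255.0)
```

This sketch ignores quantization and noise, which is where the two positions in the thread actually differ.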

pwangcs commented 1 year ago

The sensor's physical bit depth determines the captured pixel range, not the captured information capacity. Following the above setting, the compressed measurement is limited to 0-255, so without modulation the video frames need to lie within 0-25.5 (that is, 255/10) to avoid overexposure, leading to low dynamic range. With structural mask modulation, the video frames stay within 0-255, namely full dynamic range.

One key objective of computational photography is to capture visual information better under the hardware's physical limits. Please refer to papers on high-dynamic-range imaging via coded aperture/exposure.
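The arithmetic in this reply reduces to dividing the sensor's maximum value by the per-pixel temporal sum of the mask. The sketch below only restates that calculation for the static-scene setting; the function name `max_frame_peak` is hypothetical, and the value 5 for the binary mask is the average sum quoted earlier in the thread.

```python
def max_frame_peak(sensor_max, temporal_weight_sum):
    """Largest per-frame value that keeps a static-scene measurement unsaturated."""
    return sensor_max / temporal_weight_sum


SENSOR_MAX = 255.0  # 8-bit sensor, as in the thread

cases = [
    ("no mask", 10.0),
    ("binary mask (average)", 5.0),
    ("sum-to-one mask", 1.0),
]

for label, weight_sum in cases:
    peak = max_frame_peak(SENSOR_MAX, weight_sum)
    print(f"{label}: frames must lie within 0-{peak:g}")
```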

dawnlh commented 1 year ago

Ok, thanks for your patient explanation.