noahzn / Lite-Mono

[CVPR2023] Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation
MIT License

Hi, I've got some new questions #132

Closed QingTianNNN closed 6 months ago

QingTianNNN commented 6 months ago

I know the scale factors are used to solve the scale ambiguity. How are they used? I found a function named disp_to_depth(disp, min_depth, max_depth) [https://github.com/noahzn/Lite-Mono/blob/4874b35df8ed4da16159ce8be8c697028b72bf76/layers.py#L11]. Why does it use disp? In options.py, https://github.com/noahzn/Lite-Mono/blob/4874b35df8ed4da16159ce8be8c697028b72bf76/options.py#L9, what is the meaning of setting 3 scales? And in the total loss [image], how is s used? Thanks for your reply.

noahzn commented 6 months ago

Hi, they are two different things.

In the disp_to_depth function, the predicted disparity values are output by a sigmoid layer, so they lie in [0, 1]. We use disp_to_depth to map these values to a depth in the range [min_depth, max_depth], thus obtaining the 'scaled depth'.
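For readers following along, here is a minimal sketch of that mapping. It follows the formulation used in Monodepth2, written with plain Python floats instead of tensors. The values min_depth=0.1 and max_depth=100.0 are assumed example inputs, and the linked layers.py remains the authoritative version.

```python
def disp_to_depth(disp, min_depth, max_depth):
    """Map a sigmoid output in [0, 1] to a scaled disparity and a depth."""
    min_disp = 1.0 / max_depth  # smallest disparity corresponds to the farthest depth
    max_disp = 1.0 / min_depth  # largest disparity corresponds to the nearest depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    depth = 1.0 / scaled_disp
    return scaled_disp, depth


# Assumed example range; a raw value of 0 maps to max_depth and 1 to min_depth.
for raw in (0.0, 0.5, 1.0):
    scaled_disp, depth = disp_to_depth(raw, min_depth=0.1, max_depth=100.0)
    print(f"raw={raw:.1f} scaled_disp={scaled_disp:.4f} depth={depth:.4f}")
```

Note that the interpolation is linear in disparity (inverse depth), not in depth, so a raw value of 0.5 lands much closer to min_depth than to max_depth.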

However, in options.py the argument scales=[0,1,2] means we calculate the loss based on the generated depth maps with three different scales (1, 1/2, 1/4). This is consistent with Figure 2 in the paper, where you can see three 'prediction heads' generating depth maps at three resolutions.
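As a rough illustration of what s indexes, the sketch below lists the depth-map resolution at each scale and combines per-scale losses by averaging. The 640x192 input size, the helper names scale_resolution and total_loss, and the loss numbers are all hypothetical; averaging over scales is assumed here because that is how Monodepth2's trainer combines them.

```python
def scale_resolution(height, width, s):
    """Resolution of the depth map predicted at scale s (1/2**s of the input)."""
    return height // (2 ** s), width // (2 ** s)


def total_loss(per_scale_losses):
    """Average the per-scale losses (assumed combination rule)."""
    return sum(per_scale_losses.values()) / len(per_scale_losses)


scales = [0, 1, 2]  # same values as the scales argument in options.py
resolutions = {s: scale_resolution(192, 640, s) for s in scales}
for s, (h, w) in resolutions.items():
    print(f"scale {s}: {w}x{h}")

# Made-up per-scale loss values, only to show how s indexes the sum.
example_losses = {0: 0.12, 1: 0.15, 2: 0.18}
print(total_loss(example_losses))
```

In this reading, s is only an index over prediction heads. It does not resize objects in the scene and is unrelated to the metric scale ambiguity of monocular depth.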

QingTianNNN commented 6 months ago

Sorry, I still have no idea about the scale ambiguity and where this scaling factor is applied. Does this scaling factor enlarge or reduce the objects in the image? Or does it just scale the depth map's size?

noahzn commented 6 months ago

Scaling factor? Which scaling factor do you mean?

QingTianNNN commented 6 months ago

In my opinion, s means the scaling factor. If not, why scale the three 'prediction heads'? I know this may sound stupid.

noahzn commented 6 months ago

See the code here: https://github.com/noahzn/Lite-Mono/blob/main/networks/depth_decoder.py#L34

The multi-scale depth prediction helps to optimize the photometric error. It is also used in Monodepth2.

QingTianNNN commented 6 months ago

self.scales = range(4), but self.num_ch_dec = (self.num_ch_enc / 2).astype('int') and self.num_ch_enc = num_ch_enc = np.array([48, 80, 128]). So in the loop for s in self.scales: self.convs[("dispconv", s)] = Conv3x3(self.num_ch_dec[s], self.num_output_channels), self.num_ch_dec has 3 entries while s takes 4 values. When I ran the code it didn't raise an error. I just want to know what self.num_ch_dec[4] is.

QingTianNNN commented 6 months ago

Oh, I know this now. I'm a fool. Thank you!!!
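The thread does not spell out what resolved the confusion. One likely explanation, stated here as an assumption, is that range(4) is only a default argument of the decoder's constructor and the training code passes the scales from options.py ([0, 1, 2]) instead. The sketch below uses a plain list in place of the NumPy array and a hypothetical build_heads helper in place of the Conv3x3 loop.

```python
num_ch_enc = [48, 80, 128]
num_ch_dec = [c // 2 for c in num_ch_enc]  # [24, 40, 64], three entries


def build_heads(scales=range(4)):
    """Hypothetical stand-in for the dispconv loop: one entry per scale."""
    return {s: num_ch_dec[s] for s in scales}


# Passing the scales from options.py only touches indices 0, 1 and 2.
heads = build_heads(scales=[0, 1, 2])
print(heads)

# Falling back to the default range(4) would reach index 3 and fail.
try:
    build_heads()
except IndexError as err:
    default_error = err
    print("default scales fail:", err)
```

Also note that range(4) yields 0, 1, 2, 3, so index 4 is never requested either way.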

noahzn commented 6 months ago

Please close the issue if your question has been resolved. Thank you.

noahzn commented 6 months ago

I'm now closing this issue as there is no response.