kishore-greddy opened this issue 3 years ago
Hi @kishore-greddy
1-2. The uncertainty is usually left unbounded, but that might lead your network to instability. As you noticed in your "EDIT", if you model the log-uncertainty you should fix the problem.
Hope this helps ;)
Hey @mattpoggi ,
Thanks for the quick reply. I will try this out.
Hi @mattpoggi ,
Forgot to ask: have you also tried the other method? Meaning, keeping the uncertainty values greater than 0 in the decoder and actually modelling the uncertainty itself instead of log(uncertainty), so that my loss function in 3) works. I read about negative log-likelihood minimization, and a lot of people talk about taking the log in the loss rather than modelling the log-uncertainty itself.
Quoting from Lakshminarayanan et al., "Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles", one of the papers referenced in your research: there they talk about enforcing a variance greater than 0. Could you please clarify? Thanks
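For reference, the positivity constraint in Lakshminarayanan et al. is typically implemented with a softplus on the variance head. Below is a minimal sketch of that idea; the function and variable names are illustrative and not from this repository, and the exact weighting should be checked against the paper being followed:

```python
import torch
import torch.nn.functional as F

def direct_variance_loss(photo, var_logits, eps=1e-6):
    """NLL-style loss when the network predicts the variance directly.

    photo:      per-pixel residual, shape [B, 1, H, W]
    var_logits: raw decoder output for the uncertainty head, same shape
    """
    var = F.softplus(var_logits) + eps   # softplus + eps keeps the variance strictly positive
    loss = photo / var + torch.log(var)  # residual weighted by the variance, plus a log penalty
    return loss.mean()
```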
I made some experiments by bounding the uncertainty in 0-1 with a sigmoid layer and adding the log term in the loss function, as you mentioned. The same strategy is used in the D3VO paper (https://vision.in.tum.de/research/vslam/d3vo). The numbers were almost identical in the two formulations. I believe the important thing is just to avoid exploding gradients and unstable behaviors.
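A minimal sketch of the sigmoid-bounded variant described above; the names are illustrative and the exact weighting used in D3VO may differ:

```python
import torch

def bounded_uncertainty_loss(photo, uncert_logits, eps=1e-3):
    """Weight a per-pixel residual by an uncertainty bounded in (0, 1) via a sigmoid.

    photo:         per-pixel photometric residual, shape [B, 1, H, W]
    uncert_logits: raw decoder output for the uncertainty head, same shape
    """
    sigma = torch.sigmoid(uncert_logits)                    # bounded in (0, 1), so it cannot explode
    loss = photo / (sigma + eps) + torch.log(sigma + eps)   # log term penalizes inflating sigma everywhere
    return loss.mean()
```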
Hey @mattpoggi ,
I tried to model the log-uncertainty as you suggested, without bounding the uncertainty to any range, and I am running into an exploding gradients problem. I have updated my loss function to the one below,
After some iterations, in the first epoch itself, I face issues; please have a look at the image below,
Notice the loss just before I run into problems. Did you ever have to deal with something like this? Any hint is appreciated, thanks.
EDIT: I managed to set a breakpoint just before the gradients exploded. I added a new image which shows the minimum value of the output uncertainties (in fact, log-uncertainties) for all images in the batch. As you can see, the minimum value coming out of the output channel is -33.99; taking exp(-33.99) gives a value on the order of 10^-15, and this being in the denominator causes the loss value to blow up. I tried to find out why this happens, but I am not quite sure. Any guidance is highly appreciated. Thanks
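For readers following the thread, here is a minimal sketch of the log-uncertainty formulation under discussion, with a clamp on the predicted log-sigma added as one possible safeguard against the extreme values described above. The clamp and all names are illustrative additions, not part of the original code:

```python
import torch

def log_uncertainty_loss(photo, log_sigma, clamp_range=(-10.0, 10.0)):
    """NLL-style loss when the network predicts s = log(sigma) directly.

    photo:     per-pixel photometric residual, shape [B, 1, H, W]
    log_sigma: raw decoder output interpreted as log-uncertainty, same shape
    """
    s = torch.clamp(log_sigma, *clamp_range)  # guard against extreme predictions such as -34
    loss = photo * torch.exp(-s) + s          # algebraically photo / sigma + log(sigma)
    return loss.mean()
```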
That's quite weird, I actually never had a problem with gradients... Does this occur in every training run? Does it occur even if you use the sigmoid trick? Anyway, before the gradients explode, the loss values are very similar to the ones I had seen during my experiments.
Hi @mattpoggi , I observed that this occurs in almost every training run of the log model. I have tried it 3 times now, and every time I have this problem. Sometimes the problem occurs at the 5th epoch, sometimes in the 1st epoch itself, so it is not consistent. However, as I showed with the log-uncertainty values just before the gradients start to explode, the minimum value is -33; the network is predicting this value at some pixel. I am not sure why this problem is so random, and I am even more surprised that you did not face any issues like this. My decoder is almost the same as yours, and I have also posted my loss function. Do you see an issue there? That is the only thing that is different. I have not used the sigmoid trick yet, as I wanted to train the model the same way you did.
You upsampled the uncertainty to the proper resolution scale, right? I can dig more into this after the CVPR rebuttal occurring this week... Just a few questions: 1) are you training on KITTI? 2) are you using M, S or MS?
Do you mean scaling the uncertainty to full resolution before calculating the loss? Yes, I have done that.
If you mean upsampling the uncertainties in the decoder, yes, I have done that too:
```python
import numpy as np
import torch
import torch.nn as nn

from collections import OrderedDict
from layers import ConvBlock, Conv3x3, upsample  # monodepth2 building blocks


class DepthDecoder(nn.Module):
    def __init__(self, num_ch_enc, scales=range(4), num_output_channels=1, use_skips=True, use_uncert=False):
        super(DepthDecoder, self).__init__()

        self.num_output_channels = num_output_channels
        self.use_skips = use_skips
        self.upsample_mode = 'nearest'
        self.scales = scales
        self.use_uncert = use_uncert

        self.num_ch_enc = num_ch_enc
        self.num_ch_dec = np.array([16, 32, 64, 128, 256])

        # decoder
        self.convs = OrderedDict()
        for i in range(4, -1, -1):
            # upconv_0
            num_ch_in = self.num_ch_enc[-1] if i == 4 else self.num_ch_dec[i + 1]
            num_ch_out = self.num_ch_dec[i]
            self.convs[("upconv", i, 0)] = ConvBlock(num_ch_in, num_ch_out)

            # upconv_1
            num_ch_in = self.num_ch_dec[i]
            if self.use_skips and i > 0:
                num_ch_in += self.num_ch_enc[i - 1]
            num_ch_out = self.num_ch_dec[i]
            self.convs[("upconv", i, 1)] = ConvBlock(num_ch_in, num_ch_out)

        for s in self.scales:
            self.convs[("dispconv", s)] = Conv3x3(self.num_ch_dec[s], self.num_output_channels)
            if self.use_uncert:
                # extra 3x3 head for the uncertainty output at each scale
                self.convs[("uncertconv", s)] = Conv3x3(self.num_ch_dec[s], self.num_output_channels)

        self.decoder = nn.ModuleList(list(self.convs.values()))
        self.sigmoid = nn.Sigmoid()

    def forward(self, input_features):
        self.outputs = {}

        # decoder
        x = input_features[-1]
        for i in range(4, -1, -1):
            x = self.convs[("upconv", i, 0)](x)
            x = [upsample(x)]
            if self.use_skips and i > 0:
                x += [input_features[i - 1]]
            x = torch.cat(x, 1)
            x = self.convs[("upconv", i, 1)](x)
            if i in self.scales:
                self.outputs[("disp", i)] = self.sigmoid(self.convs[("dispconv", i)](x))
                if self.use_uncert:
                    # note: no activation on the uncertainty head, so its range is unbounded
                    self.outputs[("uncert", i)] = self.convs[("uncertconv", i)](x)

        return self.outputs
```
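For context, a small sketch of how this decoder might be exercised standalone. It assumes the monodepth2 layers module is importable; the channel counts and input resolution below mirror a monodepth2 ResNet-18 encoder and are purely illustrative:

```python
import torch

# Channel counts of a monodepth2 ResNet-18 encoder (illustrative)
num_ch_enc = [64, 64, 128, 256, 512]
decoder = DepthDecoder(num_ch_enc, scales=range(4), use_uncert=True)

# Dummy feature pyramid for a 192x640 input, at strides 2 to 32
features = [torch.randn(2, c, 192 // 2 ** (i + 1), 640 // 2 ** (i + 1))
            for i, c in enumerate(num_ch_enc)]

outputs = decoder(features)
print(outputs[("disp", 0)].shape)    # torch.Size([2, 1, 192, 640]), passed through a sigmoid
print(outputs[("uncert", 0)].shape)  # same shape, but unbounded (no activation applied)
```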
1) I am training on the eigen_zhou split of the KITTI dataset (the monodepth2 default). 2) I am training the M model.
Everything looks good. I'll try to take a look at it next week.
Thanks :) I'll be waiting for your inputs.
I launched a single training run and it finished without issues. I'll try a few more times.
Okay, let me know how it goes.
Hi,
Wonderful work, and thanks for sharing the code. I'm working on training the model with the log loss to estimate uncertainty, but I'm facing the exploding gradient issue.
Have you fixed the exploding gradient issue with the log loss?
Thanks!
Hi, sorry for the late reply. Are you trying to estimate the log-uncertainty as we mentioned in the previous comments? Among them, we also mentioned using a sigmoid in place of modeling the log-uncertainty (https://github.com/mattpoggi/mono-uncertainty/issues/13#issuecomment-761558919). I used this in some follow-up works and it seems extremely stable, while giving equivalent results.
@kishore-greddy @IemProg one of the reasons might be the batch size you're using. I had a similar experience in another framework where training becomes unstable with a small batch size (like 1 or 2). If you use a batch size different from the one used in the paper, that might be the issue.
@mattpoggi could you please confirm this by setting the training batch size to 1 and seeing whether you experience exploding/vanishing gradients?
Hey @mattpoggi ,
I was trying to train the log model. I made the necessary changes to the decoder to include the additional channel. When I start training, the initial loss is NaN, and after some batches it is NaN again. While debugging the issue, I stumbled upon this piece of code from your decoder.py
1) In line 81, a sigmoid is used, as in the original code from monodepth2, but I do not see a sigmoid being applied to the uncerts in line 85. Is there any reason for this?
2) I train on the GPU, but for debugging I use the CPU. While debugging on my CPU with batch_size 2 (any larger size causes memory issues), I used breakpoints to inspect the values of uncert.
As seen in the image, the minimum value is negative, and the log of a negative number is NaN. This made me ask the first question: why are the uncerts not clamped between 0 (possibly a tiny bit greater, to avoid inf when the log is taken in the loss function) and 1? Is my understanding right, or have I misunderstood something?
3) My loss function is
EDIT: After reading quite a lot, I feel that my log loss is wrong. Maybe the uncertainties coming out of the output channel are already log(uncertainties), so I would have to correct my loss function as below?
EDIT 2: Would the above edit also hold for the self-teaching loss, meaning the uncertainty outputs are actually log(uncertainties), so I have to take torch.exp() in the loss?
Thanks in advance
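Regarding EDIT 2, here is a minimal sketch of how the self-teaching loss would look under the same log-uncertainty interpretation, i.e. applying torch.exp() to the predicted output. The names are illustrative and the exact form should be checked against the paper's formulation:

```python
import torch

def self_teaching_log_uncertainty_loss(disp_student, disp_teacher, log_sigma):
    """Self-teaching loss when the uncertainty head predicts s = log(sigma).

    disp_student: disparity from the network being trained, shape [B, 1, H, W]
    disp_teacher: disparity from the frozen teacher network, same shape
    log_sigma:    raw uncertainty output interpreted as log-uncertainty, same shape
    """
    residual = torch.abs(disp_student - disp_teacher.detach())
    loss = residual * torch.exp(-log_sigma) + log_sigma   # |d - d_teacher| / sigma + log(sigma)
    return loss.mean()
```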