Closed joel99 closed 5 years ago
Hello. We have recently noticed this behaviour as well - we pinpointed the problem to a fault in the SfM / MVS estimated correspondences. Some of the scenes have cameras with a significant distortion which makes the depth maps not align well with the raw images. Up until recently, we didn't have access to the camera parameters of the undistorted images which made it impossible to use them directly.
Nevertheless, we were able to recover accurate enough undistorted models and we are planning on making them available for download during the following weeks. From our experiments, this doesn't change the results reported in the paper, but it makes the entire training process more stable and the network converges significantly faster!
Regarding your second question, the weights are not randomly initialized - we are using the Caffe / TF weights, which are loaded from d2_ots.pth.
I'm leaving this issue open in the meanwhile in case other people run into a similar problem.
Hi, I have the same question: what is d2_ots.pth? The paper says that it just loads a pretrained VGG16 and that the network can then be trained; it doesn't mention d2_ots.pth. Thanks.
d2_ots.pth contains the Caffe / Keras / TensorFlow weights pretrained on ImageNet. The original file can be downloaded, for instance, by using https://keras.io/applications/#vgg16 - we simply converted it to PyTorch format. These weights are different from the PyTorch ones due to a different normalization procedure (see https://github.com/mihaidusmanu/d2-net/blob/master/lib/utils.py#L10).
Since our initial implementation was in Keras / TF, we ported the off-the-shelf weights to PyTorch to get the same fine-tuning results.
Out of curiosity, does d2_ots differ much from what models.vgg16(pretrained=True) returns?
The PyTorch authors retrained some ImageNet models from scratch internally before releasing them. Even though the final ImageNet classification performance is roughly the same, the weights appear to be significantly different (at least in a simple element-wise comparison) because of the different preprocessing and data augmentation. We suspect that the lack of BatchNorm in the original VGG architecture is another reason, but we did not investigate this further.
To sum it up:
- torchvision.models.vgg16(pretrained=True) returns weights that work on images preprocessed as follows: the images are loaded into the range [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225] (see https://pytorch.org/docs/stable/torchvision/models.html).
- keras.applications.vgg16.VGG16(weights='imagenet') returns the d2_ots.pth weights, which work on images preprocessed as follows: the images are converted to BGR and centered using mean = [103.939, 116.779, 123.68] (see https://github.com/keras-team/keras-applications/blob/master/keras_applications/imagenet_utils.py#L125).
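To make the difference between the two conventions concrete, here is a minimal, dependency-free sketch that applies each one to a single 8-bit RGB pixel. The function names are hypothetical and this is not the repository's code; with image arrays the same arithmetic would simply be applied per channel.

```python
# Per-channel constants quoted in the summary above.
TORCH_MEAN = (0.485, 0.456, 0.406)           # RGB order, inputs scaled to [0, 1]
TORCH_STD = (0.229, 0.224, 0.225)
CAFFE_MEAN_BGR = (103.939, 116.779, 123.68)  # BGR order, inputs kept in [0, 255]


def preprocess_torch(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then normalize by mean and std."""
    return tuple(
        (c / 255.0 - m) / s for c, m, s in zip(rgb, TORCH_MEAN, TORCH_STD)
    )


def preprocess_caffe(rgb):
    """Reorder an 8-bit RGB pixel to BGR and subtract the per-channel mean."""
    bgr = rgb[::-1]
    return tuple(c - m for c, m in zip(bgr, CAFFE_MEAN_BGR))


pixel = (255, 128, 0)  # (R, G, B)
print(preprocess_torch(pixel))
print(preprocess_caffe(pixel))
```

The two outputs differ in channel order, scale, and offset, so weights trained under one convention should not be expected to behave well on inputs prepared with the other.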
Hello @joel99. Sorry for the delay. I just updated the repository with the fix that I mentioned previously. Hopefully, it should solve the problems you ran into when fine-tuning on a subset of MegaDepth. Feel free to open a new issue in case you still run into issues!
Hi,
I was wondering if you had any insight into why I run into NaNs while training on a subset (10-50 scenes) of the MegaDepth data. Training as suggested in the README on the roughly 113 MD scenes works as expected. However, when the only change is the training set size, I get 0s appearing in depth_wise_max in model.py, causing the score to become NaN through a division by zero. Is this a symptom of overfitting or something like that? Also, are the weights randomly initialized? In this repo, even in the extractor module, when you pull down the VGG head, there seems to be no pretrained=True flag.
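The failure mode described here can be reproduced in isolation. The sketch below is a plain-Python illustration, not the code from model.py: it assumes the responses at one spatial position are non-negative (for example after a ReLU), so a zero maximum means every channel is zero and the ratio becomes 0/0. Tensor libraries return NaN for that division; plain Python raises an exception instead, so the NaN is emitted explicitly. The helper names are hypothetical.

```python
import math


def depth_wise_ratio(responses):
    """Divide each channel response by the maximum over channels."""
    peak = max(responses)
    if peak == 0.0:
        # 0 / 0 is NaN under IEEE 754; emulate what a tensor library returns.
        return [math.nan] * len(responses)
    return [r / peak for r in responses]


def has_nan(values):
    """Return True if any entry is NaN."""
    return any(math.isnan(v) for v in values)


active = [0.5, 2.0, 1.0]  # at least one channel fires
dead = [0.0, 0.0, 0.0]    # every channel is zero at this position

print(depth_wise_ratio(active), has_nan(depth_wise_ratio(active)))
print(depth_wise_ratio(dead), has_nan(depth_wise_ratio(dead)))
```

A check like has_nan on intermediate values can help locate the first position where the maximum collapses to zero, which is usually more informative than the NaN loss that shows up later.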