mihaidusmanu / d2-net

D2-Net: A Trainable CNN for Joint Description and Detection of Local Features

Unstable training on subset of MegaDepth #11

Closed · joel99 closed this 5 years ago

joel99 commented 5 years ago

Hi,

I was wondering if you had insight into why I run into NaNs while training on a subset (10-50 scenes) of the MegaDepth data. Training as suggested in the README on the full set of ~113 MD scenes works as expected. However, when the only change is the training-set size, zeros appear in depth_wise_max in model.py, and the score computation produces a division-by-zero NaN. Is this a symptom of overfitting or something like that?
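A minimal repro of what I think is happening (sketch only, not the exact code from model.py; the clamp at the end is just an illustrative guard, not a proposed fix):

```python
import torch

# Dense feature map (batch, channels, height, width) after ReLU
features = torch.relu(torch.randn(1, 512, 32, 32))
features[0, :, 0, 0] = 0.0  # a pixel where every channel is exactly zero

# Channel-wise maximum at each spatial location
depth_wise_max = torch.max(features, dim=1)[0]

# Ratio-to-max score: 0 / 0 -> NaN wherever the max is exactly zero
score = features / depth_wise_max.unsqueeze(1)
print(torch.isnan(score).any())  # tensor(True)

# Illustrative guard: clamp the denominator away from zero
safe_score = features / depth_wise_max.unsqueeze(1).clamp(min=1e-6)
print(torch.isnan(safe_score).any())  # tensor(False)
```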

Also, are the weights randomly initialized? In this repo, even in the extractor module, there seems to be no pretrained=True flag when the VGG backbone is constructed.

mihaidusmanu commented 5 years ago

Hello. We have recently noticed this behaviour as well - we pinpointed the problem to faulty SfM / MVS estimated correspondences. Some of the scenes have cameras with significant distortion, which makes the depth maps not align well with the raw images. Until recently, we didn't have access to the camera parameters of the undistorted images, which made it impossible to use them directly.

Nevertheless, we were able to recover accurate enough undistorted models, and we are planning on making them available for download in the coming weeks. In our experiments this doesn't change the results reported in the paper, but it makes the entire training process more stable and the network converges significantly faster!

Regarding your second question, the weights are not randomly initialized - we are using the Caffe / TF weights which are loaded from d2_ots.pth.
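Conceptually, the initialization looks like this (rough sketch; the exact checkpoint layout and key names in this repository may differ, so treat the loading details as assumptions):

```python
import torch
import torchvision.models as models

# Build VGG16 WITHOUT torchvision's ImageNet weights...
model = models.vgg16(pretrained=False)

# ...and load the converted Caffe / TF weights instead.
# Assumption: d2_ots.pth stores a plain state_dict with compatible keys;
# strict=False tolerates keys that don't line up (e.g. the classifier head).
state = torch.load('models/d2_ots.pth', map_location='cpu')
model.load_state_dict(state, strict=False)
```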

I'm leaving this issue open in the meantime in case other people run into a similar problem.

HencyChen commented 5 years ago

Hi, I have the same question: what is d2_ots.pth? The paper says it just loads a pretrained VGG16 and then the network can be trained; it doesn't mention d2_ots.pth. Thanks!

mihaidusmanu commented 5 years ago

d2_ots.pth contains the Caffe / Keras / TensorFlow weights pretrained on ImageNet. The original file can be downloaded, for instance, from https://keras.io/applications/#vgg16 - we simply converted it to PyTorch format. These weights differ from the PyTorch ones due to a different normalization procedure (see https://github.com/mihaidusmanu/d2-net/blob/master/lib/utils.py#L10).
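For reference, the two conventions differ roughly like this (sketch paraphrasing the linked utility; the means / stds are the standard ImageNet values):

```python
import numpy as np

def preprocess_caffe(image_rgb):
    # Caffe / TF VGG convention: keep the [0, 255] range, convert RGB -> BGR,
    # and subtract the per-channel ImageNet means.
    image = image_rgb[:, :, ::-1].astype(np.float32)
    return image - np.array([103.939, 116.779, 123.68], dtype=np.float32)

def preprocess_torch(image_rgb):
    # torchvision convention: scale to [0, 1], then normalize with the
    # ImageNet mean / std expected by the pretrained PyTorch models.
    image = image_rgb.astype(np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    return (image - mean) / std
```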

Since our initial implementation was in Keras / TF, we ported the off-the-shelf weights to PyTorch to get the same fine-tuning results.

joel99 commented 5 years ago

Out of curiosity, does d2_ots differ much from the weights returned by torchvision's models.vgg16(pretrained=True)?

mihaidusmanu commented 5 years ago

The PyTorch authors re-trained some of the ImageNet models internally from scratch before releasing them. Even though the final ImageNet classification performance is roughly the same, the weights turn out to be significantly different (at least under a simple element-wise comparison), presumably due to the different preprocessing and data augmentation. We suspect the lack of BatchNorm in the original VGG architecture also plays a role, but we did not investigate this further.
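A simple element-wise comparison like the one mentioned above could look like this (sketch; assumes d2_ots.pth stores a state_dict with torchvision-compatible keys):

```python
import torch
import torchvision.models as models

torch_state = models.vgg16(pretrained=True).state_dict()
ots_state = torch.load('models/d2_ots.pth', map_location='cpu')

# Compare the parameters both checkpoints share, element-wise
for name, torch_param in torch_state.items():
    if name in ots_state and ots_state[name].shape == torch_param.shape:
        diff = (torch_param - ots_state[name]).abs().mean().item()
        print(f'{name}: mean |difference| = {diff:.4f}')
```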

To sum it up: both sets of weights reach similar ImageNet accuracy, but they are numerically different and expect different input preprocessing, so they are not interchangeable for fine-tuning.

mihaidusmanu commented 5 years ago

Hello @joel99. Sorry for the delay. I just updated the repository with the fix mentioned previously. Hopefully, it solves the problems you ran into when fine-tuning on a subset of MegaDepth. Feel free to open a new issue if you still run into problems!