mihaidusmanu / d2-net

D2-Net: A Trainable CNN for Joint Description and Detection of Local Features

Use of VGG16 pretrained weights #25

Closed phseo closed 4 years ago

phseo commented 4 years ago

Hi Mihai Dusmanu,

Great work! I enjoyed your paper, and thank you for releasing your code as well. While trying to reproduce your model by training it myself, I noticed that your code in lib/model.py does not set 'pretrained=True' when calling models.vgg16(). Is this intended, or is it a bug? Are you fine-tuning only the last layer while randomly initializing the weights of the previous layers?

When I manually set it to true, my loss keeps reaching NaN. Debugging it, I found that some of the divisions by max values in the score computation actually divide by zero. Haven't you faced this issue? Could you please tell me the best practice for reproducing your learned features?

Thanks, Paul

mihaidusmanu commented 4 years ago

Hello. Our initial implementation was in TensorFlow so we ported the ImageNet weights from TensorFlow to PyTorch (d2_ots.pth). As you can see in the training script, we load these weights into the network so there's no need to set pretrained=True.

https://github.com/mihaidusmanu/d2-net/blob/e8da4d0533cf8427ad566f69486d14880f6fac5c/train.py#L115-L119

https://github.com/mihaidusmanu/d2-net/blob/e8da4d0533cf8427ad566f69486d14880f6fac5c/lib/model.py#L95-L96
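For reference, a minimal sketch of the idea (not the repo's exact code): the backbone is built without torchvision's pretrained ImageNet weights, and the ported d2_ots.pth weights are loaded on top. The checkpoint path and its layout (a plain state_dict here) are assumptions.

```python
import torch
import torchvision.models as models

# Sketch: build VGG16 with random weights, then overwrite them with the
# TF -> PyTorch ported ImageNet weights instead of using pretrained=True.
# The checkpoint path and its layout (a plain state_dict) are assumptions.
backbone = models.vgg16(pretrained=False)
state_dict = torch.load('models/d2_ots.pth', map_location='cpu')
# strict=False because D2-Net only keeps the truncated feature extractor.
backbone.load_state_dict(state_dict, strict=False)
```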

The two sets of weights (TF / Caffe vs. PyTorch) are completely different (different normalization and data augmentation) so I don't expect them to perform the same / have the same behavior.

In case you are running into NaN issues with the default version of the script (i.e. no changes to parameters, etc.), please let me know. The last time I tried to train the network was on PyTorch 1.0, and some things have changed since then.

Also, please note that if you are trying to fine-tune the PyTorch weights, you should also switch to the PyTorch normalization instead of the Caffe one by passing --preprocessing torch at train time and commenting out L117 of train.py.
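For clarity, a hedged sketch of the two normalization conventions being discussed; the actual helper in the repository may differ in its details, but the mean/std values below are the standard Caffe and torchvision ImageNet ones.

```python
import numpy as np

def preprocess(image, mode='caffe'):
    # Sketch of the two preprocessing modes; input is H x W x 3, RGB, in [0, 255].
    image = image.astype(np.float32)
    if mode == 'caffe':
        # Caffe/TF-ported weights: BGR channel order, mean subtraction, no scaling.
        image = image[:, :, ::-1].copy()
        image -= np.array([103.939, 116.779, 123.68])
    elif mode == 'torch':
        # torchvision weights: RGB in [0, 1], ImageNet mean/std normalization.
        image /= 255.0
        image = (image - np.array([0.485, 0.456, 0.406])) / np.array([0.229, 0.224, 0.225])
    return image.transpose(2, 0, 1)  # to C x H x W
```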

Regards, Mihai.

phseo commented 4 years ago

Hi Mihai,

Thanks for your answer. I am migrating your code to another framework. In that framework, I am using the torch pretrained weights and have already adapted the image preprocessing accordingly. So, in my understanding, although your code has a --preprocessing flag, you have not tested it with the PyTorch weights, right?

I've had NaN issues, and after adding small epsilons to all divisions (in around 5 different places, I think) they are gone. However, the training is very unstable and does not converge well. I will test the code with the original implementation next.

Best, Paul

mihaidusmanu commented 4 years ago

Hello. Yes, you are right - I didn't try to fine-tune the PyTorch weights since their off-the-shelf performance was worse than that of Caffe / TF weights. I suspect that this is due to the data augmentation techniques used, but we did not investigate this issue further.

For our training methodology to work, the initial weights should already be reliable enough to establish some good matches (low margin loss). One suggestion in your case is to first fine-tune the PyTorch weights with the margin loss only (i.e. without the detection term, a dense-matching scenario) and, after a few epochs, switch to the soft-score-weighted margin loss from the D2-Net paper.
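A hypothetical sketch of that two-stage schedule; the function names (plain_margin_loss, weighted_margin_loss) and the warm-up length are placeholders, not the repository's API.

```python
def train_with_warmup(model, optimizer, train_loader, num_epochs, warmup_epochs=3):
    """Sketch of the suggested two-stage schedule.

    plain_margin_loss / weighted_margin_loss are placeholders for the
    margin-only and soft-score-weighted D2-Net losses, respectively.
    """
    for epoch in range(num_epochs):
        for batch in train_loader:
            if epoch < warmup_epochs:
                # Stage 1: dense-matching warm-up, no detection term.
                loss = plain_margin_loss(model, batch)
            else:
                # Stage 2: full D2-Net soft-score-weighted margin loss.
                loss = weighted_margin_loss(model, batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```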

mihaidusmanu commented 4 years ago

Hello. I don't have any issues fine-tuning the last layer of the PyTorch weights on the full MegaDepth dataset; I only trained for 2 epochs, but everything looked good so far - I didn't even need to add 1e-8 to the denominators.

In case you were fine-tuning on a subset of MegaDepth, you should be careful with the scenes you choose: some of them have quite bad depth maps which yield absurd correspondences - a bad supervision signal that might cause training instability. For best results, they should probably be filtered out, but there are a lot of image pairs / scenes, so it is quite a time-consuming task.

phseo commented 4 years ago

Thank you for your answers, and sorry for the late reply. I have also tried it, and fine-tuning the last layer was fine with a batch size of 1. However, with a batch size of 256, I often got NaN errors because the denominators sometimes reach 0; adding an epsilon to the denominators solved this problem entirely. Thank you for all your help!
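A minimal sketch of the kind of epsilon guard being described; the exact normalization in lib/model.py may differ, and the reduction dimensions and constant below are assumptions.

```python
import torch

EPS = 1e-8  # small constant; 1e-8 matches the value mentioned above

def ratio_to_max(x):
    # Sketch of a ratio-to-max normalization like the one in the soft
    # detection score. Adding EPS to the denominator avoids NaN when the
    # maximum is 0, which can happen for some images in a large batch.
    max_per_sample = x.amax(dim=(1, 2, 3), keepdim=True)  # x: B x C x H x W
    return x / (max_per_sample + EPS)
```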

mihaidusmanu commented 4 years ago

Great. Thanks for letting me know! I will make the change in a future commit.