pierre-jacob / ICCV2019-Horde

Code repository for our paper entitled "Metric Learning with HORDE: High-Order Regularizer for Deep Embeddings", accepted at ICCV 2019.
MIT License

Training process #6

Closed mhyeonsoo closed 4 years ago

mhyeonsoo commented 4 years ago

Hello,

Thanks for the great source code. I am trying to test the code with the CUB dataset now.

I succeeded in running training without errors, but I am not entirely sure about the training process. My understanding is that training proceeds using the pretrained backbone together with the cascaded HORDE layers.

However, I can see that R@K for each output fluctuates across epochs. I am wondering whether this is caused by the pretrained backbone model, or whether it is just normal behavior for HORDE.

Thank you.

pierre-jacob commented 4 years ago

Hey, thanks for the kind words!

If you have run the provided script, or the “train.py” script directly, you should see something like this in your command line:

```
Recall@1 has been increased (59.0 --> 62.7).
Metrics for L2_O1:
L2_O1 - Recall@1: 62.7
L2_O1 - Recall@2: 74.5
L2_O1 - Recall@4: 83.8
L2_O1 - Recall@8: 90.0
L2_O1 - Recall@16: 93.9
L2_O1 - Recall@32: 96.7

Metrics for L2_O2:
L2_O2 - Recall@1: 62.9
L2_O2 - Recall@2: 74.6
L2_O2 - Recall@4: 83.3
L2_O2 - Recall@8: 90.0
L2_O2 - Recall@16: 94.1
L2_O2 - Recall@32: 96.6

Metrics for L2_O3:
L2_O3 - Recall@1: 59.9
L2_O3 - Recall@2: 71.5
L2_O3 - Recall@4: 81.4
L2_O3 - Recall@8: 88.0
L2_O3 - Recall@16: 93.0
L2_O3 - Recall@32: 96.1
```

This means you have successfully started the training of HORDE, congratz 😉

In practice, you have loaded a pre-trained model (GoogleNet or BNInception) and added a single layer to build the embedding, with an L2 normalization on top, for the main metric-learning part. You have also added multiple high-order layers to compute the HORDE regularization. The scores above are the retrieval performances evaluated for each order: O{K} is the score for the K-th order. You should monitor "O1", which is the standard metric-learning output.
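Schematically, that architecture looks like the NumPy sketch below. This is only an illustration, not the repository's actual code: the function names, the 1024-channel / 512-dimensional sizes, and the use of plain random projections to approximate the K-th order moments are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, eps=1e-12):
    # Project the embedding onto the unit hypersphere.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def embed_first_order(features, w):
    # Standard metric-learning head ("O1"): average-pool the local
    # features, apply a single linear layer, then L2-normalize.
    pooled = features.mean(axis=0)
    return l2_normalize(pooled @ w)

def embed_high_order(features, projections):
    # High-order branch (sketch): the K-th order moment is approximated
    # by the element-wise product of K random projections of each local
    # feature, summed over spatial locations, then L2-normalized.
    ho = np.ones((features.shape[0], projections[0].shape[1]))
    for p in projections:
        ho = ho * (features @ p)
    return l2_normalize(ho.sum(axis=0))

# Toy "local deep features": 49 spatial locations, 1024 channels
# (roughly what a GoogleNet-like backbone produces on a 224x224 image).
features = rng.standard_normal((49, 1024))

w = rng.standard_normal((1024, 512))                            # embedding layer
proj_o2 = [rng.standard_normal((1024, 512)) for _ in range(2)]  # 2nd order
proj_o3 = [rng.standard_normal((1024, 512)) for _ in range(3)]  # 3rd order

o1 = embed_first_order(features, w)       # monitor this one for retrieval
o2 = embed_high_order(features, proj_o2)  # HORDE regularization branches
o3 = embed_high_order(features, proj_o3)
```

At training time each branch gets its own metric-learning loss, which is what produces the per-order L2_O1 / L2_O2 / L2_O3 scores in the log above.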

As you observed, they tend to fluctuate across epochs. This is mainly because you are optimizing somewhat different objectives: the mean for the retrieval performance, the second order for the deep-feature variance, and so on. Sometimes they can work against each other, e.g., minimizing the variance decreases the first-order performance because an outlier has been pushed much closer to its mean. Usually, training for a few more epochs increases both, and they stabilize around the best performance.
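The "Recall@1 has been increased" line in the log comes from this kind of epoch-level bookkeeping. A minimal version, with made-up names rather than the repository's actual code, would track only the first-order Recall@1 and checkpoint on improvement:

```python
def track_best_recall(history):
    # Keep the best Recall@1 of the first-order output ("O1") and only
    # checkpoint when it improves; fluctuations in the other orders are
    # expected and can be ignored for model selection.
    best = None
    for recall_at_1 in history:
        if best is None or recall_at_1 > best:
            if best is not None:
                print(f"Recall@1 has been increased ({best:.1f} --> {recall_at_1:.1f})")
            best = recall_at_1
            # ... save the model weights here ...
    return best

# Fluctuating Recall@1 across epochs, as observed in practice:
best = track_best_recall([59.0, 62.7, 61.5, 63.1])  # best ends at 63.1
```

Because the per-epoch scores wander while the best-so-far is monotone, this is the number worth watching rather than any single epoch's output.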

Let me know if you have any difficulty training the model!

mhyeonsoo commented 4 years ago

Thanks for the perfect feedback.