kanezaki / pytorch-rotationnet

BSD 2-Clause "Simplified" License

About the order of views in training #5

Closed XHQvi closed 5 years ago

XHQvi commented 5 years ago

Hello. In your code, the following snippet is used to permute samples. It shuffles the samples while keeping the order of views within each sample unchanged.

        inds = np.zeros( ( nview, train_nsamp ) ).astype('int')
        inds[ 0 ] = np.random.permutation(range(train_nsamp)) * nview
        for i in range(1,nview):
            inds[ i ] = inds[ 0 ] + i
        inds = inds.T.reshape( nview * train_nsamp )
        train_loader.dataset.imgs = [sorted_imgs[ i ] for i in inds]
        train_loader.dataset.samples = train_loader.dataset.imgs
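To make the effect concrete, here is a small self-contained sketch of the same index construction (with made-up `nview` and `train_nsamp` values): the samples are shuffled as blocks, but inside each block the views stay in their original order.

```python
import numpy as np

# Toy illustration: 3 samples, 4 views each. Images are stored
# view-contiguously, i.e. sample s occupies indices [s*nview, (s+1)*nview).
nview, train_nsamp = 4, 3
rng = np.random.default_rng(0)

inds = np.zeros((nview, train_nsamp), dtype=int)
inds[0] = rng.permutation(train_nsamp) * nview  # index of the first view of each shuffled sample
for i in range(1, nview):
    inds[i] = inds[0] + i                       # remaining views follow in their fixed order
inds = inds.T.reshape(nview * train_nsamp)

# Each consecutive block of `nview` indices is one sample, and within
# each block the views appear in order 0, 1, 2, 3.
print(inds.reshape(train_nsamp, nview))
```

So the permutation happens at the sample level only; the view positions inside a sample are never touched.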

Do the different orders of views correspond to different "vcands"? If so, I think keeping the order of views unchanged may cause an imbalance between classifiers. The same situation can also appear in testing. I'm quite confused about it. Is there some reason to keep the order unchanged? Thank you!

kanezaki commented 5 years ago

Hi there,

> Do the different orders of views correspond to different "vcands"?

Yes, "vcands" stores all the candidate view orders, all of which are investigated in l.312-l.315. Then the best pose (= view order) is determined according to the scores in l.318.
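For reference, the repository loads "vcand" from a precomputed file, but in case (i), where the twelve views sit on a circle at 30-degree intervals, the candidates should correspond to the twelve circular shifts of the view indices. A hypothetical reconstruction (not the repository's exact code):

```python
import numpy as np

# Hypothetical reconstruction of the case (i) candidates: rotating the
# object by k * 30 degrees circularly shifts the twelve views, so each
# candidate pose is a circular shift of the view indices.
nview = 12
vcand = np.array([np.roll(np.arange(nview), -k) for k in range(nview)])

print(vcand.shape)  # (12, 12): twelve candidate view orders
```

Each row of `vcand` is one candidate assignment of input views to pose positions, and all twelve rows are scored before the best one is selected.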

> Is there some reason to keep the order unchanged?

In our method, the relative positions of the views must be unchanged. (In the case (i) setting, for example, the relative azimuth angle between the i-th view and the (i+1)-th view should be 30 degrees.)

XHQvi commented 5 years ago

Thank you for your reply! But I still can't understand the intuition behind that.

For example, in case (i), a sample xi = {v1, v2, ..., v12} will be fed into the network during the training phase, and the input order of {vi} will stay unchanged. The input order of {vi} corresponds to one specific "vcand", which means the sample xi only contributes to the training of that specific "vcand".

It's possible that all samples are used to train only a few "vcand"s (or the classifiers corresponding to those "vcand"s), and the other "vcand"s (or those classifiers) will be trained poorly. So when I shuffle the views of a test sample, the network may perform poorly because some "vcand"s are not trained well.

So, why not shuffle the views in each epoch and train all "vcand"s well? Or maybe I have some misunderstanding about how it works. Looking forward to your reply. Thanks!

kanezaki commented 5 years ago

I think I understood your point. Although the input order of multi-view images is fixed, the output scores are rearranged according to each permutation stored in the "vcand" variable (see l.312-l.315). The "vcand" variable stores all the twelve candidates of view permutations in case (i). Then in l.318, the best one among the twelve candidates is selected so that the classifiers are updated based on the selected permutation.

The same strategy of selecting the best pose is applied in the testing phase (see l.459-l.466). This is how our method works: each classifier becomes responsive to one specific pose (i.e., view permutation) among all the candidates.
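A simplified sketch of this best-pose selection (the function name, tensor shapes, and scoring here are illustrative assumptions, not the repository's exact variables): every candidate permutation re-assigns the per-view outputs to pose positions, the implied scores are accumulated, and the candidate with the highest accumulated score for the true class wins.

```python
import numpy as np

def select_best_pose(scores, vcand, target_class):
    """Try every candidate view permutation and keep the one whose
    accumulated score for `target_class` is highest.

    scores: (nview, nview, nclass) array, where scores[k, p, c] is the
    score that input view k, treated as pose position p, gives class c.
    (Illustrative shape, not the repository's exact layout.)
    """
    best_j, best_score = 0, -np.inf
    for j, perm in enumerate(vcand):
        s = sum(scores[k, perm[k], target_class] for k in range(len(perm)))
        if s > best_score:
            best_j, best_score = j, s
    return best_j, best_score

# Toy usage with random scores and the twelve circular-shift candidates.
rng = np.random.default_rng(1)
scores = rng.standard_normal((12, 12, 5))
vcand = np.array([np.roll(np.arange(12), -k) for k in range(12)])
j, s = select_best_pose(scores, vcand, target_class=3)
print(j, s)
```

During training, the loss is then computed under the selected permutation only, which is why each classifier specializes to one pose rather than being trained on shuffled view orders.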

XHQvi commented 5 years ago

Okay, I got it. By the way, did you find any reason why different camera settings make the top-1 accuracy so different? In your paper, you tried 11 schemes of camera settings in case (ii), and the results are quite different. Thanks!

kanezaki commented 5 years ago

Good question. I think that's because there are only 20 discrete viewpoints in each camera setting, and thus some camera settings happen to capture more discriminative images than others.

XHQvi commented 5 years ago

OK, thank you for your reply!