I may be misunderstanding the symbols, but I am still confused. For the proxy loss, the training complexity is O((M/B) * C * B) = O(MC); for the triplet loss it should be O((M/B)^3 * B) = O(M^3 / B^2) rather than O(M^3). Any clarification on that? (A small numeric sketch of this counting is below.)
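To make that counting concrete, here is a minimal sketch in plain Python; the values of M, C, and B are hypothetical placeholders (not from the paper), and the triplet count follows the reading in this comment, not necessarily the paper's intended one.

```python
# Hypothetical values, only to illustrate the counting in this comment:
# M = number of training samples, C = number of classes/proxies, B = batch size.
M, C, B = 60_000, 1_000, 32

num_batches = M // B  # batches per epoch

# Proxy loss: each of the B samples in a batch is compared against all C proxies,
# so one epoch costs (M/B) * B * C = M * C comparisons -> O(MC).
proxy_ops = num_batches * B * C

# Triplet loss, as read in this comment: the Proxy-NCA paper's O((M/B)^3) counts
# the steps needed to see all triplets, and each step processes B samples, giving
# (M/B)^3 * B = M^3 / B^2 -> O(M^3 / B^2), not O(M^3).
triplet_ops = num_batches ** 3 * B

print(f"proxy:   {proxy_ops:,}")    # 60,000,000   (= M * C)
print(f"triplet: {triplet_ops:,}")  # ~2.1e11      (= M^3 / B^2)
```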
I think the complexity analysis of the other loss functions in the paper is fuzzy...
The training complexity is O(MC), where M is the batch size and C is the number of classes. For most datasets, e.g. ImageNet, C is much larger than M, which means MC > M^2 or even MC > M^3. How do you explain this?
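As a concrete (purely hypothetical) instance of that point: with an ImageNet-scale C = 1000 and a small batch size, MC does exceed both M^2 and M^3.

```python
M, C = 16, 1_000       # hypothetical: batch size 16, 1000 classes
print(M * C)           # 16000
print(M ** 2, M ** 3)  # 256 4096 -- both smaller than M * C here
```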
Look at the original Proxy-NCA paper: it stresses that it takes O((N/b)^3) steps to consider all samples, which is not the same as saying O(b^3) is a large number; they are totally different things. I think it is meaningless to argue that O(MC) is better than O(M^3).