Hi, when the number of classes is around 100k (MS-Celeb-1M), I think annoy can build the hash forest in an acceptable time. But as the number of classes grows toward a truly "massive classification" setting of roughly 1 million, the HF build time seems unacceptable: I tried building an annoy index over 1 million features with n_trees = 100 and feat_dim = 2048, and it took ~22 minutes. Is there a solution, or do you have any advice? Thank you!
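For reference, a minimal sketch of the build being timed here — random vectors stand in for the real features, and the file name is illustrative:

```python
import time

import numpy as np
from annoy import AnnoyIndex

FEAT_DIM = 2048
NUM_ITEMS = 1_000_000
N_TREES = 100

# Fill the index with placeholder feature vectors.
index = AnnoyIndex(FEAT_DIM, 'euclidean')
for i in range(NUM_ITEMS):
    index.add_item(i, np.random.rand(FEAT_DIM).astype(np.float32))

# build() is the step that takes ~22 minutes at this scale.
start = time.time()
index.build(N_TREES)
print('build took %.1f min' % ((time.time() - start) / 60))
index.save('index.ann')
```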
Hi @stgzr Here are some ideas that may help: try the hnsw sampler; it is much quicker while remaining accurate. Both of these algorithms provide a trade-off between computation and accuracy, e.g., hnsw can be accelerated further by setting its post strategy to 0.
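If the hnsw sampler here means the nmslib hnsw implementation (where post is an index-time parameter), a minimal sketch of a build with post-processing disabled might look like this; sizes and parameter values are illustrative assumptions:

```python
import nmslib
import numpy as np

# Placeholder features; in practice these would be the real vectors.
data = np.random.rand(100000, 2048).astype(np.float32)

index = nmslib.init(method='hnsw', space='cosinesimil')
index.addDataPointBatch(data)
# post=0 skips index post-processing: faster builds at some cost in
# search accuracy, the trade-off mentioned above.
index.createIndex({'M': 16, 'efConstruction': 200, 'post': 0},
                  print_progress=True)

ids, dists = index.knnQuery(data[0], k=10)
```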
Thank you for your reply! @yl-1993 It is very helpful; I will try it later. Is there any difference, such as precision, between the hnsw and annoy implementations when keeping the same settings for the HF?
Another question: I trained my neural network with a moderately large batch (e.g., 2k), and I found that the forward part of selecting active classes (the get_nns_by_vector() loop) also costs significant time (~500 ms with 16 processes). I use selective softmax to reduce the large cost of the FC-layer multiplication (e.g., 500k-way classification), but the selection step seems to introduce another large cost that cancels out the savings on the original FC multiplication. Do you have any ideas for this situation?
Thank you very much!
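For concreteness, a hedged sketch of the selection step being timed above: each worker process queries a prebuilt annoy index for one chunk of the 2k batch. The index file and constants are illustrative, not from the repo.

```python
import time
from multiprocessing import Pool

import numpy as np
from annoy import AnnoyIndex

FEAT_DIM = 2048
NUM_WORKERS = 16
K = 10  # active classes retrieved per sample

def query_chunk(features):
    # Each worker mmaps the prebuilt index, so loading is cheap.
    index = AnnoyIndex(FEAT_DIM, 'euclidean')
    index.load('index.ann')
    return [index.get_nns_by_vector(f, K) for f in features]

if __name__ == '__main__':
    batch = np.random.rand(2000, FEAT_DIM).astype(np.float32)
    chunks = np.array_split(batch, NUM_WORKERS)
    start = time.time()
    with Pool(NUM_WORKERS) as pool:
        neighbors = pool.map(query_chunk, chunks)
    print('selection took %.1f ms' % ((time.time() - start) * 1e3))
```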
@stgzr Thanks for trying. Both hnsw and annoy can be viewed as nearest-neighbor (NN) search algorithms, and better NN-search performance usually leads to better performance in selective softmax training.
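One way to compare the two libraries at equal settings is recall@k against exact neighbors; a small sketch (brute-force ground truth is only feasible for small sanity checks):

```python
import numpy as np

def exact_knn(queries, database, k):
    # Brute-force Euclidean neighbors as ground truth.
    d2 = ((queries[:, None, :] - database[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]

def recall_at_k(approx_ids, exact_ids):
    # Fraction of the true k-NN that the approximate search recovered.
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids))
    return hits / float(len(exact_ids) * len(exact_ids[0]))
```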
@yl-1993 Thank you! So it will take more effort to implement a high-performance selective softmax based on your open-source code; I will consider your advice carefully.
My last question is about accuracy: according to Table 2 in the paper, the performance of HF-A drops about 0.8% compared with full softmax. I understand this small drop is expected because we select only a few active classes. I wonder whether there is a way to get closer to full-softmax accuracy, regardless of the HF or NN-search accuracy, while keeping the computational cost low.
I think the key to selective softmax is how the active classes are found. Have you ever tried other methods, such as unsupervised clustering? By the way, I think the idea of reducing the FC computational cost in massive classification tasks is very useful, but related papers seem scarce. Do you have any plans for future work?
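To make the discussion concrete, a rough PyTorch sketch of the selective-softmax idea (not the authors' implementation): cross-entropy is computed over only the union of active classes retrieved for the batch, which is assumed here to already include the true labels.

```python
import torch
import torch.nn.functional as F

def selective_softmax_loss(features, weight, labels, active_classes):
    # features: (B, D); weight: (C, D) full FC weight; labels: (B,)
    # active_classes: unique class ids selected for this batch (e.g.,
    # via NN search over class weights), assumed to contain all labels.
    sub_weight = weight[active_classes]       # (A, D) with A << C
    logits = features @ sub_weight.t()        # (B, A) instead of (B, C)
    # Remap full-range labels to positions inside active_classes.
    remap = {int(c): i for i, c in enumerate(active_classes)}
    sub_labels = torch.tensor([remap[int(l)] for l in labels],
                              device=logits.device)
    return F.cross_entropy(logits, sub_labels)
```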
@stgzr Thanks for delving into the details of our paper.