Why use jaccard_dist to cluster?

SunskyF commented 4 years ago

MMT and SSG both cluster with the re-ranking (Jaccard) distance, and both use source-domain features when computing it. I would like to know the performance when using the original distance and no source-domain features, if possible. Thanks!

yxgeee commented 4 years ago

Hi, thank you for your question.

I did not use DBSCAN-based MMT in our paper, so I have not conducted a complete ablation study on the DBSCAN implementation details. In the paper, I used K-Means with the original distance and only target-domain features.
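As a rough illustration of that K-Means setting (not taken from the MMT code), the sketch below clusters target-domain feature vectors with plain Euclidean distance and uses the cluster index as a pseudo label. The function names, the toy features, and the evenly spaced centroid initialisation are assumptions made for the example; real implementations typically use random or k-means++ initialisation and a library routine.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(feats, k, iters=20):
    """Minimal Lloyd's algorithm; returns one cluster id per sample."""
    n = len(feats)
    # Assumption for this sketch: deterministic, evenly spaced initial centroids.
    centroids = [list(feats[i * n // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assignment step: nearest centroid under the original (Euclidean) distance.
        labels = [min(range(k), key=lambda c: sq_dist(f, centroids[c])) for f in feats]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(feats, labels) if lab == c]
            if members:  # keep the old centroid if a cluster became empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical target-domain features: two well-separated groups.
target_feats = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
pseudo_labels = kmeans(target_feats, k=2)
print(pseudo_labels)
```

Note that only target-domain features enter the clustering here, and the number of clusters k has to be chosen in advance, which is one practical difference from DBSCAN.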

When I ran experiments with DBSCAN recently, I followed the clustering step in SSG for a fair comparison. I guess SSG in turn referred to https://github.com/LcDog/DomainAdaptiveReID.

I found that performance drops significantly when the original distance is used directly in DBSCAN. I did not use source-domain features for clustering, i.e. lambda_value=0 in the training script. I am not sure whether combining with source-domain features (setting lambda_value>0) would improve performance. You can try it yourself.
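To make the difference between the two distances concrete, here is a simplified sketch of a Jaccard distance built from k-reciprocal neighbour sets, computed on target-domain features only (the lambda_value=0 case). It is not the re-ranking code used in MMT or SSG: that code is more elaborate, and the hard neighbour sets, the function names, the value of k, and the toy features below are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Original (Euclidean) distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_dist(feats):
    """Full matrix of original distances."""
    return [[euclidean(a, b) for b in feats] for a in feats]


def k_reciprocal_sets(dist, k):
    """For each sample i, the neighbours j such that i and j are in each other's k-NN."""
    n = len(dist)
    knn = [set(sorted(range(n), key=lambda j: dist[i][j])[:k]) for i in range(n)]
    # Always include i itself so that no set is empty.
    return [{j for j in knn[i] if i in knn[j]} | {i} for i in range(n)]


def jaccard_dist(feats, k=3):
    """Jaccard distance between the k-reciprocal neighbour sets of every pair."""
    recip = k_reciprocal_sets(pairwise_dist(feats), k)
    n = len(feats)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            inter = len(recip[i] & recip[j])
            union = len(recip[i] | recip[j])
            out[i][j] = 1.0 - inter / union
    return out


# Hypothetical target-domain features: two well-separated groups.
target_feats = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
jd = jaccard_dist(target_feats, k=3)
print(jd[0][1], jd[0][3])
```

Because each entry depends on the neighbourhoods of both samples rather than on the pair alone, the matrix reflects dataset-level structure. A matrix like this can be passed to a DBSCAN implementation that accepts precomputed distances, for example scikit-learn's DBSCAN with metric="precomputed".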

As for results, in my current settings (Jaccard distance with only target-domain features), DBSCAN-MMT performs slightly better than KMeans-MMT on the duke-to-market task (2-3% mAP higher) and slightly worse than KMeans-MMT on the market-to-duke task (1-2% mAP lower).

SunskyF commented 4 years ago

Thanks for your quick and detailed reply! I'm sorry that I didn't notice lambda_value=0. I guess the Jaccard distance adds some prior information that helps clustering. To some extent, using the Jaccard distance optimizes the model from a global view (the whole dataset). Again, thanks for your reply and your awesome work.
