yxgeee / MMT

[ICLR-2020] Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification.
https://yxgeee.github.io/projects/mmt
MIT License
472 stars 73 forks

Error in DBSCAN #21

Closed absagargupta closed 4 years ago

absagargupta commented 4 years ago

Hello there. I was running DBSCAN on the Sysu dataset and on some results I obtained by manipulating Sysu, and I am encountering an error.

The command and its full output are as follows:

sh scripts/train_mmt_dbscan.sh Sysu Sysuresults resnet50

Args:Namespace(alpha=0.999, arch='resnet50', batch_size=64, data_dir='/home/sagar18174/Thesis/person_RE-identification/MMT/examples/data', dataset_source='Sysu', dataset_target='Sysuresults', dropout=0.0, epochs=40, eval_step=1, features=0, height=256, init_1='logs/SysuTOSysuresults/resnet50-pretrain-1/model_best.pth.tar', init_2='logs/SysuTOSysuresults/resnet50-pretrain-2/model_best.pth.tar', iters=400, lambda_value=0.0, logs_dir='logs/SysuTOSysuresults/resnet50-MMT-DBSCAN', lr=0.00035, momentum=0.9, num_instances=4, print_freq=1, rr_gpu=False, seed=1, soft_ce_weight=0.5, soft_tri_weight=0.8, weight_decay=0.0005, width=128, workers=4)

=> Market1501 loaded
Dataset statistics:

subset | # ids | # images | # cameras
train | 500 | 14870 | 2
query | 500 | 745 | 2
gallery | 486 | 486 | 1

=> Market1501 loaded
Dataset statistics:

subset | # ids | # images | # cameras
train | 259 | 11801 | 2
query | 259 | 518 | 2
gallery | 259 | 259 | 1

/home/sagar18174/.local/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py:26: UserWarning: There is an imbalance between your GPUs. You may want to exclude GPU 0 which has less than 75% of the memory or cores of GPU 1. You can do so by setting the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES environment variable.
  warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))
=> Loaded checkpoint 'logs/SysuTOSysuresults/resnet50-pretrain-1/model_best.pth.tar'
mismatch: module.classifier.weight torch.Size([500, 2048]) torch.Size([11801, 2048])
missing keys in state_dict: set(['module.classifier.weight'])
mismatch: module.classifier.weight torch.Size([500, 2048]) torch.Size([11801, 2048])
missing keys in state_dict: set(['module.classifier.weight'])
=> Loaded checkpoint 'logs/SysuTOSysuresults/resnet50-pretrain-2/model_best.pth.tar'
mismatch: module.classifier.weight torch.Size([500, 2048]) torch.Size([11801, 2048])
missing keys in state_dict: set(['module.classifier.weight'])
mismatch: module.classifier.weight torch.Size([500, 2048]) torch.Size([11801, 2048])
missing keys in state_dict: set(['module.classifier.weight'])
Extract Features: [50/185] Time 0.105 (0.196) Data 0.000 (0.011)
Extract Features: [100/185] Time 0.106 (0.148) Data 0.000 (0.006)
Extract Features: [150/185] Time 0.106 (0.133) Data 0.000 (0.004)
Extract Features: [50/185] Time 0.102 (0.115) Data 0.000 (0.011)
Extract Features: [100/185] Time 0.123 (0.111) Data 0.015 (0.006)
Extract Features: [150/185] Time 0.106 (0.109) Data 0.000 (0.005)
Computing original distance...
Computing Jaccard distance...
Time cost: 154.446565151
examples/mmt_train_dbscan.py:182: RuntimeWarning: Mean of empty slice.
  eps = tri_mat[:top_num].mean()
/home/sagar18174/.local/lib/python2.7/site-packages/numpy/core/_methods.py:85: RuntimeWarning: invalid value encountered in true_divide
  ret = ret.dtype.type(ret / rcount)
eps for cluster: nan
Clustering and labeling...
Traceback (most recent call last):
  File "examples/mmt_train_dbscan.py", line 304, in <module>
    main()
  File "examples/mmt_train_dbscan.py", line 130, in main
    main_worker(args)
  File "examples/mmt_train_dbscan.py", line 187, in main_worker
    labels = cluster.fit_predict(rerank_dist)
  File "/home/sagar18174/.local/lib/python2.7/site-packages/sklearn/cluster/dbscan.py", line 354, in fit_predict
    self.fit(X, sample_weight=sample_weight)
  File "/home/sagar18174/.local/lib/python2.7/site-packages/sklearn/cluster/dbscan.py", line 322, in fit
    **self.get_params())
  File "/home/sagar18174/.local/lib/python2.7/site-packages/sklearn/cluster/dbscan.py", line 127, in dbscan
    raise ValueError("eps must be positive.")
ValueError: eps must be positive.

Any help is highly appreciated. Sagar

yxgeee commented 4 years ago

It seems that the values in rerank_dist are all zeros. Please double-check it, especially in the function https://github.com/yxgeee/MMT/blob/aeb547079c65d9aa2b8ce08d587970412d362b07/mmt/utils/rerank.py#L106
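One way to run that check is to print a few statistics of the matrix just before clustering. The helper below is only a sketch: the name summarize_distance_matrix is hypothetical and not part of MMT, and it assumes rerank_dist is a square NumPy array of pairwise distances with at least two rows.

```python
import numpy as np


def summarize_distance_matrix(dist):
    """Return basic statistics for a square pairwise-distance matrix.

    Hypothetical debugging helper, not part of the MMT code base.
    """
    dist = np.asarray(dist, dtype=np.float64)
    # Look at off-diagonal entries only: self-distances on the diagonal
    # are expected to be zero and would hide the real picture.
    off_diag = dist[~np.eye(dist.shape[0], dtype=bool)]
    return {
        "shape": dist.shape,
        "has_nan": bool(np.isnan(dist).any()),
        "min": float(off_diag.min()),
        "max": float(off_diag.max()),
        # Fraction of off-diagonal distances that are not exactly zero.
        "nonzero_fraction": float(np.count_nonzero(off_diag)) / off_diag.size,
    }
```

Calling print(summarize_distance_matrix(rerank_dist)) right before the clustering step would show at a glance whether the matrix is all zeros (nonzero_fraction of 0.0) or contains NaN values.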

yxgeee commented 4 years ago

Hi, have you solved it? If not, please provide more details about the modifications you made for the new dataset, so that others can offer more specific suggestions. I have added the "help" label to your issue so that more people may notice your problem and help you.

absagargupta commented 4 years ago

Hello there. Thank you for adding the label.

The dataset I made consists of the results of CycleGAN applied to the Sysu-RGB-IR dataset. I am trying to run DBSCAN on the actual Sysu dataset and on the generated images produced by CycleGAN. I can run train_mmt_kmeans.sh without any problem, but when I run DBSCAN on the same Sysu dataset and its generated results I get this error.

One more thing: when can the values in rerank_dist be equal to 0?

yxgeee commented 4 years ago

Generally, the values in rerank_dist should not all be zero. If they are, there is probably a bug somewhere. Did you check the values in rerank_dist? Are they all zeros?

Also, your log says: examples/mmt_train_dbscan.py:182: RuntimeWarning: Mean of empty slice. eps = tri_mat[:top_num].mean(). Have you checked the value of top_num?
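To illustrate why an empty slice leads to "eps must be positive", here is a minimal sketch of an eps estimate with explicit checks. It assumes eps is the mean of the smallest nonzero upper-triangular distances, which is what the warning line suggests; the function name estimate_eps, the rho parameter and its default value are assumptions for illustration, not a copy of mmt_train_dbscan.py.

```python
import numpy as np


def estimate_eps(dist, rho=1.6e-3):
    """Mean of the smallest nonzero pairwise distances, with explicit checks.

    Hypothetical helper: `rho` (fraction of pairs to average over) and its
    default are assumptions made for this sketch.
    """
    dist = np.asarray(dist, dtype=np.float64)
    tri = np.triu(dist, k=1)             # keep each pair once, drop the diagonal
    tri = np.sort(tri[np.nonzero(tri)])  # nonzero distances, ascending
    top_num = int(round(rho * tri.size))
    if top_num == 0:
        # An all-zero matrix (or too small a rho) leaves nothing to average;
        # NumPy would then warn "Mean of empty slice" and return nan.
        raise ValueError(
            "no distances to average: %d nonzero pairs, top_num=0" % tri.size
        )
    eps = float(tri[:top_num].mean())
    if not np.isfinite(eps) or eps <= 0:
        raise ValueError("eps must be positive and finite, got %r" % eps)
    return eps
```

With a check like this, an all-zero rerank_dist fails early with a message that reports the number of nonzero pairs, instead of passing nan on to DBSCAN.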