shahsohil / DCC

This repository contains the source code and data for reproducing the results of the Deep Continuous Clustering paper.
MIT License

IndexError in easy_example.py #19

Closed: sxie22 closed this issue 4 years ago

sxie22 commented 5 years ago

Running easy_example.py without any changes results in an IndexError at line 74 of DCCComputation.py (traceback below). This seems to arise because the largest epsilon is greater than NOISE_THRESHOLD but smaller than DIM*NOISE_THRESHOLD, so the noise filter removes every pairwise distance and epsilon is already empty when epsilon[-1] is read.

https://github.com/shahsohil/DCC/blob/d918a89be020eb87d5893ff3591e08daa84422c4/pytorch/DCCComputation.py#L66
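A minimal sketch of the failure mode, under two assumptions: that the filter at line 66 keeps only distances with `epsilon / sqrt(DIM) > NOISE_THRESHOLD` (the `np.sqrt(cfg.DIM)` check in the debugger session below points that way), and that `NOISE_THRESHOLD` is 0.01. The distances are made up, with the largest set to the 0.0118 seen in the debugger. Once every distance is filtered out, reading `epsilon[-1]` raises the same IndexError:

```python
import numpy as np

DIM = 10                # cfg.DIM, as printed in the debugger session
NOISE_THRESHOLD = 0.01  # assumed value of cfg.RCC.NOISE_THRESHOLD

# Made-up sorted pairwise distances; the largest matches epsilon[-1] from the debugger.
epsilon = np.sort(np.array([0.0021, 0.0054, 0.0118], dtype=np.float32))

# Assumed form of the noise filter: keep distances whose scaled value exceeds the threshold.
kept = epsilon[epsilon / np.sqrt(DIM) > NOISE_THRESHOLD]
print(kept.size)  # 0: every distance is treated as noise

message = None
try:
    sigma2 = float(3 * (kept[-1] ** 2))  # mirrors line 74 of DCCComputation.py
except IndexError as err:
    message = str(err)
print(message)  # index -1 is out of bounds for axis 0 with size 0
```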

Versions: pytorch 1.3.0.dev20190819, numpy 1.16.4, scipy 1.2.1

Loaded `easy` dataset for finetuning
/home/sxie22/miniconda3/envs/sisso/lib/python3.7/site-packages/numpy/lib/function_base.py:392: RuntimeWarning: Mean of empty slice.
  avg = a.mean(axis)
/home/sxie22/miniconda3/envs/sisso/lib/python3.7/site-packages/numpy/core/_methods.py:85: RuntimeWarning: invalid value encountered in true_divide
  ret = ret.dtype.type(ret / rcount)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
~/PycharmProjects/vanDuin/DCC/pytorch/easy_example.py in <module>
     86 args.M = 20
     87 args.lr = 0.001
---> 88 out = DCC.main(args, net=net)

~/PycharmProjects/vanDuin/DCC/pytorch/DCC.py in main(args, net)
    108 
    109     # computing and initializing the hyperparams
--> 110     _sigma1, _sigma2, _lambda, _delta, _delta1, _delta2, lmdb, lmdb_data = computeHyperParams(pairs, Z)
    111     oldassignment = np.zeros(len(pairs))
    112     stopping_threshold = int(math.ceil(cfg.STOPPING_CRITERION * float(len(pairs))))

~/PycharmProjects/vanDuin/DCC/pytorch/DCCComputation.py in computeHyperParams(pairs, Z)
     72     robsamp = min(cfg.RCC.MAX_NUM_SAMPLES_DELTA, robsamp)
     73     _delta2 = float(np.average(epsilon[:robsamp]) / 2)
---> 74     _sigma2 = float(3 * (epsilon[-1] ** 2))
     75 
     76     _delta1 = float(np.average(np.linalg.norm(Z - np.average(Z, axis=0)[np.newaxis, :], axis=1) ** 2))

IndexError: index -1 is out of bounds for axis 0 with size 0

In [2]: debug
> /home/sxie22/PycharmProjects/vanDuin/DCC/pytorch/DCCComputation.py(74)computeHyperParams()
     72     robsamp = min(cfg.RCC.MAX_NUM_SAMPLES_DELTA, robsamp)
     73     _delta2 = float(np.average(epsilon[:robsamp]) / 2)
---> 74     _sigma2 = float(3 * (epsilon[-1] ** 2))
     75
     76     _delta1 = float(np.average(np.linalg.norm(Z - np.average(Z, axis=0)[np.newaxis, :], axis=1) ** 2))

ipdb> epsilon = np.linalg.norm(Z[pairs[:, 0].astype(int)] - Z[pairs[:, 1].astype(int)], axis=1)
ipdb> epsilon = np.sort(epsilon)
ipdb> epsilon[-1]
0.011799936
ipdb> np.sqrt(cfg.DIM)
3.1622776601683795
ipdb> cfg.DIM
10
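
The numbers from the debugger session can be checked directly. This short sketch again assumes `NOISE_THRESHOLD = 0.01` and a filter on `epsilon / sqrt(DIM)`: the largest distance scales to roughly 0.0037, below the threshold, and averaging the resulting empty array yields nan together with a "Mean of empty slice" RuntimeWarning. That is consistent with the warnings printed just before the traceback, since line 73 calls `np.average` on the empty slice one line before the crash.

```python
import warnings

import numpy as np

largest = 0.011799936   # epsilon[-1] from the ipdb session
dim = 10                # cfg.DIM from the ipdb session
noise_threshold = 0.01  # assumed value of cfg.RCC.NOISE_THRESHOLD

scaled = largest / np.sqrt(dim)
print(f"{scaled:.5f}", scaled > noise_threshold)  # 0.00373 False

# With every distance filtered out, line 73 ends up averaging an empty slice.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    delta2 = float(np.average(np.array([], dtype=np.float32)) / 2)

messages = [str(w.message) for w in caught]
print(delta2)    # nan
print(messages)  # includes "Mean of empty slice."
```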
shahsohil commented 4 years ago

@LemonPi We took the noise threshold from RCC. It was never tested rigorously on any synthetic data (such as easy_example); the thresholds were only set by empirical evaluation on a subset of MNIST.

My only suggestion is to adjust the noise threshold to suit the synthetic data.
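One possible way to follow this suggestion is to derive the threshold from the data instead of reusing the MNIST-tuned value. The helper below is hypothetical and not part of DCC: it keeps the filter in the assumed `epsilon / sqrt(DIM) > NOISE_THRESHOLD` form, optionally lowers the threshold to a quantile of the scaled distances so that roughly a chosen fraction of pairs survives, and raises a descriptive error instead of an IndexError when nothing is left. The `keep_fraction` rule is an illustrative assumption, not the authors' procedure.

```python
import numpy as np


def filter_noise(epsilon, dim, noise_threshold, keep_fraction=None):
    """Hypothetical helper: drop pairwise distances treated as noise.

    Assumes an RCC-style filter that keeps distances with
    epsilon / sqrt(dim) > noise_threshold. If keep_fraction is given, the
    threshold is lowered (never raised) to the (1 - keep_fraction) quantile
    of the scaled distances, so that roughly that fraction of pairs survives.
    """
    epsilon = np.asarray(epsilon, dtype=float)
    scaled = epsilon / np.sqrt(dim)
    if keep_fraction is not None:
        data_threshold = float(np.quantile(scaled, 1.0 - keep_fraction))
        noise_threshold = min(noise_threshold, data_threshold)
    kept = np.sort(epsilon[scaled > noise_threshold])
    if kept.size == 0:
        # Fail with an actionable message instead of an IndexError at epsilon[-1].
        raise ValueError(
            f"all {epsilon.size} distances fall below the noise threshold "
            f"{noise_threshold:g}; lower it for this dataset"
        )
    return kept, noise_threshold


# Usage on synthetic distances that are all below 0.0118, as in the debugger session.
rng = np.random.default_rng(0)
distances = rng.uniform(0.0, 0.0118, size=1000)
kept, used_threshold = filter_noise(distances, dim=10, noise_threshold=0.01, keep_fraction=0.9)
print(kept.size, used_threshold)
```

Only lowering the threshold keeps the original value whenever it already lets enough pairs through, so data on the scale the threshold was tuned for would be filtered as before.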

wangxinzhi0 commented 3 years ago

Hello,

I have a problem when running easy_example.py: it can't find the module named 'easydict'. Did you encounter this problem, and how did you solve it?

ModuleNotFoundError: No module named 'easydict'
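`easydict` is a third-party package published on PyPI, so installing it (`pip install easydict`) is the usual fix for this error. If installing it is not an option, a minimal stand-in may cover simple uses. The sketch below assumes the code only needs attribute-style reads and writes on nested config dictionaries (like `cfg.RCC.NOISE_THRESHOLD`), which has not been checked against the repository's config module; the fallback class is hypothetical.

```python
try:
    from easydict import EasyDict  # third-party package: pip install easydict
except ModuleNotFoundError:

    class EasyDict(dict):
        """Hypothetical minimal stand-in: dict keys readable and writable as attributes."""

        def __getattr__(self, name):
            try:
                return self[name]
            except KeyError as err:
                raise AttributeError(name) from err

        def __setattr__(self, name, value):
            # Wrap a plain dict assigned as a value so its keys are attribute-accessible too
            # (one level only; dicts nested deeper inside it are left as they are).
            if isinstance(value, dict) and not isinstance(value, EasyDict):
                value = EasyDict(value)
            self[name] = value


# Nested, attribute-style config in the spirit of cfg.RCC.NOISE_THRESHOLD.
cfg = EasyDict()
cfg.RCC = EasyDict()
cfg.RCC.NOISE_THRESHOLD = 0.01
print(cfg.RCC.NOISE_THRESHOLD, cfg["RCC"]["NOISE_THRESHOLD"])
```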