lmcinnes / umap

Uniform Manifold Approximation and Projection
BSD 3-Clause "New" or "Revised" License

segmentation fault issue depending on hyperparameter #399

Open sbyoo opened 4 years ago

sbyoo commented 4 years ago

Hello.

I have run into odd cases where UMAP works perfectly with one set of hyperparameters, while another set fails with a 'segmentation fault'. The cause is neither the data nor the virtualenv, since both were identical across runs. FYI, the data is a NumPy array of shape about (500,000 x 39).

The versions of packages are:

The functioning hyperparameter set is:

The malfunctioning hyperparameter set is:

  - n_neighbor: 70, 90, 110 (I have not searched all values yet)
  - min_dist: 0.005 to 0.5 (identical to the functioning set above)
  - metric: correlation
  - dimension: 2
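A sweep over settings like these can be kept alive past a crash by running each combination in its own Python process with `faulthandler` enabled, so that a segmentation fault kills only the child and leaves a traceback on stderr. The following is a minimal sketch under stated assumptions, not what was run for this report: `run_isolated` and `classify` are hypothetical helper names, the snippet passed in is a placeholder for the actual UMAP fit, and the negative-exit-code check relies on the POSIX convention.

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=None):
    """Run a Python snippet in a child process and return its exit code.

    The -X faulthandler option makes the child dump a Python traceback
    to stderr if it receives a fatal signal such as SIGSEGV.
    """
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        timeout=timeout,
    )
    return result.returncode


def classify(returncode):
    """Map a child exit code to a short label (POSIX convention)."""
    if returncode == 0:
        return "ok"
    # On POSIX, a child killed by signal N reports an exit code of -N.
    if returncode == -signal.SIGSEGV:
        return "segfault"
    return "error"


if __name__ == "__main__":
    for n_neighbors in (70, 90, 110):
        # Placeholder: replace with code that loads the data and fits UMAP.
        snippet = f"print('fit would run here, n_neighbors={n_neighbors}')"
        print(n_neighbors, classify(run_isolated(snippet)))
```

Because each setting runs in a fresh interpreter, one crashing combination does not prevent the remaining ones from being tested.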

Just to add, the OS is Linux Mint 18.4. I tested both from the command line with a .py script and in JupyterLab.

If you have any guess as to why this might happen, please let me know. Thank you.

lmcinnes commented 4 years ago

I'm guessing that there is an odd corner case bug in the correlation metric -- that would be the most likely case. I'll have a look tomorrow and see if I can find a cause.
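To illustrate the kind of corner case a correlation metric can have (this is not a diagnosis of the crash reported here), the sketch below computes a correlation distance in plain Python. A constant vector has zero variance, so the denominator becomes zero and the correlation is undefined. The function name `correlation_distance` and the choice to return NaN are assumptions made for this example; how UMAP's own implementation treats this case is not shown.

```python
import math


def correlation_distance(x, y):
    """Return 1 - Pearson correlation of x and y, or NaN when undefined."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dot = norm_x = norm_y = 0.0
    for a, b in zip(x, y):
        da, db = a - mean_x, b - mean_y
        dot += da * db
        norm_x += da * da
        norm_y += db * db
    if norm_x == 0.0 or norm_y == 0.0:
        # A constant vector has zero variance, so the correlation is undefined.
        # Returning NaN here is an illustrative choice, not UMAP's behaviour.
        return math.nan
    return 1.0 - dot / math.sqrt(norm_x * norm_y)
```

Scanning a dataset for rows with zero variance before fitting is one cheap way to rule this particular case in or out.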

sbyoo commented 4 years ago

Indeed. I forgot to mention that it seems to work with the Euclidean metric but not with the correlation metric. Thank you for the quick response.

Note: the Mahalanobis distance also produces the same error: segmentation fault.

lmcinnes commented 4 years ago

I'm not seeing anything obvious. Is this a dataset that you are able to share? (I certainly understand that this is often not the case). If so I can try to reproduce this locally and figure out what the issue might be.

sbyoo commented 4 years ago

@lmcinnes, thank you for getting back to me so quickly after testing it. Three things to report:

  1. In the same setup, if I reduce the size of the data and run the malfunctioning hyperparameter set, it works.
  2. I created a completely new conda environment with Python 3.6.5, and some of the hyperparameters that did not work before now work (not all of them). The differences other than the Python version are scipy==1.4.1 and scikit-learn==0.22.1.
  3. Sorry, I could not try this on my old Mac because of a 'low memory' error message (it has 8 GB of RAM).
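Since the failure disappears on smaller data, one way to narrow it down is to bisect on the number of rows. The sketch below is only an illustration: `largest_working_size` is a hypothetical helper, `fit_succeeds` stands in for whatever check runs the fit on the first `n` rows and reports success, and the search assumes that every size below a working size also works, which may not hold for this bug.

```python
def largest_working_size(fit_succeeds, low, high):
    """Return the largest n in [low, high] for which fit_succeeds(n) is True.

    Assumes fit_succeeds(low) is True and that success is monotone in n,
    i.e. every size below a working size also works.
    """
    while low < high:
        # Round up so the loop always makes progress when low + 1 == high.
        mid = (low + high + 1) // 2
        if fit_succeeds(mid):
            low = mid
        else:
            high = mid - 1
    return low


if __name__ == "__main__":
    # Stand-in predicate for demonstration; a real one would run the fit.
    print(largest_working_size(lambda n: n <= 312_000, 1_000, 500_000))  # prints 312000
```

With a range of 1,000 to 500,000 rows this needs about 19 fits, so each trial is best run in a separate process in case it crashes.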

For the data, I will have to ask my PI whether it is fine to share data for this purpose (I think he will be positive). I will send you an email on how to share the data.