alberto-sibner closed this issue 5 years ago
Any ideas about which situations might cause this?
Thank you very much
EDIT:
As an additional note, the following warning is raised during the training stage:
\venv\lib\site-packages\hdbscan\prediction.py:547: RuntimeWarning: invalid value encountered in double_scalars
clusterer.prediction_data_.cluster_tree)
I could also share my array of latitudes and longitudes if necessary.
EDIT 2:
I also noticed that there are points whose probability vectors sum to 0 (i.e. they do not belong to any cluster either) when they shouldn't.
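For anyone trying to narrow this down, here is a minimal sketch of how the two symptoms described above (rows containing NaN and rows summing to 0) could be located in a membership matrix. The helper name `flag_bad_rows`, the tolerance and the example matrix are made up for illustration; in practice the input would be whatever the soft clustering call returned, and iterating row by row should work for a 2D NumPy array as well as for nested lists.

```python
import math


def flag_bad_rows(membership, tol=1e-12):
    """Return (nan_rows, zero_rows) for a matrix of membership probabilities.

    nan_rows:  indices of rows that contain at least one NaN.
    zero_rows: indices of NaN-free rows whose probabilities sum to ~0.
    """
    nan_rows, zero_rows = [], []
    for i, row in enumerate(membership):
        if any(math.isnan(p) for p in row):
            nan_rows.append(i)
        elif abs(sum(row)) <= tol:
            zero_rows.append(i)
    return nan_rows, zero_rows


# Hypothetical membership matrix (4 points, 3 clusters), not real output.
example = [
    [0.7, 0.2, 0.1],
    [float("nan"), float("nan"), float("nan")],
    [0.0, 0.0, 0.0],
    [0.05, 0.9, 0.0],
]

nan_rows, zero_rows = flag_bad_rows(example)
print("rows with NaN:", nan_rows)
print("rows summing to 0:", zero_rows)
```

The returned indices can then be used to pull out the corresponding coordinates and inspect them, for instance to see whether the affected points share something in common.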
Hi @lmcinnes
Any ideas about why this might be happening?
Thanks in advance
At this point I must admit that I am not quite sure why the soft cluster membership functions fail -- it has been a while since I wrote them, and there seem to be odd corner cases that trigger this behaviour, but I can rarely reproduce it. So while a few things have been fixed, niggling issues remain for which I don't really have any ideas.
@lmcinnes Oh, I see what you mean... Fair enough! Do you think I could help you fix this issue? Soft clustering is very important for the problems I need to solve, and hdbscan is the algorithm that works best for me after having tried many others.
I have the exact parameters that trigger this behaviour for the dataset I use (and also a trained model if you want). Could I share them with you privately and help you debug this?
Thank you very much!
I've managed to solve it. In my case, the problem seems to have been caused by a few duplicated points in the long lists of coordinates I wanted to cluster.
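In case it helps someone else, this is roughly how duplicated coordinates can be detected and dropped before fitting. It is only a sketch under my own assumptions: the points are held as a list of `(lat, lon)` tuples, only exact duplicates matter, and the helper names `find_duplicates` and `drop_duplicates` and the sample coordinates are made up for illustration. With a NumPy array, `np.unique(points, axis=0)` should give a similar result, although it sorts the rows.

```python
from collections import Counter


def find_duplicates(points):
    """Return {point: count} for every point that appears more than once."""
    counts = Counter(points)
    return {p: n for p, n in counts.items() if n > 1}


def drop_duplicates(points):
    """Remove exact duplicates, keeping the first occurrence and the order."""
    return list(dict.fromkeys(points))


# Hypothetical (lat, lon) pairs in degrees; the first point is repeated.
coords = [
    (40.4168, -3.7038),
    (41.3874, 2.1686),
    (40.4168, -3.7038),
    (37.3891, -5.9845),
]

dupes = find_duplicates(coords)
unique_coords = drop_duplicates(coords)
print("duplicated points:", dupes)
print("points kept:", len(unique_coords))
```

This only catches points that are exactly equal. Coordinates that differ in the last decimal places would need rounding first, and I have not checked whether such near-duplicates cause the same problem.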
Hi,
I'm using hdbscan with the haversine metric to find clusters based on latitudes and longitudes. The algorithm works really well for me. However, when I use all_points_membership_vectors and membership_vector with some coordinates, these methods return NaN probabilities. In other words, I have N points in my dataset and most of them are classified quite well using soft clustering, but for a small subset of these N points I get NaN probabilities of belonging to any of the clusters.
I have checked some of these points individually and they seem totally normal to me: they are surrounded by a lot of clusters on the map, and yet they have no probability of belonging to any of them.