src-d / kmcuda

Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

weird problem #54

Open lijiawenl opened 6 years ago

lijiawenl commented 6 years ago

My data is a 183497 × 600 matrix. With k = 4590 I get the errors 'internal bug inside kmeans_init_centroids: dis_num is NaN ', '/src/kmeans.cu:814->'an illegal memory encounted' and 'cudaMempy failed'. With a different k, such as k = 4591, kmeans works. I thought this might occur randomly. I have no idea where the problem is.

vmarkovtsev commented 6 years ago

Check your input for NaNs. If there are too many, they creep into the centroids.
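
For anyone wanting to run this check quickly, here is a minimal sketch. It assumes the samples are already loaded as a 2-D NumPy array. The helper names `nan_report` and `drop_bad_rows` are made up for illustration and are not part of kmcuda. It also counts infinities, which tend to turn into NaNs once distances are computed.

```python
import numpy as np


def nan_report(samples):
    """Summarize non-finite values in a 2-D samples array (hypothetical helper)."""
    samples = np.asarray(samples)
    bad = ~np.isfinite(samples)  # True where a value is NaN, +inf or -inf
    return {
        "nan_count": int(np.isnan(samples).sum()),
        "inf_count": int(np.isinf(samples).sum()),
        # indices of rows that contain at least one non-finite value
        "bad_rows": np.flatnonzero(bad.any(axis=1)).tolist(),
    }


def drop_bad_rows(samples):
    """Return only the rows in which every value is finite (hypothetical helper)."""
    samples = np.asarray(samples)
    return samples[np.isfinite(samples).all(axis=1)]
```

If `nan_report(data)["bad_rows"]` comes back empty, the input itself is probably not the source of the NaNs and the problem lies elsewhere.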

vmarkovtsev commented 6 years ago

Otherwise I need your data to reproduce the problem.

lijiawenl commented 6 years ago

Thanks for your reply. You mean the data I used contains NaNs, but then it does not make sense that it works when I set k=4591. Could you send me an email address that I can send the data to?

lijiawenl commented 6 years ago

Many thanks.

vmarkovtsev commented 6 years ago

There is a certain tolerance of NaNs during the centroids initialization but it is not bulletproof.

My email is written on my profile page.

georgandreasjaksch commented 6 years ago

I seem to have a similar problem: the centroids randomly contain NaNs although my data doesn't. Repeating the clustering over and over again until no NaNs are in the centroids anymore kind of solves the issue. (350K samples, 15 dimensions, around k=100 clusters)

Is there any progress on this issue?
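
The retry workaround described above could be sketched roughly like this. It is a stopgap, not a fix. `cluster_until_finite` and `cluster_fn` are hypothetical names: `cluster_fn` stands for any callable that takes `(samples, k, seed)` and returns `(centroids, assignments)`, for example a thin wrapper around the library's Python entry point. Whether that entry point accepts a seed in this form is an assumption to check against its README.

```python
import numpy as np


def cluster_until_finite(cluster_fn, samples, k, max_attempts=10):
    """Re-run clustering with a new seed until the centroids contain no NaN/inf.

    cluster_fn is a hypothetical stand-in: any callable taking
    (samples, k, seed) and returning (centroids, assignments).
    """
    for attempt in range(max_attempts):
        # Use the attempt index as the seed so every retry starts differently.
        centroids, assignments = cluster_fn(samples, k, seed=attempt)
        if np.isfinite(np.asarray(centroids)).all():
            return centroids, assignments, attempt
    raise RuntimeError(
        "centroids still contain non-finite values after %d attempts" % max_attempts
    )
```

Bounding the number of attempts avoids looping forever when the NaNs really do come from the input rather than from an unlucky initialization.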