src-d / kmcuda

Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

Issues running in fp16 #34

Open philippeller opened 6 years ago

philippeller commented 6 years ago

Even though the fp16 unit tests run successfully, I cannot get clustering to work on my sample in fp16 mode.

When I use my data with .astype(np.float32), the clustering output looks something like:

data loaded
reassignments threshold: 50000
transposing the samples...
performing kmeans++...
done
too few clusters for this yinyang_t => Lloyd
iteration 1: 1000000 reassignments
iteration 2: 205760 reassignments
iteration 3: 103270 reassignments
iteration 4: 70388 reassignments
iteration 5: 53987 reassignments
iteration 6: 43521 reassignments
clustering done
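
For context, the invocation looks roughly like this (a minimal sketch assuming libKMCUDA's Python binding; the file name, cluster count, and seed are placeholders, not my actual values):

import numpy as np
from libKMCUDA import kmeans_cuda

# load the 1,000,000-sample dataset and cast to the precision under test
samples = np.load("data.npy").astype(np.float32)

# cluster count and seed are illustrative placeholders
centroids, assignments = kmeans_cuda(samples, 50, seed=3, verbosity=1)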

With the exact same data but .astype(np.float16), however, I get:

iteration 1: 1000000 reassignments
iteration 2: 727174 reassignments
iteration 3: 691319 reassignments
iteration 4: 676380 reassignments
iteration 5: 663719 reassignments
iteration 6: 657045 reassignments
iteration 7: 648330 reassignments
iteration 8: 644405 reassignments
iteration 9: 645016 reassignments
iteration 10: 639175 reassignments
iteration 11: 636870 reassignments
iteration 12: 637613 reassignments
iteration 13: 623391 reassignments
iteration 14: 626752 reassignments
iteration 15: 635901 reassignments
iteration 16: 639952 reassignments
iteration 17: 640327 reassignments
iteration 18: 639718 reassignments
iteration 19: 652200 reassignments
iteration 20: 662045 reassignments
iteration 21: 676625 reassignments
iteration 22: 701241 reassignments
iteration 23: 693161 reassignments
iteration 24: 713261 reassignments
iteration 25: 709713 reassignments
iteration 26: 717198 reassignments
iteration 27: 735580 reassignments
iteration 28: 743289 reassignments
iteration 29: 745265 reassignments
iteration 30: 761803 reassignments
iteration 31: 762372 reassignments
iteration 32: 779398 reassignments
iteration 33: 778028 reassignments
iteration 34: 781619 reassignments
iteration 35: 786291 reassignments
iteration 36: 792249 reassignments
iteration 37: 799609 reassignments
iteration 38: 804822 reassignments
iteration 39: 799691 reassignments
iteration 40: 804918 reassignments
iteration 41: 823586 reassignments
iteration 42: 810885 reassignments
iteration 43: 827755 reassignments
iteration 44: 833164 reassignments
[...]

It does not converge...

I also tried casting the data to fp16 and then back to fp32 to lose precision on the dataset, but that still converged fine.
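
Concretely, the round trip is just (sketch):

# values are now limited to what fp16 can represent,
# but all computation stays in fp32
lossy = samples.astype(np.float16).astype(np.float32)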

Any ideas?

(I tried on a Tesla P100 as well as Titan X and 1080 Pascal cards.)

vmarkovtsev commented 6 years ago

Handling fp16 is complex, and it may not work with samples spanning diverse value ranges; that is a consequence of the precision loss. It certainly works for some cases which I tested. I need the data to conclude whether there is nothing to be done or there is an actual calculation bug.
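
To illustrate what I mean by precision loss, a standalone numpy example (not kmcuda code):

import numpy as np

# fp16 has a 10-bit mantissa, i.e. roughly 3 significant decimal digits
print(np.finfo(np.float16).eps)   # ~0.000977
# above 2048 the gap between adjacent fp16 values is 2,
# so small distance differences simply round away
a = np.float16(2048)
print(a + np.float16(1) == a)     # True: the increment is lost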

philippeller commented 6 years ago

Thanks for the prompt answer! I'd be more than happy to provide you with a .npy file containing the data (or a subset thereof). I could upload it to a GitHub repo, for example?

philippeller commented 6 years ago

I have uploaded two small test data sets: they contain identical data, in single and half precision. While the fp32 set converges in 5 iterations, the fp16 set does not converge even after 100 iterations.

fp16: https://github.com/philippeller/retro/blob/directionality_phiipp/table_compression/testdata_fp16.npy?raw=true
fp32: https://github.com/philippeller/retro/blob/directionality_phiipp/table_compression/testdata_fp32.npy?raw=true
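
To reproduce with these files (a sketch; assumes both files have been downloaded to the working directory):

import numpy as np

fp16 = np.load("testdata_fp16.npy")
fp32 = np.load("testdata_fp32.npy")
print(fp16.dtype, fp32.dtype)  # float16 float32
# the two arrays should agree up to fp16 rounding
print(np.allclose(fp16.astype(np.float32),
                  fp32.astype(np.float16).astype(np.float32)))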

vmarkovtsev commented 6 years ago

Perfect, I will have a look in the following days.