src-d / kmcuda

Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

The program gets stuck #125

Open qiaoyu1002 opened 1 year ago

qiaoyu1002 commented 1 year ago

Dear authors,

Could you please check the attached log and help me find where the problem is? My program is stuck.

```
arguments: 1 0x7ffede2239ec 0.010 0.10 0 58 128 3 1234 1 1 3 0x555972b71bb0 0x5559787dcfe0 0x555953d49250 (nil)
reassignments threshold: 0
yinyang groups: 0
[0] dest: 0x7f751b200000 - 0x7f751b207400 (29696)
[0] device_centroids: 0x7f751b207400 - 0x7f751b207a00 (1536)
[0] device_assignments: 0x7f751b207a00 - 0x7f751b207ae8 (232)
[0] device_assignments_prev: 0x7f751b207c00 - 0x7f751b207ce8 (232)
[0] device_ccounts: 0x7f751b207e00 - 0x7f751b207e0c (12)
GPU #0 memory: used 1662779392 bytes (3.3%), free 49387536384 bytes, total 51050315776 bytes
GPU #0 has 49152 bytes of shared memory per block
transposing the samples...
transpose <<<(4, 2), (32, 8)>>> 58, 128
performing kmeans++...
kmeans++: dump 58 128 0x555973d8eab0
kmeans++: dev #0: 0x7f751b200000 0x7f751b207400 0x7f751b207a00
step 1
[0] dev_dists: 0x7f751b208000 - 0x7f751b208040 (64)
step 2
[0] dev_dists: 0x7f751b208000 - 0x7f751b208040 (64)
done
too few clusters for this yinyang_t => Lloyd
plans: [(0, 58)]
planc: [(0, 3)]
iteration 1: 58 reassignments
iteration 2: 5 reassignments
iteration 3: 3 reassignments
iteration 4: 3 reassignments
iteration 5: 3 reassignments
iteration 6: 1 reassignments
iteration 7: 4 reassignments
iteration 8: 8 reassignments
iteration 9: 3 reassignments
iteration 10: 4 reassignments
iteration 11: 2 reassignments
iteration 12: 1 reassignments
iteration 13: 1 reassignments
iteration 14: 0 reassignments
return kmcudaSuccess
arguments: 1 0x7ffede2239ec 0.010 0.10 0 87 128 3 1234 1 1 3 0x555977d4d8c0 0x5559778afe80 0x555953dbc9d0 (nil)
reassignments threshold: 0
yinyang groups: 0
[0] dest: 0x7f751b200000 - 0x7f751b20ae00 (44544)
[0] device_centroids: 0x7f751b20ae00 - 0x7f751b20b400 (1536)
[0] device_assignments: 0x7f751b20b400 - 0x7f751b20b55c (348)
[0] device_assignments_prev: 0x7f751b20b600 - 0x7f751b20b75c (348)
[0] device_ccounts: 0x7f751b20b800 - 0x7f751b20b80c (12)
GPU #0 memory: used 1660682240 bytes (3.3%), free 49389633536 bytes, total 51050315776 bytes
GPU #0 has 49152 bytes of shared memory per block
transposing the samples...
transpose <<<(4, 3), (32, 8)>>> 87, 128
performing kmeans++...
kmeans++: dump 87 128 0x5559540f44c0
kmeans++: dev #0: 0x7f751b200000 0x7f751b20ae00 0x7f751b20b400
step 1
[0] dev_dists: 0x7f751b20ba00 - 0x7f751b20ba40 (64)
step 2
[0] dev_dists: 0x7f751b20ba00 - 0x7f751b20ba40 (64)
done
too few clusters for this yinyang_t => Lloyd
plans: [(0, 87)]
planc: [(0, 3)]
iteration 1: 87 reassignments
iteration 2: 9 reassignments
iteration 3: 5 reassignments
iteration 4: 7 reassignments
iteration 5: 3 reassignments
iteration 6: 3 reassignments
iteration 7: 1 reassignments
iteration 8: 3 reassignments
iteration 9: 1 reassignments
iteration 10: 4 reassignments
iteration 11: 3 reassignments
iteration 12: 2 reassignments
iteration 13: 0 reassignments
return kmcudaSuccess
arguments: 1 0x7ffede2239ec 0.010 0.10 0 25 128 3 1234 1 1 3 0x555978938ac0 0x555978947750 0x5559540c4d30 (nil)
reassignments threshold: 0
yinyang groups: 0
[0] dest: 0x7f751b200000 - 0x7f751b203200 (12800)
[0] device_centroids: 0x7f751b203200 - 0x7f751b203800 (1536)
[0] device_assignments: 0x7f751b203800 - 0x7f751b203864 (100)
[0] device_assignments_prev: 0x7f751b203a00 - 0x7f751b203a64 (100)
[0] device_ccounts: 0x7f751b203c00 - 0x7f751b203c0c (12)
GPU #0 memory: used 1658585088 bytes (3.2%), free 49391730688 bytes, total 51050315776 bytes
GPU #0 has 49152 bytes of shared memory per block
transposing the samples...
transpose <<<(4, 1), (32, 8)>>> 25, 128
performing kmeans++...
kmeans++: dump 25 128 0x555973d97f80
kmeans++: dev #0: 0x7f751b200000 0x7f751b203200 0x7f751b203800
step 1
[0] dev_dists: 0x7f751b203e00 - 0x7f751b203e40 (64)
step 2
[0] dev_dists: 0x7f751b203e00 - 0x7f751b203e40 (64)
done
too few clusters for this yinyang_t => Lloyd
plans: [(0, 25)]
planc: [(0, 3)]
iteration 1: 25 reassignments
iteration 2: 2 reassignments
iteration 3: 3 reassignments
iteration 4: 1 reassignments
iteration 5: 1 reassignments
iteration 6: 2 reassignments
iteration 7: 1 reassignments
iteration 8: 0 reassignments
return kmcudaSuccess
arguments: 1 0x7ffede2239ec 0.010 0.10 0 4 128 3 1234 1 1 3 0x5559771795d0 0x555978956050 0x555895ca28c0 (nil)
reassignments threshold: 0
yinyang groups: 0
[0] dest: 0x7f751b200000 - 0x7f751b200800 (2048)
[0] device_centroids: 0x7f751b200800 - 0x7f751b200e00 (1536)
[0] device_assignments: 0x7f751b200e00 - 0x7f751b200e10 (16)
[0] device_assignments_prev: 0x7f751b201000 - 0x7f751b201010 (16)
[0] device_ccounts: 0x7f751b201200 - 0x7f751b20120c (12)
GPU #0 memory: used 1656487936 bytes (3.2%), free 49393827840 bytes, total 51050315776 bytes
GPU #0 has 49152 bytes of shared memory per block
transposing the samples...
transpose <<<(4, 1), (32, 8)>>> 4, 128
performing kmeans++...
kmeans++: dump 4 128 0x555954856b10
kmeans++: dev #0: 0x7f751b200000 0x7f751b200800 0x7f751b200e00
step 1
[0] dev_dists: 0x7f751b201400 - 0x7f751b201440 (64)
step 2
[0] dev_dists: 0x7f751b201400 - 0x7f751b201440 (64)
done
too few clusters for this yinyang_t => Lloyd
plans: [(0, 4)]
planc: [(0, 3)]
iteration 1: 4 reassignments
iteration 2: 0 reassignments
return kmcudaSuccess
4%|████▌ | 2/50 [00:25<11:21, 14.20s/it]
| Global Training Round : 3 |
```

```
arguments: 1 0x7ffede2239ec 0.010 0.10 0 232 128 3 1234 1 1 3 0x5559736ccd30 0x555977a18c90 0x555954115a90 (nil)
reassignments threshold: 2
yinyang groups: 0
[0] dest: 0x7f751b200000 - 0x7f751b21d000 (118784)
[0] device_centroids: 0x7f751b21d000 - 0x7f751b21d600 (1536)
[0] device_assignments: 0x7f751b21d600 - 0x7f751b21d9a0 (928)
[0] device_assignments_prev: 0x7f751b21da00 - 0x7f751b21dda0 (928)
[0] device_ccounts: 0x7f751b21de00 - 0x7f751b21de0c (12)
GPU #0 memory: used 1654390784 bytes (3.2%), free 49395924992 bytes, total 51050315776 bytes
GPU #0 has 49152 bytes of shared memory per block
transposing the samples...
transpose <<<(8, 4), (8, 32)>>> 232, 128, xyswap    =================> stuck here
```
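For reference, the hanging run in the log (232 samples × 128 float32 features, k = 3, seed 1234, verbosity 3) corresponds to a call roughly like the sketch below. This is not my exact script: the data is random, the keyword names are taken from the libKMCUDA README, and the mapping of the logged `arguments:` values onto those keywords is my own reading of the log.

```python
import numpy as np

# Synthetic stand-in for the real data: the run that hangs processes
# 232 samples with 128 float32 features (see the "transpose ... 232, 128" line).
samples = np.random.rand(232, 128).astype(np.float32)

try:
    from libKMCUDA import kmeans_cuda  # pip install libKMCUDA

    # k = 3; tolerance=0.01, yinyang_t=0.1, seed=1234, device=1 (GPU #0 as a
    # bitmask) and verbosity=3 mirror the values visible in the logged
    # "arguments:" line.
    centroids, assignments = kmeans_cuda(
        samples, 3, tolerance=0.01, yinyang_t=0.1,
        seed=1234, device=1, verbosity=3)
except ImportError:
    centroids = assignments = None  # libKMCUDA not installed
```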

Thanks in advance for your analysis. If you need more detailed information, please let me know.