jokofa / torch_kmeans

PyTorch implementations of KMeans, Soft-KMeans and Constrained-KMeans which can be run on GPU and work on (mini-)batches of data.
MIT License
54 stars 6 forks

I don't understand how this weight is set, MAX_POINTS is set according to what, #1

Closed hutingz closed 1 year ago

jokofa commented 1 year ago

Hi, thanks for opening the issue. If I understand your question correctly, you mean the MAX_POINTS parameter used in the constrained_kmeans Jupyter notebook? There, that value is just used to normalize the weights, to simulate the case where you do not have real-valued weights but only a limit on the NUMBER of points per cluster. That means it is a user-provided parameter, i.e. how many points you want to allow per cluster. Then, if you take weights of one and divide by the maximum number of points per cluster, the constraint sum(weights) <= 1 will enforce that at most MAX_POINTS points are assigned to the same cluster.
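
The normalization described above can be illustrated with a small dependency-free sketch (MAX_POINTS and the helper function here are illustrative, not part of the library's API):

```python
# User-chosen cap on the number of points per cluster.
MAX_POINTS = 10
n_points = 25

# Each point starts with weight 1 and is divided by MAX_POINTS.
weights = [1.0 / MAX_POINTS] * n_points

def fits_in_cluster(k):
    # With the constraint sum(weights in a cluster) <= 1, a cluster holding
    # k points carries total weight k / MAX_POINTS, which exceeds 1 exactly
    # when k > MAX_POINTS. So the weight constraint caps the cluster size.
    return k * (1.0 / MAX_POINTS) <= 1.0

print(fits_in_cluster(10))  # True: 10 points fit within the cap
print(fits_in_cluster(11))  # False: an 11th point would violate the constraint
```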

Does that answer your question?

hutingz commented 1 year ago

Thank you very much for your reply. Now I have another question, about GPU utilization: when I don't use ConstrainedKMeans() in my program, GPU utilization is quite high, but once I use ConstrainedKMeans(), GPU utilization hovers around 20%. Stepping through the debugger shows that the data is on CUDA, so I don't understand why this function makes GPU utilization so low.


jokofa commented 1 year ago

I did not log and check the GPU utilization in detail, but this could be because of the constrained assignment step in the constrained k-means procedure, which has to be executed sequentially (this for loop) and is therefore limited in its GPU utilization, whereas the assignment step of standard k-means can be done fully in parallel on the GPU.
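
To make the difference concrete, here is a simplified, hypothetical sketch (not the library's actual code) contrasting the two assignment schemes. Standard k-means assigns each point independently by argmin, so all points can be processed in parallel on the GPU; the constrained version must process points one at a time because each assignment consumes cluster capacity, carrying state across loop iterations:

```python
def assign_standard(dists):
    # dists[i][j]: distance of point i to center j.
    # A pure per-point argmin: every point is independent, so this maps
    # naturally to a single parallel GPU kernel.
    return [min(range(len(row)), key=row.__getitem__) for row in dists]

def assign_constrained(dists, capacity):
    # Greedy sequential assignment: each point takes the nearest center
    # that still has free capacity. The loop carries state (remaining
    # capacity), so iterations cannot run in parallel.
    remaining = list(capacity)
    labels = []
    for row in dists:
        for j in sorted(range(len(row)), key=row.__getitem__):
            if remaining[j] > 0:
                remaining[j] -= 1
                labels.append(j)
                break
    return labels

dists = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]
print(assign_standard(dists))             # [0, 0, 0]: all points prefer center 0
print(assign_constrained(dists, [2, 2]))  # [0, 0, 1]: capacity pushes one point to center 1
```

The sequential dependency in the second function is the kind of structure that keeps GPU utilization low even when all tensors live on CUDA.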