Closed crixus5678 closed 1 year ago
For reference, from the paper:
Generally speaking, γ_l is related to σ_l, the average standard deviation of numeric attributes in cluster l. In practice, σ_l can be used as a guidance to determine γ_l. However, since σ_l is unknown before clustering, the overall average standard deviation σ of numeric attributes can be used for all σ_l.
So yes, it appears you are correct in your statement.
For the estimation of gamma in k-prototypes, the current implementation appears to estimate gamma as 0.5 * (standard deviation of all numeric data), i.e. a single standard deviation computed over all numeric values pooled together.
However, the paper [Huang 1997] says that gamma is guided by the "average standard deviation of numeric attributes". If that is the case, shouldn't we compute the standard deviation of each numeric attribute separately and then take the mean of those values?
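To make the difference concrete, here is a minimal NumPy sketch of the two estimates being compared. The function names and the toy data are hypothetical and are not taken from the library; the 0.5 factor is the one mentioned above, and the population standard deviation (NumPy's default, `ddof=0`) is an assumption.

```python
import numpy as np


def gamma_pooled_std(x_num):
    """0.5 * std over all numeric values pooled together.

    This mirrors what the issue describes as the current behaviour
    (hypothetical helper, not the library's actual code).
    """
    return 0.5 * np.asarray(x_num, dtype=float).std()


def gamma_mean_attr_std(x_num):
    """0.5 * mean of the per-attribute (per-column) standard deviations.

    This is the reading of the paper proposed in the issue.
    """
    return 0.5 * np.asarray(x_num, dtype=float).std(axis=0).mean()


# Toy data: two numeric attributes on very different scales.
x_num = np.array([
    [1.0, 100.0],
    [2.0, 200.0],
    [3.0, 300.0],
])

g_pooled = gamma_pooled_std(x_num)
g_mean = gamma_mean_attr_std(x_num)
print(f"pooled std estimate:        {g_pooled:.3f}")
print(f"mean per-attribute std est: {g_mean:.3f}")
```

When the attributes are on different scales, the pooled estimate also picks up the spread between the column means, so the two values can differ noticeably. If the numeric attributes are standardized beforehand, the gap should be much smaller.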