yinchangchang / TAME

KDD2020 paper; Identifying Sepsis Subphenotypes via Time-Aware Multi-Modal Auto-Encoder

How to calculate p-value, Calinski-Harabasz Index (CHI) and Davies-Bouldin Index (DBI)? #3

Open Arsener opened 3 years ago

Arsener commented 3 years ago

Hello, Dr. Yin. I'm very interested in your work using weighted k-means for clustering. How did you calculate the p-value? Could you point me to reference material on it? In addition, cluster centers are needed to calculate CHI and DBI. This work clusters multi-dimensional time series of different lengths, so how do you obtain the center of each cluster?

Looking forward to your reply. Thank you!

yinchangchang commented 3 years ago

We first computed the p-value for each variable; the paper reports the mean p-value across all variables (see https://www.nature.com/articles/s41598-018-37545-z).
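For readers who want to reproduce the "one p-value per variable, then average" step, here is a minimal sketch. It is not the code used in the paper: the choice of the Kruskal-Wallis test (`scipy.stats.kruskal`), the helper name `mean_p_value`, and the synthetic data are all assumptions made for illustration, and each patient is assumed to be summarized by one fixed-length row of variables.

```python
import numpy as np
from scipy.stats import kruskal


def mean_p_value(features, labels):
    """Hypothetical helper: per-variable p-values across clusters, plus their mean.

    features: array of shape (n_patients, n_variables), one summary row per patient
    labels:   array of shape (n_patients,), cluster assignment of each patient
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)

    p_values = []
    for j in range(features.shape[1]):
        # Split variable j into one sample per cluster.
        groups = [features[labels == c, j] for c in clusters]
        # ASSUMPTION: Kruskal-Wallis is used here only as an example test;
        # the thread does not say which test the authors applied.
        _, p = kruskal(*groups)
        p_values.append(p)

    p_values = np.array(p_values)
    return p_values, float(p_values.mean())


# Synthetic example: 3 clusters of 50 patients, 4 variables.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 50)
features = rng.normal(size=(150, 4))
features[:, 0] += labels * 3.0  # make variable 0 differ strongly between clusters

p_values, mean_p = mean_p_value(features, labels)
print("per-variable p-values:", p_values)
print("mean p-value:", mean_p)
```

Swapping in a different test (for example a chi-squared test for categorical variables) only changes the line that calls `kruskal`.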

We used sklearn to calculate the CHI and DBI (https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation).
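A minimal sketch of those two sklearn calls, for reference. Both `calinski_harabasz_score` and `davies_bouldin_score` take a fixed-length feature matrix and the cluster labels, and compute each cluster centroid internally as the mean of its rows, so no centers need to be passed in. The sketch assumes each patient's time series has already been reduced to one fixed-length vector (for example a learned embedding); the synthetic `embeddings` array and the plain `KMeans` call are placeholders, not the weighted k-means from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score

# ASSUMPTION: one fixed-length vector per patient. Here they are synthetic
# 2-D points drawn around three well-separated centers.
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
embeddings = np.vstack(
    [rng.normal(loc=c, scale=1.0, size=(50, 2)) for c in true_centers]
)

# Placeholder clustering step (standard k-means, not the paper's variant).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embeddings)

# Both scores only need the feature matrix and the labels.
chi = calinski_harabasz_score(embeddings, labels)  # higher is better
dbi = davies_bouldin_score(embeddings, labels)     # lower is better

print("CHI:", chi)
print("DBI:", dbi)
```

With variable-length raw series, the same calls apply once every patient is mapped to a vector of the same dimension.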