sistm / NPflow

Dirichlet process mixture of skew t-distributions for model based clustering of flow cytometry data
GNU Lesser General Public License v3.0

Selection of the hyperG0 parameter #22

Closed: cartal closed this issue 6 years ago

cartal commented 6 years ago

Hi,

I was wondering if you could expand on how to select the hyperG0 parameter. I am not sure whether I overlooked it while reading the preprint (possibly, since my math background is limited).

I have a FACS panel with 14 dimensions. How would you recommend running NPflow on these data?

Thanks in advance!

borishejblum commented 6 years ago

Hi Carlos,

Thank you for your interest in our work.

The prior parameter hyperG0 specifies the various hyperparameters of the prior. As a rule of thumb, you could consider the following specification for hyperG0 (where z is your data matrix, with the dimensions in rows and the observations in columns):

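d <- nrow(z) # the dimension of the data (z is d x n: dimensions in rows, observations in columns)
hyperG0 <- list()
hyperG0[["b_xi"]] <- rowMeans(z) # centered on the empirical mean of each dimension
hyperG0[["b_psi"]] <- rep(0, d) # centered on no skewness
hyperG0[["kappa"]] <- 0.001
hyperG0[["D_xi"]] <- 100
hyperG0[["D_psi"]] <- 100
hyperG0[["nu"]] <- d + 1
# diagonal matrix holding one third of each dimension's empirical variance
# (nrow = d keeps it d x d even when d = 1)
hyperG0[["lambda"]] <- diag(apply(z, MARGIN = 1, FUN = var), nrow = d)/3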
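A minimal sketch, in R like the snippet above, that wraps this rule-of-thumb recipe in a reusable function and checks the shapes it produces. `make_hyperG0` is a hypothetical helper, not an NPflow function, and the simulated matrix only stands in for a 14-marker panel. It assumes, as the snippet does, that `z` holds dimensions in rows and observations in columns, so data exported with cells in rows would need transposing with `t()` first.

```r
# Hypothetical helper (not part of NPflow): builds the rule-of-thumb hyperG0 list.
# Assumes z is a d x n numeric matrix: dimensions in rows, observations in columns.
make_hyperG0 <- function(z) {
  stopifnot(is.matrix(z), is.numeric(z))
  d <- nrow(z)  # number of dimensions (markers)
  list(
    b_xi   = rowMeans(z),   # empirical mean of each dimension
    b_psi  = rep(0, d),     # centered on no skewness
    kappa  = 0.001,
    D_xi   = 100,
    D_psi  = 100,
    nu     = d + 1,
    # nrow = d avoids diag() building a scalar-sized identity matrix when d = 1
    lambda = diag(apply(z, MARGIN = 1, FUN = var), nrow = d) / 3
  )
}

# Simulated stand-in for a 14-marker panel with 500 cells, stored cells-in-rows
set.seed(42)
cells_by_markers <- matrix(rnorm(500 * 14, mean = 5, sd = 2), nrow = 500, ncol = 14)
z <- t(cells_by_markers)  # transpose to 14 x 500 (dimensions in rows)

hyperG0 <- make_hyperG0(z)
str(hyperG0)
```

The `nrow = d` argument guards against an R quirk: given a single number, `diag()` returns an identity matrix of that size rather than a 1 x 1 matrix, which would silently break a one-dimensional analysis. The resulting list is what would presumably be passed as the `hyperG0` argument of `DPMGibbsSkewT()`.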

I have also updated the documentation, which was a little scarce regarding the hyperG0 parameter, to provide more information.

For your case, I would recommend the skew-t model, which can be estimated with the MCMC algorithm implemented in the DPMGibbsSkewT() function (possibly run in parallel to reduce computation time).

Boris