sa-lee / starmie

starmie: plotting and inference for population structure models :star2:

fixed k-means for looking at best permutation #15

Closed sa-lee closed 8 years ago

sa-lee commented 8 years ago

Possibly better than CLUMPP, which has problems with multimodality.

gtonkinhill commented 8 years ago

Robbie has an idea for using a regression-based approach. I think we should generate some test Q matrices to compare the approaches.

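gtonkinhill commented 8 years ago

library(glmnet)

# Simulate a 200 x k Q matrix with one dominant cluster per individual.
set.seed(10)
k <- 20
runs <- 20
x <- matrix(abs(rnorm(200*k,20,10)),200,k)

for(i in seq(0,190,k)){
  diag(x[ (i+1):(i+k),] ) <- 150
}
x <- x/apply(x,1,sum)

# Make cluster 5 a noisy copy of cluster 1, then renormalise rows to sum to 1.
noise1 <- 0.01
x[,5] <- x[,1]+abs(rnorm(200,0, noise1))
x <- x/apply(x,1,sum)

colnames(x) <- paste("V",1:k,sep="")
mat <- x
set.seed(10)
noise <- 0.01
# Each further run is a column-permuted, noisy copy of x, appended to mat.
for(i in 2:runs){
  a <- x[,sample(1:k,k)]+matrix(rnorm(200*k,0,noise),200,k)
  a <- pmax(a, 0)  # clip negative values introduced by the noise
  a <- a/apply(a,1,sum)
  colnames(a) <- paste(colnames(a),i,sep="_")
  mat <- cbind(mat,a)
}

# Regress cluster 1 of run 1 on the clusters of all other runs.
# The first k columns belong to run 1, so the predictors start at column k+1.
fit <- cv.glmnet(mat[,(k+1):(k*runs)],mat[,1],alpha=0.9)

# Coefficients at the default lambda; positive ones are candidate matches.
coefs <- data.frame(var=rownames(coef(fit)),beta=as.numeric(coef(fit)))
coefs[coefs$beta>0,]

gtonkinhill commented 8 years ago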

Start from the variable that gets regressed best, and work progressively down to the hardest-to-define clusters. If multiple clusters from the same run are assigned, choose the one with the highest parameter (coefficient) and set the weights of all others to zero ... repeat iteratively
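
A rough sketch of the "one cluster per run" step described above, in case it helps. It assumes a coefficient table with columns `var` and `beta`, and column names of the form `V3_7` (cluster `V3` from run 7). The helper name `pick_best_per_run` and the toy table are hypothetical, and this covers only a single pass, not the full iterative procedure.

```r
# Hypothetical helper: for each run, keep only the cluster with the largest
# positive coefficient; every other cluster from that run is dropped
# (i.e. treated as having weight zero).
pick_best_per_run <- function(coefs) {
  # Keep rows named like "V3_7" with a positive coefficient.
  # This also drops the intercept and any columns without a run suffix.
  keep <- grepl("_[0-9]+$", coefs$var) & coefs$beta > 0
  coefs <- coefs[keep, , drop = FALSE]
  # The run number is the part after the last underscore.
  coefs$run <- as.integer(sub("^.*_", "", coefs$var))
  # Sort by descending coefficient, then keep the first row seen per run.
  coefs <- coefs[order(-coefs$beta), , drop = FALSE]
  best <- coefs[!duplicated(coefs$run), , drop = FALSE]
  best[order(best$run), , drop = FALSE]
}

# Toy coefficient table (made-up values, for illustration only).
toy <- data.frame(
  var  = c("(Intercept)", "V4_2", "V9_2", "V1_3", "V7_3", "V2_4"),
  beta = c(0.01, 0.80, 0.10, 0.05, 0.60, 0),
  stringsAsFactors = FALSE
)
picked <- pick_best_per_run(toy)
print(picked)
```

Runs where no cluster gets a positive coefficient simply do not appear in the result, which is where the "repeat iteratively" part would have to take over.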

sa-lee commented 8 years ago

Not sure I really get this. What do you mean by "regressed best"?

sa-lee commented 8 years ago

@gtonkinhill closing this for now since we've decided to go with the correlation matrix method
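
For anyone reading this later, a minimal sketch of what matching clusters through a correlation matrix could look like. This is only an illustration of the general idea under my own assumptions (greedy matching on column correlations, a hypothetical helper `match_by_correlation`, simulated data), not necessarily the code that ended up in starmie.

```r
# Hypothetical helper: greedily match the columns of q_run to the columns of
# q_ref using the matrix of between-column correlations.
match_by_correlation <- function(q_ref, q_run) {
  cors <- cor(q_ref, q_run)   # rows: reference clusters, cols: run clusters
  k <- ncol(q_ref)
  perm <- integer(k)
  for (step in seq_len(k)) {
    # Take the largest remaining correlation, record the pair, then retire
    # that row and column so each cluster is used exactly once.
    idx <- which(cors == max(cors, na.rm = TRUE), arr.ind = TRUE)[1, ]
    r <- idx[["row"]]
    cl <- idx[["col"]]
    perm[r] <- cl
    cors[r, ] <- NA
    cors[, cl] <- NA
  }
  perm   # q_run[, perm] is aligned column-by-column with q_ref
}

# Simulated example: a reference Q matrix and a permuted, noisy copy of it.
set.seed(1)
n <- 50
k <- 3
q_ref <- matrix(runif(n * k), n, k)
q_ref <- q_ref / rowSums(q_ref)
true_perm <- c(3, 1, 2)
q_run <- pmax(q_ref[, true_perm] + matrix(rnorm(n * k, 0, 0.01), n, k), 0)
q_run <- q_run / rowSums(q_run)

perm <- match_by_correlation(q_ref, q_run)
aligned <- q_run[, perm]
```

Greedy matching is the simplest choice here; an optimal assignment over the same correlation matrix would be a natural alternative if greedy picks turn out to conflict.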