Open wbaopaul opened 5 years ago
Hi, there is no readily to be used R function to do clustering but I have excerpted the code in GUI to do the clustering. See below. Ideally you need to first run dimension reduction (PCA,tsne,umap...) and then run the clustering. There are many different clustering methods. 'reducedata' will be your input and 'res' is the output. Each row is a cell and each column is a reduced dimension.
reducedata <- matrix(rnorm(500),nrow=100)
x <- 2:min(20,nrow(reducedata))
y <- sapply(x, function(k) {
set.seed(12345)
clu <- kmeans(reducedata,k,iter.max = 100)$cluster
cluSS <- sum(sapply(unique(clu),function(i) {
sum(rowSums(sweep(reducedata[clu==i,,drop=F],2,colMeans(reducedata[clu==i,,drop=F]),"-")^2))
}))
1-cluSS/sum((sweep(reducedata,2,colMeans(reducedata),"-"))^2)
})
y <- c(0,y)
x <- c(1,x)
clunum <- which.min(sapply(x, function(i) {
x2 <- pmax(0,x-i)
sum(lm(y~x+x2)$residuals^2)
}))
set.seed(12345) res <- kmeans(reducedata,clunum,iter.max = 100)$cluster
datahclust <- hclust(dist(reducedata))
x <- 2:min(20,nrow(reducedata))
y <- sapply(x, function(k) {
clu <- cutree(datahclust,k)
cluSS <- sum(sapply(unique(clu),function(i) {
sum(rowSums(sweep(reducedata[clu==i,,drop=F],2,colMeans(reducedata[clu==i,,drop=F]),"-")^2))
}))
1-cluSS/sum((sweep(reducedata,2,colMeans(reducedata),"-"))^2)
})
y <- c(0,y)
x <- c(1,x)
clunum <- which.min(sapply(x, function(i) {
x2 <- pmax(0,x-i)
sum(lm(y~x+x2)$residuals^2)
}))
res <- cutree(datahclust,clunum)
library(mclust) x <- 2:min(20,nrow(reducedata)) clunum <- Mclust(reducedata,G=x,modelNames="VVV",priorControl(functionName="defaultPrior", shrinkage=0.1))$G res <- Mclust(reducedata,G=clunum,modelNames="VVV",priorControl(functionName="defaultPrior", shrinkage=0.1)) res <- apply(res$z,1,which.max)
library(dbscan) minpts <- 20 alldist <- sort(kNNdist(reducedata,minpts)) x <- 1:length(alldist) eps <- alldist[which.min(sapply(x, function(i) { x2 <- pmax(0,x-i) sum(lm(alldist~x+x2)$residuals^2) }))] res <- dbscan(reducedata,eps=eps,minPts=minpts)$cluster
Got you. If I have a cellpeak matrix and I did PCA/TSNE/UMAP on it, then I use one of those clustering methods, is this the typical way for SCRAT doing clustering? or Do I need to do something else on the cellpeak matrix before I did PCA/TSNE/UMAP? Thanks.
Yes. SCRAT by default also performs standardization for each cell across all peaks/features to have mean 0 and sd 1. The standardized data is sent to PCA/tsne/umap (Note that for PCA, standardization for each feature across all cells is still needed, so two steps of standardization) and finally do clustering.
It's very clear now, thanks.
I wanted to try the clustering method used in your method. Is there any function in the package supporting clustering the cell*peak matrix for scATAC? (not the GUI version.) Thanks