re-rosales / VarID2

MIT License
1 stars 0 forks source link

error in Quadratic programming to match similar cell types #1

Open ytLiz opened 7 months ago

ytLiz commented 7 months ago

Hi,

I am having some issues when running the VarID2_Figure4.R. I tried to reproduce the pipeline by using the same data. Because some data points were severely scattered, I deleted them. After doing the data filtering I got these result graphs.

the graphs of wild-type samples are the following: 58_plotTrProbs 65_fractDotPlot

the graphs of mutant samples are the following: 104_plotTrProbs 109_fractDotPlot

I tried to choose the cluster 1 of wild-type samples and the cluster 1 of mutant samples as LT-HSCs. But when I computed the similarity weights of clusters and made the heat map, I found that there is almost no similarity between the clusters I have chosen.

the heat map is the following: 175_pheatmap

and this is the code I used:

QP <- function(k,m,norm=TRUE){
  Dmat <- t(m) %*% m
  dvec <- t(k) %*% m
  if ( norm ){
    Amat <- cbind(rep(1,ncol(m)), diag(ncol(m)))
    bvec <- c(1,rep(0,ncol(m)))
    qp <- solve.QP(Dmat = Dmat, dvec = dvec, Amat = Amat, bvec = bvec, meq = 1)}
  else{
    Amat <- diag(ncol(m))
    bvec <- rep(0,ncol(m))
    qp <- solve.QP(Dmat = Dmat, dvec = dvec, Amat = Amat, bvec = bvec, meq = 0, factorized=FALSE)}
  return( list(w=qp$solution, fit=m %*% qp$solution, residual= sum((m %*% qp$solution - k)**2), qp=qp))
}

k <- intersect( sw@genes, sk@genes )

for ( i in 1:max(sk@cpart) ){
  if (i==1) mk <- rowMeans(as.matrix(sk@ndata[k,sk@cpart == i])) else mk <- cbind(mk, rowMeans(as.matrix(sk@ndata[k,sk@cpart == i]))) 
}
colnames(mk) <- paste("K",1:ncol(mk),sep=".")

for ( i in 1:max(sw@cpart) ){
  if (i==1) mw <- rowMeans(as.matrix(sw@ndata[k,sw@cpart == i])) else mw <- cbind(mw, rowMeans(as.matrix(sw@ndata[k,sw@cpart == i]))) 
}
colnames(mw) <- paste("W",1:ncol(mw),sep=".")

f <- apply(mk,1,var) > 0 & apply(mw,1,var) > 0 
m <- cor(as.matrix(mw[f,]),as.matrix(mk[f,]),use="pairwise.complete.obs")
m <- cor(as.matrix(mw[f,]),as.matrix(mk[f,]),use="pairwise.complete.obs",method="spearman")

w <- matrix(rep(NA,ncol(mw)*ncol(mk)),ncol=ncol(mw) )
for ( i in 1:ncol(mk) ){
  w[i,] <- QP(mk[f,i],mw[f,])$qp$solution   
}

colnames(w) <- colnames(mw)
rownames(w) <- colnames(mk)
#w contains the similarity weights of clusters in the W41/W41 dataset (in rows) to WT clusters (in columns) inferred by quadratic programming
#Plot for better visualization
pheatmap(w,cluster_cols=FALSE,cluster_rows=FALSE)

I wonder that if I should modify the code and how to do so, or if I may have other errors. I would appreciate if you can provide me some help.

Best, Liz

re-rosales commented 6 months ago

Hi,

I don't see any problem with the code you provided for running the quadratic programming function. The for loop:

for ( i in 1:ncol(mk) ){
  w[i,] <- QP(mk[f,i],mw[f,])$qp$solution   
}

is comparing each cluster in the mutant data against all clusters in the wild type, so there is no preselection here.

I do see subtile differences in some marker genes in your dotplots compared to the original publication, so I suspect that the clustering can have an impact on the quadratic programming function. I noticed a mistake on the clustering steps for the mutant dataset on my scripts. In line 100, the leiden resolution should be 2 instead of 1.5. I already updated this. Other reasons for changing the clustering may be to have removed few cells in your analyses and /or using different package versions. If useful, I used the following package versions:

RaceID: 0.2.6
quadprog: 1.5-8
leiden: 0.3.9

I also made available the SCseq objects I used on this link So you can use them to compare. Here is a code to upload them into the R environment:

#Wild type data:
sw <- readRDS("DahlinSCseq_WT.rds")
listw <- readRDS("DahlinSCseq_list_WT.rds")
resw <- listw$res
clw <- listw$cl
probsw <- listw$probs

#Kit mutant data
sk <- readRDS("DahlinSCseq_Mut.rds")
listk <- readRDS("DahlinSCseq_list_Mut.rds")
resk <- listk$res
clk <- listk$cl
probsk <- listk$probs

Let me know in case of further questions or problems