Open YihengJiang0912 opened 4 months ago
I'll provide a bit of my original code to give you some ideas. The function was used for removing the batch effect. It is capable of pca downscaling and mapping while removing batch effects.
batchPCA =function(indata, batch, fig.dir, PCA.fig.title, pos="bottomright", xy=c(1,2), cols=NULL, showID=FALSE, cex=1, showLegend=T) {
library(ClassDiscovery)
outfile = file.path(fig.dir, paste(PCA.fig.title, ".pdf",sep=""))
N.batch = length(unique(batch))
if (is.null(cols)) {
cols <- rainbow(N.batch)
}else{
if (length(cols) != N.batch) {stop("cols length not equal to batch length")}
}
indata=na.omit(indata)
pca<-SamplePCA(indata, usecor=F, center=T)
pct1 <- round (pca@variances[xy[1]]/sum(pca@variances), digits=3)*100
pct2 <- round (pca@variances[xy[2]]/sum(pca@variances), digits=3)*100
xlab.text = paste("Comp ", xy[1], ": ", as.character(pct1), "% variance", sep="")
ylab.text = paste("Comp ", xy[2], ": ", as.character(pct2), "% variance", sep="")
pdf(file=outfile)
plot(pca@scores[,xy[1]], pca@scores[,xy[2]], cex=0.7, xlab=xlab.text, ylab=ylab.text, col=cols[factor(batch)], pch=(1:N.batch)[factor(batch)],lwd=1.5, main=PCA.fig.title)
abline(h=0, v=0, col="brown", lty=2)
abline(h=0, v=0, col="brown", lty=2)
center1<-tapply(pca@scores[,xy[1]], factor(batch), mean)
center2<-tapply(pca@scores[,xy[2]], factor(batch), mean)
for (ii in 1:length(center1)) {
groupi<-pca@scores[as.numeric(factor(batch))==ii, xy]
# print(paste("Cluster", ii))
if (class(groupi)=="matrix") {
for (j in (1:nrow(groupi))) {
segments( groupi[j,1], groupi[j,2], center1[ii], center2[ii], col=cols[ii] , lwd=0.3)
}
}else {
segments( groupi[1], groupi[2], center1[ii], center2[ii], col=cols[ii] , lwd=0.3)
}
}
points(center1, center2, pch=7, lwd=1.5,col=cols)
if (showID) {
text(pca@scores[,xy[1]], pca@scores[,xy[2]], colnames(indata), lwd=1, cex=cex)
}
if(showLegend){
legend(pos,legend=names(table(factor(batch))), text.col=cols, pch=(1:N.batch), col=cols, lty=1)
}
invisible(dev.off())
}
for example,
batchPCA(indata = t(scale(t(combined.expr))),
batch =tumAnno$cancer type
,
fig.dir = ".",
PCA.fig.title = "Raw PCA for combined expression profile pancancer",
cols = c(rep(mycol,3)),
showID = F,
cex = 0.7,
showLegend = T)
Dear authors, I am now reading your paper "Pan-cancer evaluation of regulated cell death to predict overall survival and immune checkpoint inhibitor response", it is an amazing and impressive work and I benefited a lot from it!
However, I am confused about the methods you used to remove batch effect among the TCGA pacn-cancer samples in Figure4A, you mentioned that you used R package “sva” to do that in meta-ICI cohort. Did you used the same method in TCGA samples? I will be very grateful if you could give me a little guidance in that.