r3fang / SnapATAC

Analysis Pipeline for Single Cell ATAC-seq
GNU General Public License v3.0
296 stars 124 forks source link

Quick question on Step12 to step13 #169

Open JihedC opened 4 years ago

JihedC commented 4 years ago

Hi! Thanks a lot for this pipeline, it's well documented and so far I have been using it without any issue. I have a small question on step 12. At this moment you create a table of the peaks combined and then out of R you do :

snaptools snap-add-pmat \
    --snap-file atac_v1_adult_brain_fresh_5k.snap \
    --peak-file peaks.combined.bed

I understand here that you are adding a cell-by-peak matrix to the snap object.

At Step 13 you return in R and load the previously saved RDS object.

I was wondering why you are doing this? Is the snaptools snap-add-pmat function affecting the RDS file? If yes I don't understand how.

I tried to continue the pipeline without waiting for the step 12 to be finished but it doesn't work, so it seems important to go through it but I don't understand what it's doing.

Could you explain me this please? I am trying to prepare a script to run the pipeline automatically.

Thanks in advance!

JihedC commented 4 years ago

Few days later, it's still not working :(

Here is what I get at step 14, when I want to call the DAR:

> x.sp = readRDS("ATAC12_step12.snap.rds")
> x.sp = addPmatToSnap(x.sp)
Epoch: reading cell-peak count matrix session ...
> x.sp = makeBinary(x.sp, mat="pmat")
> x.sp
number of barcodes: 3667
number of bins: 546206
number of genes: 9
number of peaks: 137257
number of motifs: 0
> DARs = findDAR(
+     obj=x.sp,
+     input.mat="pmat",
+     cluster.pos=1,
+     cluster.neg.method="knn",
+     test.method="exactTest",
+     bcv=0.1,
+     seed.use=10
+   );
Epoch: checking inputs ...
Epoch: identifying DARs for positive cluster ...
> DARs$FDR = p.adjust(DARs$PValue, method="BH")
> idy = which(DARs$FDR < 5e-2 & DARs$logFC > 0)
> pdf("ATAC2_Plot_cluster1_DAR.pdf")
> plot(DARs$logCPM, DARs$logFC,
+     pch=19, cex=0.1, col="grey",
+     ylab="logFC", xlab="logCPM",
+     main="Cluster 1"
+   )
> points(DARs$logCPM[idy],
+     DARs$logFC[idy],
+     pch=19,
+     cex=0.5,
+     col="red"
+   )
> abline(h = 0, lwd=1, lty=2)
> covs = Matrix::rowSums(x.sp@pmat)
> vals = Matrix::rowSums(x.sp@pmat[,idy]) / covs
> vals.zscore = (vals - mean(vals)) / sd(vals)
> plotFeatureSingle(
+     obj=x.sp,
+     feature.value=vals.zscore,
+     method="tsne",
+     main="Cluster 26",
+     point.size=0.1,
+     point.shape=19,
+     down.sample=5000,
+     quantiles=c(0.01, 0.99)
+   )
Error in quantile.default(feature.value, quantiles[1]) :
  missing values and NaN's not allowed if 'na.rm' is FALSE

Do you know what the issue is ?