Closed Austin-s-h closed 5 years ago
Hi there,
barcode x.sp@barcode
peak position x.sp@peak
SnapATAC does not provide readsInPeak
, can you tell me a bit more about this column?
I am happy to get SnapATAC compatible with other toolset if it useful, for instance, chromVAR has been included. Will take sometime to test Cicero.
Best, -R
Following https://cole-trapnell-lab.github.io/cicero-release/docs/#loading-your-data I need a way to retain the information so that there are multiple instances of the peak (from the consensus x.sp@peak) labeled with which cell and the reads (from BAMs or the snap object?) that that individual cell had in that peak.
So... chr1_peak1 | cell1 | 1 chr1_peak1 | cell 2 | 0 chr1_peak2 | cell1 | 6 ...etc
So while x.sp@barcode provides the list of barcodes in my object, it is formatted in such a way that it is associated with all of the peaks from that cell. Does this make sense?
Yes, you are right. It is the peak i and cell j and number of reads in cell j that overlap with peak i.
It will take some work to re-formulate a snap object as the input for Cicero. Currently it is not what I am planning to do in the short term. Sorry.
Well, I believe I have found a solution. Feel free to implement this within your package (I don't know how github works enough to do it myself).
#Turn peak Grange into a dataframe
df <- data.frame(seqnames=seqnames(peaks.gr),
starts=start(peaks.gr)-1,
ends=end(peaks.gr),
names=c(rep(".", length(peaks.gr))),
scores=c(rep(".", length(peaks.gr))),
strands=strand(peaks.gr))
#Turn sparse matrix into matrix, then transpose and turn to dataframe
peak.matrix <- as(object = x.sp@pmat, Class = 'matrix')
peak.dataframe <- as.data.frame(t(peak.matrix))
#assign column names corresponding to the cell barcodes.
colnames(peak.dataframe) <- x.sp@barcode
#peek at it.
peak.dataframe[0:5,0:5]
#Assign rownames (they'll be lost), and a new column as a factor of the peaks in Cicero format.
rownames(peak.dataframe) <- paste0(df$seqnames,"_",df$starts,"_",df$ends)
peak.dataframe$peak <- factor(rownames(peak.dataframe))
#check to make sure that it is a factor.
peak.dataframe[0:5,ncol(peak.dataframe)]
#Now, convert this to long format suitable for cicero input
library(tidyr)
peak.long <- gather(peak.dataframe, key = cell, reads, colnames(peak.dataframe)[1]:colnames(peak.dataframe)[ncol(peak.dataframe)-1], factor_key = TRUE)
head(peak.long)
tail(peak.long)
I have to run some sanity checks for my assumptions that everything is kept in the correct order, but I think that this should work :)
Many thanks for sharing! I will check it out and integrate into the pipeline when I get some time. Again. thanks for sharing
Hi,
I am interested in this too. @r3fang, did you already included in SnapATAC?
Thanks!
Hello, I'm sorry to bother you again. I'm having some trouble exporting the GRanges object of my peak matrix (pmat) from my snap object to a format usable in Cicero.
From their documentation, they want a 3-column file in the following format.
chr1_start_end | cellBarcode | readsInPeak
I know that this data is available in the object, but I'm not sure how to compile it all correctly while maintaining the cell barcodes. If you could help me in retrieving this information that would be very helpful. Additionally, I think this would be good to include in some of your examples as this may help in integrating other tools with yours.
Thank you!