Open dioncst opened 4 years ago
Hello,
You can select cells based on "coverage" and "promoter ratio" - percentage of reads mapped to promoters per cell
> library(GenomicRanges);
> system("wget http://renlab.sdsc.edu/r3fang/share/Fang_2019/MOs_snATAC/genes/promoter.bed");
> promoter.df = read.table("promoter.bed");
> promoter.gr = GRanges(promoter.df[,1], IRanges(promoter.df[,2], promoter.df[,3]));
> ov = findOverlaps(x.sp@feature, promoter.gr);
> idy = queryHits(ov);
> cov = log10(SnapATAC::rowSums(x.sp, mat="bmat")+1)
> promoter_ratio = SnapATAC::rowSums(x.sp[,idy, mat="bmat"], mat="bmat") / SnapATAC::rowSums(x.sp, mat="bmat");
> plot(cov, promoter_ratio, cex=0.5, col="grey", xlab="log(count)", ylab="FIP Ratio", ylim=c(0,1 ));
# chose the cutoff based on the plot
> idx = which(promoter_ratio > 0.2 & promoter_ratio < 0.8 & cov > 3);
> x.sp = x.sp[idx,];
"promoter.bed" is a list of promoters in mouse genome (mm10), you might need to find another list for mm9
"promoter.bed" is a list of promoters in mouse genome (mm10), you might need to find another list for mm9
Sorry, I'm not abale to download this "promoter.bed" file through the link. How is it generated?
"promoter.bed" is a list of promoters in mouse genome (mm10), you might need to find another list for mm9
Could you please send me this promoter.bed file if possible,my email is dioncst@sjtu.edu.cn. I thik I can use UCSC liftover to convert it to mm9. Or I can use CrossMap to convert my .bam to mm10.
Hi,
I have create a folder that contains reference files required for the analysis: http://renlab.sdsc.edu/r3fang/share/github/reference/
Inside the mm9 folder, you will find the gencode.vM1.annotation.gtf file which is GENCODE annotation for mm9 genome. Please download this file.
To select the useful barcode, I am using the 10X Mouse Brain dataset as an example here. The snap file used in this analysis can be found here
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("rtracklayer")
library(rtracklayer)
library(GenomicRanges)
library(SnapATAC)
gtf.gr <- rtracklayer::import("gencode.vM19.annotation.gtf")
gene.gr <- gtf.gr[gtf.gr$type == "gene"]
# extract promoter region for each gene
promoter.gr <- reduce(promoters(gene.gr, upstream=2000, downstream=0))
# create the snap object
x.sp = createSnap("Mouse_Brain_10X/atac_v1_adult_brain_fresh_5k.snap", sample="MB")
# load the cell-by-bin matrix
x.sp = addBmatToSnap(x.sp)
# load the cell-by-bin matrix
ov = findOverlaps(x.sp@feature, promoter.gr);
# find promoter overlapping bins
idy = queryHits(ov);
log_cov = log10(SnapATAC::rowSums(x.sp, mat="bmat")+1)
promoter_ratio = Matrix::rowSums(x.sp@bmat[,idy]) / Matrix::rowSums(x.sp@bmat);
plot(log_cov, promoter_ratio, cex=0.5, col="grey", xlab="log(count)", ylab="FIP Ratio", ylim=c(0,1 ));
# chose the cutoff based on the plot
idx = which(promoter_ratio > 0.2 & promoter_ratio < 0.8 & log_cov > 3);
x.sp = x.sp[idx,]
x.sp
number of barcodes: 4108
number of bins: 546206
number of genes: 0
number of peaks: 0
number of motifs: 0
gene.gr <- gtf.gr[gtf$type == "gene"]
Thank a lot~ But seems this line do not work with the GTF file I downloaded at ENCODE. Error in gtf.gr[gtf$type == "gene"] : object 'gtf' not found
So, I tried this .
gene.gr <- gtf.gr[gtf.gr@elementMetadata@listData[["type"]] == "gene"]
sorry, it is a typo. should be gene.gr <- gtf.gr[gtf.gr$type == "gene"]
Hi, thanks for a great tool. I am in the learning process, so it might be naive question.
For barcode selection I did what you did in the answer I quoted here. I'm now wondering if the log(count) plotted on the x-axis in the end only contains unique fragments or also duplicate fragments? I have read in another issue that duplicates are removed (by default?) in the step where you use the command "snaptools snap-pre" , so I guess duplicates should not be in the data set anymore at this barcode selection step?
Best, Trine
Hi, I have create a folder that contains reference files required for the analysis: http://renlab.sdsc.edu/r3fang/share/github/reference/ Inside the mm9 folder, you will find the gencode.vM1.annotation.gtf file which is GENCODE annotation for mm9 genome. Please download this file. To select the useful barcode, I am using the 10X Mouse Brain dataset as an example here. The snap file used in this analysis can be found here if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("rtracklayer")
library(rtracklayer) library(GenomicRanges) library(SnapATAC)
gtf.gr <- rtracklayer::import("gencode.vM19.annotation.gtf") gene.gr <- gtf.gr[gtf.gr$type == "gene"]
extract promoter region for each gene
promoter.gr <- reduce(promoters(gene.gr, upstream=2000, downstream=0))
create the snap object
x.sp = createSnap("Mouse_Brain_10X/atac_v1_adult_brain_fresh_5k.snap", sample="MB")
load the cell-by-bin matrix
x.sp = addBmatToSnap(x.sp)
load the cell-by-bin matrix
ov = findOverlaps(x.sp@feature, promoter.gr);
find promoter overlapping bins
idy = queryHits(ov); log_cov = log10(SnapATAC::rowSums(x.sp, mat="bmat")+1) promoter_ratio = Matrix::rowSums(x.sp@bmat[,idy]) / Matrix::rowSums(x.sp@bmat); plot(log_cov, promoter_ratio, cex=0.5, col="grey", xlab="log(count)", ylab="FIP Ratio", ylim=c(0,1 ));
chose the cutoff based on the plot
idx = which(promoter_ratio > 0.2 & promoter_ratio < 0.8 & log_cov > 3); x.sp = x.sp[idx,] x.sp number of barcodes: 4108 number of bins: 546206 number of genes: 0 number of peaks: 0 number of motifs: 0
idx = which(promoter_ratio > 0.2 & promoter_ratio < 0.8 & log_cov > 3); x.sp = x.sp[idx,] x.sp
This seems to mess up the corresponding downstream metadata. e.g. I get errors on the scaleCountMatrix that the metaData numbers don't correspond.
If this is the standard qc
barcodes.sel = barcodes[which(UMI >= 3 & UMI <= 5 & promoter_ratio >= 0.15 & promoter_ratio <= 0.6),]; rownames(barcodes.sel) = barcodes.sel$barcode; x.sp = x.sp[which(x.sp@barcode %in% barcodes.sel$barcode),]; x.sp@metaData = barcodes.sel[x.sp@barcode,]; x.sp
Should there be a line akin to this?
x.sp@metaData = barcodes.sel[x.sp@barcode,];
I did the barcode selection the way you provided, but here's what I got. Do you have any suggestions? Thanks
Hi~ I got my data(.bam file) at http://atlas.gs.washington.edu/mouse-atac/data/. And I generated snap file using the bam file according to your instruction. But I don't have a CSV file that conatin the information as your example (atac_v1_adult_brain_fresh_5k_singlecell.csv, http://renlab.sdsc.edu/r3fang/share/github/Mouse_Brain_10X/atac_v1_adult_brain_fresh_5k_singlecell.csv) I'm wondering how can I perform barcode selection under this circumstance?
Best, Shitao