jianhong / ATACseqQC

ATAC-seq Quality Control
https://jianhong.github.io/ATACseqQC/articles/ATACseqQC.html
23 stars 12 forks source link

Output bam files become smaller after shiftGAlignmentsList #51

Open Rui-Jing opened 2 years ago

Rui-Jing commented 2 years ago

Hi, Thanks for developing this user-friendly tool for ATAC-seq. I found some problems after running shiftGAlignmentsList, and the output BAM file is about half the size of the input file. I don't know if this is a normal situation or if ATACseqQC does some filtering.

My code :

library(ATACseqQC)
bamfile <- '/home/zfzf/atac_ljh/alignment/neg2_701504_sorted.bam'
bamfile.labels <- gsub(".bam", "", basename(bamfile))
#bamQC(bamfile, outPath=NULL)
estimateLibComplexity(readsDupFreq(bamfile))
## generate fragement size distribution
fragSize <- fragSizeDist(bamfile, bamfile.labels)

## bamfile tags to be read in
possibleTag <- list("integer"=c("AM", "AS", "CM", "CP", "FI", "H0", "H1", "H2", 
                                "HI", "IH", "MQ", "NH", "NM", "OP", "PQ", "SM",
                                "TC", "UQ"), 
                    "character"=c("BC", "BQ", "BZ", "CB", "CC", "CO", "CQ", "CR",
                                  "CS", "CT", "CY", "E2", "FS", "LB", "MC", "MD",
                                  "MI", "OA", "OC", "OQ", "OX", "PG", "PT", "PU",
                                  "Q2", "QT", "QX", "R2", "RG", "RX", "SA", "TS",
                                  "U2"))

library(Rsamtools)

bamTop100 <- scanBam(BamFile(bamfile, yieldSize = 100),
                     param = ScanBamParam(tag=unlist(possibleTag)))[[1]]$tag
tags <- names(bamTop100)[lengths(bamTop100)>0]
tags

## files will be output into outPath
outPath <- "align_adjusted"
dir.create(outPath)

## shift the coordinates of 5'ends of alignments in the bam file
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
gal <- readBamFile(bamfile, tag=tags, asMates=TRUE, bigFile=TRUE)
shiftedBamfile <- file.path(outPath, "shifted.bam")
gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile)
jianhong commented 2 years ago

Hi Rui-Jing,

Thank you for trying ATACseqQC to analyze your ATAC-seq data. Could you let me know the size of file by running the following code?

samtools view -b -f 2 -F 2816 neg2_701504_sorted.bam > clean.reads.bam

Thank you.

Jianhong.

From: Rui-Jing @.> Date: Wednesday, June 8, 2022 at 10:42 PM To: jianhong/ATACseqQC @.> Cc: Subscribed @.***> Subject: [jianhong/ATACseqQC] Output bam files become smaller after shiftGAlignmentsList (Issue #51)

Hi, Thanks for developing this user-friendly tool for ATAC-seq. I found some problems after running shiftGAlignmentsList, and the output BAM file is about half the size of the input file. I don't know if this is a normal situation or if ATACseqQC does some filtering.

My code :

library(ATACseqQC)

bamfile <- '/home/zfzf/atac_ljh/alignment/neg2_701504_sorted.bam'

bamfile.labels <- gsub(".bam", "", basename(bamfile))

bamQC(bamfile, outPath=NULL)

estimateLibComplexity(readsDupFreq(bamfile))

generate fragement size distribution

fragSize <- fragSizeDist(bamfile, bamfile.labels)

bamfile tags to be read in

possibleTag <- list("integer"=c("AM", "AS", "CM", "CP", "FI", "H0", "H1", "H2",

                            "HI", "IH", "MQ", "NH", "NM", "OP", "PQ", "SM",

                            "TC", "UQ"),

                "character"=c("BC", "BQ", "BZ", "CB", "CC", "CO", "CQ", "CR",

                              "CS", "CT", "CY", "E2", "FS", "LB", "MC", "MD",

                              "MI", "OA", "OC", "OQ", "OX", "PG", "PT", "PU",

                              "Q2", "QT", "QX", "R2", "RG", "RX", "SA", "TS",

                              "U2"))

library(Rsamtools)

bamTop100 <- scanBam(BamFile(bamfile, yieldSize = 100),

                 param = ScanBamParam(tag=unlist(possibleTag)))[[1]]$tag

tags <- names(bamTop100)[lengths(bamTop100)>0]

tags

files will be output into outPath

outPath <- "align_adjusted"

dir.create(outPath)

shift the coordinates of 5'ends of alignments in the bam file

library(TxDb.Mmusculus.UCSC.mm10.knownGene)

gal <- readBamFile(bamfile, tag=tags, asMates=TRUE, bigFile=TRUE)

shiftedBamfile <- file.path(outPath, "shifted.bam")

gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile)

— Reply to this email directly, view it on GitHubhttps://github.com/jianhong/ATACseqQC/issues/51, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABLBEAZ3LMXRPFKLU7VKBDDVOFKYTANCNFSM5YIPMRXA. You are receiving this because you are subscribed to this thread.Message ID: @.***>

Rui-Jing commented 2 years ago

Thanks for your quikly reply. The input bam size is 4810847KB, and the output bam size is 2120454KB. So, I wonder if ATACseqQC has done some filtering?

jianhong commented 2 years ago

You may want to ask help by ?readBamFile There is flag filters when you read bam file by ATACseqQC. You can change the flags to meet your design.

Jianhong.

From: Rui-Jing @.> Date: Thursday, June 9, 2022 at 9:26 AM To: jianhong/ATACseqQC @.> Cc: JIANHONG OU @.>, Comment @.> Subject: Re: [jianhong/ATACseqQC] Output bam files become smaller after shiftGAlignmentsList (Issue #51)

Thanks for your quikly reply. The input bam size is 4810847KB, and the output bam size is 2120454KB. So, I wonder if ATACseqQC has done some filtering?

— Reply to this email directly, view it on GitHubhttps://github.com/jianhong/ATACseqQC/issues/51#issuecomment-1151118336, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABLBEA2CLQ2EUJVEPYNOZT3VOHWJFANCNFSM5YIPMRXA. You are receiving this because you commented.Message ID: @.***>

Rui-Jing commented 2 years ago

Great, thanks for the feedback !