jianhong / ATACseqQC

ATAC-seq Quality Control
https://jianhong.github.io/ATACseqQC/articles/ATACseqQC.html
23 stars 12 forks source link

splitGAlignmentsByCut 'mergeBam' error #24

Closed GBeattie closed 4 years ago

GBeattie commented 4 years ago

Hey!

Enjoying the package so far, just having difficulties with this error, details below.

The error arises when I try to split using any more than one chromosome as input to splitGAlignmentsByCut;

files will be output into outPath

outPath <- "splited"
dir.create(outPath)
## shift the coordinates of 5'ends of alignments in the bam file
library(BSgenome.Mmusculus.UCSC.mm10)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
which <- as(seqinfo(Mmusculus), "GRanges")[1:2]
gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE)
gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE, flag = scanBamFlag(isSecondaryAlignment = FALSE, isUnmappedQuery = FALSE, isNotPassingQualityControls = FALSE, isSupplementaryAlignment = FALSE))
shiftedBamfile <- file.path(outPath, "shifted.bam")
gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile)
txs <- transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene)
genome <- Mmusculus
objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=genome, outPath = outPath)

Error:

Error in value[[3L]](cond) : 
  'mergeBam' 'destination' exists, 'overwrite' is FALSE
  destination: splited/NucleosomeFree.bam

There are still split BAMs in the output directory, however they only contain the 2nd chromosome in this case.

If I include more chromosomes (1-X,Y), it seems to output only the last chromosome in the list (along with same error).

Just the first chromosome (i.e. which <- as(seqinfo(Mmusculus), "GRanges")[1] results in no error, but hoping for all chromosomes in output.

I could try manually setting the mergeBam to overwrite = TRUE, but not sure this is the correct approach.

jianhong commented 4 years ago

Did you tried to change the outPath to different folder?

Jianhong.

On Tue, Apr 7, 2020 at 11:36 AM Gordon Beattie notifications@github.com wrote:

Hey!

Enjoying the package so far, just having difficulties with this error, details below.

The error arises when I try to split using any more than one chromosome as input to splitGAlignmentsByCut; files will be output into outPath

outPath <- "splited" dir.create(outPath)

shift the coordinates of 5'ends of alignments in the bam file

library(BSgenome.Mmusculus.UCSC.mm10) library(TxDb.Mmusculus.UCSC.mm10.knownGene) which <- as(seqinfo(Mmusculus), "GRanges")[1:2] gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE) gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE, flag = scanBamFlag(isSecondaryAlignment = FALSE, isUnmappedQuery = FALSE, isNotPassingQualityControls = FALSE, isSupplementaryAlignment = FALSE)) shiftedBamfile <- file.path(outPath, "shifted.bam") gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile) txs <- transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene) genome <- Mmusculus objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=genome, outPath = outPath)

Error:

Error in value[3L] : 'mergeBam' 'destination' exists, 'overwrite' is FALSE destination: splited/NucleosomeFree.bam

There are still split BAMs in the output directory, however they only contain the 2nd chromosome in this case.

If I include more chromosomes (1-X,Y), it seems to output only the last chromosome in the list (along with same error).

Just the first chromosome (i.e. which <- as(seqinfo(Mmusculus), "GRanges")[1] results in no error, but hoping for all chromosomes in output.

I could try manually setting the mergeBam to overwrite = TRUE, but not sure this is the correct approach.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/jianhong/ATACseqQC/issues/24, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLBEA5MMHKHT5XLOSEUHJTRLNCBHANCNFSM4MDG2RLA .

-- Yours sincerely, Jianhong Ou

GBeattie commented 4 years ago

Thanks for the response, I tried that, same result, maybe I'm missing something? The splitGAlignmentsByCut still produces outputs in the new directory, but again only the one chromosome.

> list.dirs()
[1] "."
> outPath <- "splited"
> dir.create(outPath)
> ## shift the coordinates of 5'ends of alignments in the bam file
> library(BSgenome.Mmusculus.UCSC.mm10)
> library(TxDb.Mmusculus.UCSC.mm10.knownGene)
> which <- as(seqinfo(Mmusculus), "GRanges")[1:21]
> gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE)
> gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE, flag = scanBamFlag(isSecondaryAlignment = FALSE, isUnmappedQuery = FALSE, isNotPassingQualityControls = FALSE, isSupplementaryAlignment = FALSE))
> shiftedBamfile <- file.path(outPath, "shifted.bam")
> gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile)
> txs <- transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene)
> genome <- Mmusculus
> #objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=genome)
> list.dirs()
[1] "."         "./splited"
> outPath <- "splited_2"
> dir.create(outPath)
> objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=genome, outPath = outPath)
Error in value[[3L]](cond) : 
  'mergeBam' 'destination' exists, 'overwrite' is FALSE
  destination: splited_2/NucleosomeFree.bam
jianhong commented 4 years ago

I tried GreeLeaf's dataset again. There is no issue. I think there are must something I missed. Would you please share the file with me? So I can repeat your error? To share the big file to me, you can try https://cyverse.org/, here is the documents: https://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html#Hosting. Thank you.

On Tue, Apr 7, 2020 at 6:14 PM Gordon Beattie notifications@github.com wrote:

Thanks for the response, I tried that, same result, maybe I'm missing something? The splitGAlignmentsByCut still produces outputs in the new directory, but again only the one chromosome.

list.dirs() [1] "." outPath <- "splited" dir.create(outPath)

shift the coordinates of 5'ends of alignments in the bam file

library(BSgenome.Mmusculus.UCSC.mm10) library(TxDb.Mmusculus.UCSC.mm10.knownGene) which <- as(seqinfo(Mmusculus), "GRanges")[1:21] gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE) gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE, flag = scanBamFlag(isSecondaryAlignment = FALSE, isUnmappedQuery = FALSE, isNotPassingQualityControls = FALSE, isSupplementaryAlignment = FALSE)) shiftedBamfile <- file.path(outPath, "shifted.bam") gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile) txs <- transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene) genome <- Mmusculus

objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=genome)

list.dirs() [1] "." "./splited" outPath <- "splited_2" dir.create(outPath) objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=genome, outPath = outPath) Error in value[3L] : 'mergeBam' 'destination' exists, 'overwrite' is FALSE destination: splited_2/NucleosomeFree.bam

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jianhong/ATACseqQC/issues/24#issuecomment-610646710, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLBEA6BW7F6GCWHCEJG32DRLOQVJANCNFSM4MDG2RLA .

-- Yours sincerely, Jianhong Ou

jianhong commented 4 years ago

Hi Gordon,

I located the issue. I will fix the bug as soon as possible.

Jianhong.

On Tue, Apr 7, 2020 at 6:14 PM Gordon Beattie notifications@github.com wrote:

Thanks for the response, I tried that, same result, maybe I'm missing something? The splitGAlignmentsByCut still produces outputs in the new directory, but again only the one chromosome.

list.dirs() [1] "." outPath <- "splited" dir.create(outPath)

shift the coordinates of 5'ends of alignments in the bam file

library(BSgenome.Mmusculus.UCSC.mm10) library(TxDb.Mmusculus.UCSC.mm10.knownGene) which <- as(seqinfo(Mmusculus), "GRanges")[1:21] gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE) gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE, flag = scanBamFlag(isSecondaryAlignment = FALSE, isUnmappedQuery = FALSE, isNotPassingQualityControls = FALSE, isSupplementaryAlignment = FALSE)) shiftedBamfile <- file.path(outPath, "shifted.bam") gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile) txs <- transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene) genome <- Mmusculus

objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=genome)

list.dirs() [1] "." "./splited" outPath <- "splited_2" dir.create(outPath) objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=genome, outPath = outPath) Error in value[3L] : 'mergeBam' 'destination' exists, 'overwrite' is FALSE destination: splited_2/NucleosomeFree.bam

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jianhong/ATACseqQC/issues/24#issuecomment-610646710, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLBEA6BW7F6GCWHCEJG32DRLOQVJANCNFSM4MDG2RLA .

-- Yours sincerely, Jianhong Ou

jianhong commented 4 years ago

The new version of ATACseqQC should fix the issue. It will take about 2 days to release the new version. Or you can just install it from my github. Let me know if there is any problem.

On Thu, Apr 9, 2020 at 4:43 PM jianhong ou jianhong.ou@gmail.com wrote:

Hi Gordon,

I located the issue. I will fix the bug as soon as possible.

Jianhong.

On Tue, Apr 7, 2020 at 6:14 PM Gordon Beattie notifications@github.com wrote:

Thanks for the response, I tried that, same result, maybe I'm missing something? The splitGAlignmentsByCut still produces outputs in the new directory, but again only the one chromosome.

list.dirs() [1] "." outPath <- "splited" dir.create(outPath)

shift the coordinates of 5'ends of alignments in the bam file

library(BSgenome.Mmusculus.UCSC.mm10) library(TxDb.Mmusculus.UCSC.mm10.knownGene) which <- as(seqinfo(Mmusculus), "GRanges")[1:21] gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE) gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE, flag = scanBamFlag(isSecondaryAlignment = FALSE, isUnmappedQuery = FALSE, isNotPassingQualityControls = FALSE, isSupplementaryAlignment = FALSE)) shiftedBamfile <- file.path(outPath, "shifted.bam") gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile) txs <- transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene) genome <- Mmusculus

objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=genome)

list.dirs() [1] "." "./splited" outPath <- "splited_2" dir.create(outPath) objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=genome, outPath = outPath) Error in value[3L] : 'mergeBam' 'destination' exists, 'overwrite' is FALSE destination: splited_2/NucleosomeFree.bam

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jianhong/ATACseqQC/issues/24#issuecomment-610646710, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLBEA6BW7F6GCWHCEJG32DRLOQVJANCNFSM4MDG2RLA .

-- Yours sincerely, Jianhong Ou

-- Yours sincerely, Jianhong Ou

GBeattie commented 4 years ago

Sorry for the late response, installed from github and looks to be working fine now, thank you very much for the fix!

rafet2005 commented 3 years ago

Hi, I know this is an old error, but I have the same problem. I'm using ATACseqQC 1.8.5 and R 3.6.1, and I get the same error when I run more than one chromosome. It works with no problems for one chromosome but not for multiple chromosomes. Any help is appreciated. the error message Error in value[3L] : 'mergeBam' 'destination' exists, 'overwrite' is FALSE destination: splitBam/NucleosomeFree.bam

my cod:

load the library

rm(list=ls()) library(ChIPpeakAnno) library(ATACseqQC) library(GenomicRanges) library(EnsDb.Hsapiens.v75) library(AnnotationDbi) library(Rsamtools) library(BSgenome.TroutArrlyShort.NCBI.PRJNA623027)

setwd("/localstorage/ATACseq/result/") trout_txdb <-makeTxDbFromGFF("/localstorage/TroutNewGenome/OmyArlee_1_1/OmyArrlyShort/GCF_013265735.2_USDA_OmykA_1.1_genomic_Short.gff") saveDb(trout_txdb, file="Trout.sqlite") trout_Annota <- loadDb("Trout.sqlite") txs <- transcripts(trout_txdb)

read bam file

bamfile <- ("/localstorage/ATACseq/fastq_Adapter_cut/alignments_ARl/F1B_Arlsorted_RMmitDup_short.bam") bamfile.labels <- gsub(".bam", "", basename(bamfile))

bam file status

bamQC(bamfileT, outPath = NULL)

generate fragement size distribution

fragSize <- fragSizeDist(bamfile, bamfile.labels)

bamfile tags to be read in

possibleTag <- combn(LETTERS, 2) possibleTag <- c(paste0(possibleTag[1, ], possibleTag[2, ]), paste0(possibleTag[2, ], possibleTag[1, ]))

bamTop100 <- scanBam(BamFile(bamfile, yieldSize = 100), param = ScanBamParam(tag=unlist(possibleTag)))[[1]]$tag tags <- names(bamTop100)[lengths(bamTop100)>0] tags tags <- tags[tags!="PG"]

outPath <- "splitBam" if (dir.exists(outPath)) { unlink(outPath, recursive = TRUE, force = TRUE) } dir.create(outPath)

seqlev <-(c( "NC_048565.1","NC_048566.1"))

,"NC_048567.1","NC_048568.1","NC_048569.1","NC_048570.1","NC_048571.1",

"NC_048572.1","NC_048573.1","NC_048574.1","NC_048575.1","NC_048576.1","NC_048577.1","NC_048578.1",

"NC_048579.1","NC_048580.1","NC_048581.1","NC_048582.1","NC_048583.1","NC_048584.1","NC_048585.1",

"NC_048586.1","NC_048587.1","NC_048588.1","NC_048589.1","NC_048590.1","NC_048591.1","NC_048592.1",

"NC_050570.1","NC_050571.1","NC_050572.1","NC_048593.1"))

which <- as(seqinfo(RainbowTrout), "GRanges")

which <- as(seqinfo(RainbowTroutArrly)[seqlev], "GRanges") gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE)

shiftedBamfile <- file.path(outPath, "shifted.bam") gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile) objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=RainbowTroutArrly, outPath=outPath ) dir(outPath)