jianhong / ATACseqQC

ATAC-seq Quality Control
https://jianhong.github.io/ATACseqQC/articles/ATACseqQC.html
23 stars 12 forks source link

Error: subscript contains invalid names for splitGAlignmentsByCut() function #48

Closed hyjforesight closed 2 years ago

hyjforesight commented 2 years ago

Hello ATACseqQC, I'm now running ATACseqQC on mouse data. All steps work fine until splitGAlignmentsByCut(), i got error Error: subscript contains invalid names. I check several times but find no spelling errors for the arguments of splitGAlignmentsByCut(). By the way, these codes work fine for running the official tutorial. Could you please help me to solve this error? Thanks! Best, YJ

library(ATACseqQC)
library(ChIPpeakAnno)
library(MotifDb)
library(GenomicAlignments)
library(Rsamtools)
library(GenomicScores)
library(BSgenome.Mmusculus.UCSC.mm10)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
mm10_gscore <- getGScores("phastCons60way.UCSC.mm10")

possibleTag <- list("integer"=c("AM", "AS", "CM", "CP", "FI", "H0", "H1", "H2", "HI", "IH", "MQ", "NH", "NM", "OP", "PQ", "SM", "TC", "UQ"), "character"=c("BC", "BQ", "BZ", "CB", "CC", "CO", "CQ", "CR", "CS", "CT", "CY", "E2", "FS", "LB", "MC", "MD", "MI", "OA", "OC", "OQ", "OX", "PG", "PT", "PU", "Q2", "QT", "QX", "R2", "RG", "RX", "SA", "TS", "U2"))
bamTop100 <- scanBam(BamFile(file="D:/HYJ/postQC_sorted_Cracd_KO1.bam", yieldSize = 100), param = ScanBamParam(tag=unlist(possibleTag)))[[1]]$tag
tags <- names(bamTop100)[lengths(bamTop100)>0]
seqinformation <- seqinfo(TxDb.Mmusculus.UCSC.mm10.knownGene)
which <- as(seqinformation, "GRanges")
gal <- readBamFile(bamFile="D:/HYJ/postQC_sorted_Cracd_KO1.bam", which=which, tag=tags, asMates=TRUE, bigFile=TRUE)
gal1 <- shiftGAlignmentsList(gal, outbam="D:/HYJ/shifted_postQC_sorted_Cracd_KO1.bam")

txs <- transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene)
genome <- Mmusculus

objs <- splitGAlignmentsByCut(obj=gal1, txs=txs, genome=genome, conservation=mm10_gscore, outPath="D:/HYJ/splited_Cracd_KO1/")
Error: subscript contains invalid names
hyjforesight commented 2 years ago

Hi @jianhong The problem was solved by deleting conservation=mm10_gscore in objs <- splitGAlignmentsByCut(obj=gal1, txs=txs, genome=genome, conservation=mm10_gscore, outPath="D:/HYJ/splited_Cracd_KO1/"). The invalid names comes from mm10_gscore.

I found you answered a similar question at https://support.bioconductor.org/p/96226/. For human, we can do below coding because there is phastCons100way.UCSC.hg19 and it works well

objs <- splitGAlignmentsByCut(obj=gal1, txs=txs, genome=genome, conservation=phastCons100way.UCSC.hg19, outPath="D:/HYJ/splited_Cracd_KO1/")

but for mice, there is no phastCons60way.UCSC.mm10, so I call mm10_gscore <- getGScores("phastCons60way.UCSC.mm10") first, but this mm10_gscore is invalid for splitGAlignmentsByCut(). It will be great if you could share some opinions on why this argument is invalid. We appreciate it! Thanks! Best, YJ

jianhong commented 2 years ago

Thank you for reporting this. I will fix this issue soon.

jianhong commented 2 years ago

I pushed a patch for this issue. Please try to install the development version by:

BiocManager::install("jianhong/ATACseqQC")

and then try it again.

hyjforesight commented 2 years ago

hello @jianhong Thanks for pushing this patch. I'm running again now, but the splitGAlignmentsByCut() is so low when conservation=mm10_gscore is added. It's already 12h, but the split bams haven't been generated yet. Is there any way to speed up this process?

jianhong commented 2 years ago

In background, it is running randomForest to assign the fragment. It will be very resources consuming. I think you can skip conservation score if you are not interested in nucleosome position.

hyjforesight commented 2 years ago

Thanks @jianhong . Sorry, I give it up for running this, so I don't know whether the bug is fixed. If there's any chance I will run the nucleosome positioning, I'll let you know.

fuyudawn commented 2 years ago

The problem still exists when I try is it because my R is 4.1.2? objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=genome, outPath = outPath, conservation=phastCons100way.UCSC.hg38)

jianhong commented 2 years ago

Please remove the conservation score in current run and just use the fragment size to split the fragment. Looks like the resources issue is still there.