Closed ycl6 closed 3 years ago
Thank you for reporting. I will fix that.
Best!
Your sincerely,
Jianhong Ou
On May 24, 2020, at 8:08 AM, I-Hsuan Lin notifications@github.com wrote:
Originally splitBam will take bamfile index and pass it to shiftGAlignmentsList and splitGAlignmentsByCut. Because meta$index is not empty, the wrong index (for the original bam) will get store into gal1 which has shifted.bam assgined to meta$file but not shifted.bam to its meta$index. Then splitGAlignmentsByCut halted with below error message:
Error in splitGAlignmentsByCut(gal1, breaks = breaks, labels = labels, : no reads in the obj.
By removing index param from splitBam, both shiftGAlignmentsList and splitGAlignmentsByCut will get the index from the ifelse from the respective meta$file
index <- ifelse(length(meta$index)>0, meta$index, meta$file)
Additionally, a line to include the bam index in gal1 was added in shiftGAlignmentsList.R to complete the metadata of gal1
You can view, comment on, or merge this pull request online at:
https://github.com/jianhong/ATACseqQC/pull/26
Commit Summary
fix incorrect index assigned to gal Store outbam index in gal1 File Changes
M R/shiftGAlignmentsList.R (1) M R/splitBam.R (6) Patch Links:
https://github.com/jianhong/ATACseqQC/pull/26.patch https://github.com/jianhong/ATACseqQC/pull/26.diff — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.
Thanks @jianhong
Another thing I found is that although gal1
is being added to objs
before returning in splitBam
,
objs$all <- gal1
return(invisible(objs))
objs$all
does not contain all the information as that in gal1
, hence although one can use splitBam
to perform both shiftGAlignmentsList
and splitGAlignmentsByCut
as some kind of wrapper. but functions such as PTscore
, NFRscore
and TSSEscore
cannot be performed because objs$all ≠ gal1. This then defeats the purpose of using splitBam
Another thing I am not quite sure of is why setting meta$asMates
to FALSE for shifted.bam in gal1
inside shiftGAlignmentsList
, where shifted.bam is still paired-end when the original bam is paired-end?
if(!missing(outbam)){
file.copy(from=mergedfile, to=outbam)
file.copy(from=paste0(mergedfile, ".bai"),
to=paste0(outbam, ".bai"))
gal1 <- GAlignments()
meta$file <- outbam
meta$asMates <- FALSE
meta$mpos <- mpos
metadata(gal1) <- meta
After carefully review the code, I decide not merge the pull. I still want to keep the index parameter. And I double checked the function by multiple files, I can not repeat your error.
About the PTscore, NFRscore and TSSEscore, all require GAlignments as input. If objs is the input, it will throw an error.
I set the asMate to FALSE, it does not mean it is not paired-end. It just mean it will not be read in as GAlignmentsList.
Hope this explanation will help.
Jianhong.
Hi @jianhong
Thanks for getting back to me. I am not sure why I am getting the error with splitBam
,
so I include my script below and shared the bam and index file with you through dropbox (through your duke.edu email), maybe you can take a look at it. Thanks!
library(ATACseqQC)
library(Rsamtools)
library(GenomicScores)
library(BSgenome.Mmusculus.UCSC.mm10)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
bamFiles = "ATACseq_IPSC_chr19.bam"
bamFiles.labels = "ATACseq_IPSC_chr19"
outPath = bamFiles.labels
dir.create(outPath)
shiftedBamfile = file.path(outPath, "shifted.bam")
chr = "chr19"
possibleTag = combn(LETTERS, 2)
possibleTag = c(paste0(possibleTag[1, ], possibleTag[2, ]), paste0(possibleTag[2, ], possibleTag[1, ]))
genome = Mmusculus
txs = transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene)
sub.txs = txs[seqnames(txs) %in% chr,]
phast = getGScores("phastCons60way.UCSC.mm10")
print(bamFiles.labels)
bamTop100 = scanBam(BamFile(bamFiles, yieldSize = 100), param = ScanBamParam(tag = possibleTag))[[1]]$tag
tags = names(bamTop100)[lengths(bamTop100) == 100]
objs = splitBam(bamFiles, tags = tags, outPath = outPath, txs = sub.txs, genome = genome, conservation = phast, seqlev = chr)
Error in splitGAlignmentsByCut(gal1, breaks = breaks, labels = labels, : no reads in the obj.
> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS
Matrix products: default
BLAS/LAPACK: /home/ihsuan/miniconda3/lib/libopenblasp-r0.3.9.so
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] TxDb.Mmusculus.UCSC.mm10.knownGene_3.10.0 GenomicFeatures_1.38.2 AnnotationDbi_1.48.0 Biobase_2.46.0
[5] BSgenome.Mmusculus.UCSC.mm10_1.4.0 BSgenome_1.54.0 rtracklayer_1.46.0 GenomicScores_1.10.0
[9] Rsamtools_2.2.3 Biostrings_2.54.0 XVector_0.26.0 GenomicRanges_1.38.0
[13] GenomeInfoDb_1.22.1 IRanges_2.20.2 ATACseqQC_1.10.4 S4Vectors_0.24.4
[17] BiocGenerics_0.32.0
loaded via a namespace (and not attached):
[1] colorspace_1.4-1 seqinr_3.6-1 grImport2_0.2-0 ellipsis_0.3.1 futile.logger_1.4.3 base64enc_0.1-3
[7] rGADEM_2.34.1 ChIPpeakAnno_3.20.1 bit64_0.9-7 interactiveDisplayBase_1.24.0 splines_3.6.3 motifStack_1.30.0
[13] ade4_1.7-15 polynom_1.4-0 seqLogo_1.52.0 GO.db_3.10.0 dbplyr_1.4.3 png_0.1-7
[19] graph_1.64.0 shiny_1.4.0.2 BiocManager_1.30.10 compiler_3.6.3 httr_1.4.1 assertthat_0.2.1
[25] Matrix_1.2-18 fastmap_1.0.1 lazyeval_0.2.2 limma_3.42.2 later_1.0.0 formatR_1.7
[31] htmltools_0.4.0 prettyunits_1.1.1 tools_3.6.3 gtable_0.3.0 glue_1.4.1 GenomeInfoDbData_1.2.2
[37] dplyr_0.8.5 rappdirs_0.3.1 Rcpp_1.0.4.6 vctrs_0.3.0 multtest_2.42.0 stringr_1.4.0
[43] mime_0.9 lifecycle_0.2.0 ensembldb_2.10.2 XML_3.99-0.3 idr_1.2 AnnotationHub_2.18.0
[49] edgeR_3.28.1 zlibbioc_1.32.0 MASS_7.3-51.6 scales_1.1.1 hms_0.5.3 promises_1.1.0
[55] ProtGenerics_1.18.0 SummarizedExperiment_1.16.1 RBGL_1.62.1 AnnotationFilter_1.10.0 lambda.r_1.2.4 yaml_2.2.1
[61] curl_4.3 memoise_1.1.0 ggplot2_3.3.0 MotIV_1.42.0 biomaRt_2.42.1 stringi_1.4.6
[67] RSQLite_2.2.0 BiocVersion_3.10.1 randomForest_4.6-14 BiocParallel_1.20.1 rlang_0.4.6 pkgconfig_2.0.3
[73] matrixStats_0.56.0 bitops_1.0-6 lattice_0.20-41 purrr_0.3.4 htmlwidgets_1.5.1 GenomicAlignments_1.22.1
[79] bit_1.1-15.2 tidyselect_1.1.0 magrittr_1.5 R6_2.4.1 DelayedArray_0.12.3 DBI_1.1.0
[85] preseqR_4.0.0 pillar_1.4.4 survival_3.1-12 RCurl_1.98-1.2 tibble_3.0.1 crayon_1.3.4
[91] futile.options_1.0.1 KernSmooth_2.23-17 BiocFileCache_1.10.2 jpeg_0.1-8.1 progress_1.2.2 locfit_1.5-9.4
[97] grid_3.6.3 blob_1.2.1 digest_0.6.25 xtable_1.8-4 VennDiagram_1.6.20 httpuv_1.5.2
[103] regioneR_1.18.1 openssl_1.4.1 munsell_0.5.0 askpass_1.1
Regarding objs
, I was thinking how one can use splitBam
to accomplish both shift and split with just one function but still able to calculate the three QC scores, because if it cannot, then it would seems splitBam
is not that practical to use after all.
Given that gal1
is assigned to objs$all
, I thought I can use objs$all
to calculate the score, but found it doesn't contain the same information as gal1
that was assigned. I suppose the content that can be store by then is limited to the information that were already present in other elements in the GAlignmentsList.
I manage to print the structures of gal1
and objs$all
when running splitBam
:
This is str(gal1)
Formal class 'GAlignments' [package "GenomicAlignments"] with 9 slots
..@ seqnames :Formal class 'Rle' [package "S4Vectors"] with 4 slots
.. .. ..@ values : Factor w/ 0 levels:
.. .. ..@ lengths : int(0)
.. .. ..@ elementMetadata: NULL
.. .. ..@ metadata : list()
..@ start : int(0)
..@ cigar : chr(0)
..@ strand :Formal class 'Rle' [package "S4Vectors"] with 4 slots
.. .. ..@ values : Factor w/ 3 levels "+","-","*":
.. .. ..@ lengths : int(0)
.. .. ..@ elementMetadata: NULL
.. .. ..@ metadata : list()
..@ NAMES : NULL
..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
.. .. ..@ rownames : NULL
.. .. ..@ nrows : int 0
.. .. ..@ listData : Named list()
.. .. ..@ elementType : chr "ANY"
.. .. ..@ elementMetadata: NULL
.. .. ..@ metadata : list()
..@ seqinfo :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
.. .. ..@ seqnames : chr(0)
.. .. ..@ seqlengths : int(0)
.. .. ..@ is_circular: logi(0)
.. .. ..@ genome : chr(0)
..@ elementType : chr "ANY"
..@ metadata :List of 5
.. ..$ file : chr "ATACseq_IPSC_chr19/shifted.bam"
.. ..$ param :Formal class 'ScanBamParam' [package "Rsamtools"] with 8 slots
.. .. .. ..@ flag : Named int [1:2] 4095 1275
.. .. .. .. ..- attr(*, "names")= chr [1:2] "keep0" "keep1"
.. .. .. ..@ simpleCigar : logi FALSE
.. .. .. ..@ reverseComplement: logi FALSE
.. .. .. ..@ tag : chr [1:10] "AS" "MD" "XG" "NM" ...
.. .. .. ..@ tagFilter : list()
.. .. .. ..@ what : chr [1:7] "qname" "flag" "mapq" "isize" ...
.. .. .. ..@ which :Formal class 'CompressedIRangesList' [package "IRanges"] with 5 slots
.. .. .. .. .. ..@ unlistData :Formal class 'IRanges' [package "IRanges"] with 6 slots
.. .. .. .. .. .. .. ..@ start : int 1
.. .. .. .. .. .. .. ..@ width : int 61431566
.. .. .. .. .. .. .. ..@ NAMES : chr "chr19"
.. .. .. .. .. .. .. ..@ elementType : chr "ANY"
.. .. .. .. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. .. .. .. ..@ metadata : list()
.. .. .. .. .. ..@ elementType : chr "IRanges"
.. .. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. .. ..@ metadata : list()
.. .. .. .. .. ..@ partitioning :Formal class 'PartitioningByEnd' [package "IRanges"] with 5 slots
.. .. .. .. .. .. .. ..@ end : int 1
.. .. .. .. .. .. .. ..@ NAMES : chr "chr19"
.. .. .. .. .. .. .. ..@ elementType : chr "ANY"
.. .. .. .. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. .. .. .. ..@ metadata : list()
.. .. .. ..@ mapqFilter : int NA
.. ..$ asMates: logi FALSE
.. ..$ which :Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
.. .. .. ..@ seqnames :Formal class 'Rle' [package "S4Vectors"] with 4 slots
.. .. .. .. .. ..@ values : Factor w/ 1 level "chr19": 1
.. .. .. .. .. ..@ lengths : int 1
.. .. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. .. ..@ metadata : list()
.. .. .. ..@ ranges :Formal class 'IRanges' [package "IRanges"] with 6 slots
.. .. .. .. .. ..@ start : int 1
.. .. .. .. .. ..@ width : int 61431566
.. .. .. .. .. ..@ NAMES : chr "chr19"
.. .. .. .. .. ..@ elementType : chr "ANY"
.. .. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. .. ..@ metadata : list()
.. .. .. ..@ strand :Formal class 'Rle' [package "S4Vectors"] with 4 slots
.. .. .. .. .. ..@ values : Factor w/ 3 levels "+","-","*": 3
.. .. .. .. .. ..@ lengths : int 1
.. .. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. .. ..@ metadata : list()
.. .. .. ..@ seqinfo :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
.. .. .. .. .. ..@ seqnames : chr "chr19"
.. .. .. .. .. ..@ seqlengths : int 61431566
.. .. .. .. .. ..@ is_circular: logi FALSE
.. .. .. .. .. ..@ genome : chr "mm10"
.. .. .. ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
.. .. .. .. .. ..@ rownames : NULL
.. .. .. .. .. ..@ nrows : int 1
.. .. .. .. .. ..@ listData : Named list()
.. .. .. .. .. ..@ elementType : chr "ANY"
.. .. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. .. ..@ metadata : list()
.. .. .. ..@ elementType : chr "ANY"
.. .. .. ..@ metadata : list()
.. ..$ mpos : Named int [1:961764] 41005997 41005994 36933961 36934385 7586559 7586542 5795907 5795937 57855236 57855400 ...
.. .. ..- attr(*, "names")= chr [1:961764] "SRR5870471.10000055 41005994" "SRR5870471.10000055 41005997" "SRR5870471.10000068 36934385" "SRR5870471.10000068 36933961" ...
This is str(objs$all)
Formal class 'GAlignments' [package "GenomicAlignments"] with 9 slots
..@ seqnames :Formal class 'Rle' [package "S4Vectors"] with 4 slots
.. .. ..@ values : Factor w/ 66 levels "chr1","chr2",..:
.. .. ..@ lengths : int(0)
.. .. ..@ elementMetadata: NULL
.. .. ..@ metadata : list()
..@ start : int(0)
..@ cigar : chr(0)
..@ strand :Formal class 'Rle' [package "S4Vectors"] with 4 slots
.. .. ..@ values : Factor w/ 3 levels "+","-","*":
.. .. ..@ lengths : int(0)
.. .. ..@ elementMetadata: NULL
.. .. ..@ metadata : list()
..@ NAMES : chr(0)
..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
.. .. ..@ rownames : NULL
.. .. ..@ nrows : int 0
.. .. ..@ listData :List of 17
.. .. .. ..$ qname: chr(0)
.. .. .. ..$ flag : int(0)
.. .. .. ..$ mapq : int(0)
.. .. .. ..$ isize: int(0)
.. .. .. ..$ seq :Formal class 'DNAStringSet' [package "Biostrings"] with 5 slots
.. .. .. .. .. ..@ pool :Formal class 'SharedRaw_Pool' [package "XVector"] with 2 slots
.. .. .. .. .. .. .. ..@ xp_list : list()
.. .. .. .. .. .. .. ..@ .link_to_cached_object_list: list()
.. .. .. .. .. ..@ ranges :Formal class 'GroupedIRanges' [package "XVector"] with 7 slots
.. .. .. .. .. .. .. ..@ group : int(0)
.. .. .. .. .. .. .. ..@ start : int(0)
.. .. .. .. .. .. .. ..@ width : int(0)
.. .. .. .. .. .. .. ..@ NAMES : NULL
.. .. .. .. .. .. .. ..@ elementType : chr "ANY"
.. .. .. .. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. .. .. .. ..@ metadata : list()
.. .. .. .. .. ..@ elementType : chr "DNAString"
.. .. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. .. ..@ metadata : list()
.. .. .. ..$ qual :Formal class 'PhredQuality' [package "Biostrings"] with 5 slots
.. .. .. .. .. ..@ pool :Formal class 'SharedRaw_Pool' [package "XVector"] with 2 slots
.. .. .. .. .. .. .. ..@ xp_list : list()
.. .. .. .. .. .. .. ..@ .link_to_cached_object_list: list()
.. .. .. .. .. ..@ ranges :Formal class 'GroupedIRanges' [package "XVector"] with 7 slots
.. .. .. .. .. .. .. ..@ group : int(0)
.. .. .. .. .. .. .. ..@ start : int(0)
.. .. .. .. .. .. .. ..@ width : int(0)
.. .. .. .. .. .. .. ..@ NAMES : NULL
.. .. .. .. .. .. .. ..@ elementType : chr "ANY"
.. .. .. .. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. .. .. .. ..@ metadata : list()
.. .. .. .. .. ..@ elementType : chr "BString"
.. .. .. .. .. ..@ elementMetadata: NULL
.. .. .. .. .. ..@ metadata : list()
.. .. .. ..$ mrnm : Factor w/ 66 levels "chr1","chr2",..:
.. .. .. ..$ AS : int(0)
.. .. .. ..$ XG : int(0)
.. .. .. ..$ NM : int(0)
.. .. .. ..$ XM : int(0)
.. .. .. ..$ XN : int(0)
.. .. .. ..$ XO : int(0)
.. .. .. ..$ XS : int(0)
.. .. .. ..$ YS : int(0)
.. .. .. ..$ YT : chr(0)
.. .. .. ..$ mpos : Named int(0)
.. .. .. .. ..- attr(*, "names")= chr(0)
.. .. ..@ elementType : chr "ANY"
.. .. ..@ elementMetadata: NULL
.. .. ..@ metadata : list()
..@ seqinfo :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
.. .. ..@ seqnames : chr [1:66] "chr1" "chr2" "chr3" "chr4" ...
.. .. ..@ seqlengths : int [1:66] 195471971 182113224 160039680 156508116 151834684 149736546 145441459 129401213 124595110 130694993 ...
.. .. ..@ is_circular: logi [1:66] NA NA NA NA NA NA ...
.. .. ..@ genome : chr [1:66] NA NA NA NA ...
..@ elementType : chr "ANY"
..@ metadata : list()
objs
is a GAlignmentsList, whereas objs$all
is a GAlignments and potentially an acceptable input to calculate QC scores
> class(objs)
[1] "GAlignmentsList"
attr(,"package")
[1] "GenomicAlignments"
> class(objs$all)
[1] "GAlignments"
attr(,"package")
[1] "GenomicAlignments"
Originally
splitBam
will take bamfile index and pass it toshiftGAlignmentsList
andsplitGAlignmentsByCut
. Becausemeta$index
is not empty, the wrong index (for the original bam) will get store into gal1 which hasshifted.bam
assgined tometa$file
but notshifted.bam
to itsmeta$index
. ThensplitGAlignmentsByCut
halted with below error message:By removing index param from
splitBam
, bothshiftGAlignmentsList
andsplitGAlignmentsByCut
will get the index from theifelse
from the respectivemeta$file
index <- ifelse(length(meta$index)>0, meta$index, meta$file)
Additionally, a line to include the bam index in gal1 was added in
shiftGAlignmentsList.R
to complete the metadata of gal1