jianhong / ATACseqQC

ATAC-seq Quality Control
https://jianhong.github.io/ATACseqQC/articles/ATACseqQC.html
23 stars 12 forks source link

Question about readBamFile function #21

Open El-Castor opened 4 years ago

El-Castor commented 4 years ago

Hi, First thanks to build this package it is very interesting tool. So I'm trying to integrate your package in my ATACseq pipeline.

But I have some issue. In fact, when I trying to build the "gal" object as your exemple in the vignette, this object is empty but my bamfile is not. the fact is i'm charging my own bamfile because it's not on your package. in end I have a empty splitted bam file.

here you have the code that I run :

`

Loading Input files / directory

bamfile_wthDup=scanBam(bamFilePath_wthDup) bamfile_whtoutDup=scanBam(bamFilePath_wthoutDup)

bamfile tags to be read in

possibleTag <- combn(LETTERS, 2)

possibleTag <- c(paste0(possibleTag[1, ], possibleTag[2, ]), paste0(possibleTag[2, ], possibleTag[1, ]))

bamTop100 <- scanBam(BamFile(bamFilePath_wthoutDup, yieldSize = 100), param = ScanBamParam(tag=possibleTag))[[1]]$tag tags <- names(bamTop100)[lengths(bamTop100)>0] tags

library(BSgenome.Cmelo.FLOCAD.CMiso1.1)

seqlev <- "CMiso1.1chr01" # sumsmple data for quick run which <- as(seqinfo(CMiso)[seqlev], "GRanges") gal <- readBamFile("bamfile_whtoutDup", tag=tags, which=which, asMates=TRUE, bigFile=TRUE) # TODO : here the bamfile seems to be not read shiftedBamfile <- file.path(outPath, "shifted.bam")

` Have any idea what's I'm doing wrong ? Thanks in advance

El-Castor commented 4 years ago

Hi, I've tested you exmple that you present on your vignette but I have the same issue as my data. see below the terminal output :

`

bamfile <- system.file("extdata", "GL1.bam", package="ATACseqQC", mustWork=TRUE) head(bamfile) [1] "/opt/share/FLOCAD/userspace/cpichot/miniconda3/envs/ATACseqQC2/lib/R/library/ATACseqQC/extdata/GL1.bam" bamfile.labels <- gsub(".bam", "", basename(bamfile)) library(BSgenome.Hsapiens.UCSC.hg19) seqlev <- "chr1" which <- as(seqinfo(Hsapiens)[seqlev], "GRanges") which GRanges object with 1 range and 0 metadata columns: seqnames ranges strand

chr1 chr1 1-249250621 * ------- seqinfo: 1 sequence from hg19 genome gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE) gal GAlignmentsList object of length 0: <0 elements> shiftedBamfile <- file.path(outPath, "shifted.bam") gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile) [E::sam_parse1] unrecognized type N [W::sam_read1] Parse error at line 26 `
jianhong commented 4 years ago

Hi Pichot,

In the default setting of readBamFile, the bigFile is set to TRUE. That is to say it will not read the reads into memory at readBamFile step. So when you check the length of the gal, it will show 0.

Did you tried shiftGAlignmentsList? Is the output file of shiftGAlignmentList empty?

Jianhong.

On Wed, Jan 29, 2020 at 12:49 PM pichot notifications@github.com wrote:

Hi, First thanks to build this package it is very interesting tool. So I'm trying to integrate your package in my ATACseq pipeline.

But I have some issue. In fact, when I trying to build the "gal" object as your exemple in the vignette, this object is empty but my bamfile is not. the fact is i'm charging my own bamfile because it's not on your package. in end I have a empty splitted bam file.

here you have the code that I run :

` Loading Input files / directory

bamfile_wthDup=scanBam(bamFilePath_wthDup) bamfile_whtoutDup=scanBam(bamFilePath_wthoutDup) bamfile tags to be read in

possibleTag <- combn(LETTERS, 2)

possibleTag <- c(paste0(possibleTag[1, ], possibleTag[2, ]), paste0(possibleTag[2, ], possibleTag[1, ]))

bamTop100 <- scanBam(BamFile(bamFilePath_wthoutDup, yieldSize = 100), param = ScanBamParam(tag=possibleTag))[[1]]$tag tags <- names(bamTop100)[lengths(bamTop100)>0] tags

library(BSgenome.Cmelo.FLOCAD.CMiso1.1)

seqlev <- "CMiso1.1chr01" # sumsmple data for quick run which <- as(seqinfo(CMiso)[seqlev], "GRanges") gal <- readBamFile("bamfile_whtoutDup", tag=tags, which=which, asMates=TRUE, bigFile=TRUE) # TODO : here the bamfile seems to be not read shiftedBamfile <- file.path(outPath, "shifted.bam")

` Have any idea what's I'm doing wrong ? Thanks in advance

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/jianhong/ATACseqQC/issues/21?email_source=notifications&email_token=ABLBEAY2OBIGXIM6Z36O2LTRAG6UPA5CNFSM4KNI32JKYY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IJTORLA, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLBEA4IKFIHWW355PLQSFLRAG6UPANCNFSM4KNI32JA .

-- Yours sincerely, Jianhong Ou

El-Castor commented 4 years ago

Hi Jianhong,

Yes I've try the shiftGAlignmentsList function but it didn't work, I've an empty shifted bam file and more over I've this error :

gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile) [E::sam_parse1] unrecognized type N [W::sam_read1] Parse error at line 26

jianhong commented 4 years ago

Hi pichot,

Do you mind to share the head of your bamfile by samtools:

samtools view -H bamfilename

On Thu, Jan 30, 2020 at 8:31 AM pichot notifications@github.com wrote:

Hi Jianhong,

Yes I've try the shiftGAlignmentsList function but it didn't work, I've an empty shifted bam file and more over I've this error :

gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile) [E::sam_parse1] unrecognized type N [W::sam_read1] Parse error at line 26

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jianhong/ATACseqQC/issues/21?email_source=notifications&email_token=ABLBEA3J7J6ZLPGKOV7VTY3RALJBJA5CNFSM4KNI32JKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEKK7NSI#issuecomment-580253385, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLBEA4RJZLJBJ6AFU545ADRALJBJANCNFSM4KNI32JA .

-- Yours sincerely, Jianhong Ou

jianhong commented 4 years ago

Your reads are paired ends or single ends?

On Thu, Jan 30, 2020 at 8:31 AM pichot notifications@github.com wrote:

Hi Jianhong,

Yes I've try the shiftGAlignmentsList function but it didn't work, I've an empty shifted bam file and more over I've this error :

gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile) [E::sam_parse1] unrecognized type N [W::sam_read1] Parse error at line 26

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jianhong/ATACseqQC/issues/21?email_source=notifications&email_token=ABLBEA3J7J6ZLPGKOV7VTY3RALJBJA5CNFSM4KNI32JKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEKK7NSI#issuecomment-580253385, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLBEA4RJZLJBJ6AFU545ADRALJBJANCNFSM4KNI32JA .

-- Yours sincerely, Jianhong Ou

El-Castor commented 4 years ago

Thanks to quickly respond,

here you have my header :

@HD VN:1.5 SO:coordinate @SQ SN:CMiso1.1chr00 LN:3219671 @SQ SN:CMiso1.1chr01 LN:35819275 @SQ SN:CMiso1.1chr02 LN:24717502 @SQ SN:CMiso1.1chr03 LN:29901275 @SQ SN:CMiso1.1chr04 LN:36545425 @SQ SN:CMiso1.1chr05 LN:29172381 @SQ SN:CMiso1.1chr06 LN:36200319 @SQ SN:CMiso1.1chr07 LN:26767028 @SQ SN:CMiso1.1chr08 LN:33813569 @SQ SN:CMiso1.1chr09 LN:23958596 @SQ SN:CMiso1.1chr10 LN:24822432 @SQ SN:CMiso1.1chr11 LN:33524158 @SQ SN:CMiso1.1chr12 LN:26620111 @PG ID:bowtie2 PN:bowtie2 VN:2.3.4.1 CL:"/opt/share/FLOCAD/userspace/cpichot/miniconda3/envs/ATACseq/bin/bowtie2-align-s --wrapper basic-0 -x 4-MappingAndCalling_RmDup/indexes/CMiso1.1_genome.fa -X 2000 --very-sensitive --no-mixed --no-discordant -p 12 --passthrough -1 /K/FLOCAD/Bioinfo/Workspace/ClementP/DavidL/ATACseq/ATACseq_cytometrie_dec2019_flowers_FMH_rep4_100K/Analysis_sequencingTest/2-Trimming/output_test_sequencing/trimmomatic/trimmed_H3Rep4_S3_L001_R1_001.fastq.gz -2 /K/FLOCAD/Bioinfo/Workspace/ClementP/DavidL/ATACseq/ATACseq_cytometrie_dec2019_flowers_FMH_rep4_100K/Analysis_sequencingTest/2-Trimming/output_test_sequencing/trimmomatic/trimmed_H3Rep4_S3_L001_R2_001.fastq.gz" @PG ID:MarkDuplicates VN:2.17.10-SNAPSHOT CL:MarkDuplicates INPUT=[4-MappingAndCalling_RmDup/mapping/filtered/trimmed_H3Rep4_S3_L001_001_filtered.bam.tmp.qMapFiltered.bam] OUTPUT=4-MappingAndCalling_RmDup/mapping/filtered/trimmed_H3Rep4_S3_L001_001_filtered.bam.tmp.qMapFiltered.duplicatesRemoved.bam METRICS_FILE=4-MappingAndCalling_RmDup/mapping/filtered/log_trimmed_H3Rep4_S3_L001_001_stats.txt.marked_dup_metrics.txt REMOVE_DUPLICATES=true MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true ADD_PG_TAG_TO_READS=true ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false PN:MarkDuplicates

but I don't thinks that it's the probleme because when I try your code with the data that you have on your package I have the same error.

More over my bam file are pair-end

thanks a lot

jianhong commented 4 years ago

Then it is very strange because the package passed all the test platform for the vignettes. If I can not repeat your error, it is very difficult to debug.

Jianhong.

On Thu, Jan 30, 2020 at 9:15 AM pichot notifications@github.com wrote:

Thanks to quickly respond,

here you have my header :

@hd https://github.com/hd VN:1.5 SO:coordinate @sq https://github.com/sq SN:CMiso1.1chr00 LN:3219671 @sq https://github.com/sq SN:CMiso1.1chr01 LN:35819275 @sq https://github.com/sq SN:CMiso1.1chr02 LN:24717502 @sq https://github.com/sq SN:CMiso1.1chr03 LN:29901275 @sq https://github.com/sq SN:CMiso1.1chr04 LN:36545425 @sq https://github.com/sq SN:CMiso1.1chr05 LN:29172381 @sq https://github.com/sq SN:CMiso1.1chr06 LN:36200319 @sq https://github.com/sq SN:CMiso1.1chr07 LN:26767028 @sq https://github.com/sq SN:CMiso1.1chr08 LN:33813569 @sq https://github.com/sq SN:CMiso1.1chr09 LN:23958596 @sq https://github.com/sq SN:CMiso1.1chr10 LN:24822432 @sq https://github.com/sq SN:CMiso1.1chr11 LN:33524158 @sq https://github.com/sq SN:CMiso1.1chr12 LN:26620111 @pg https://github.com/pg ID:bowtie2 PN:bowtie2 VN:2.3.4.1 CL:"/opt/share/FLOCAD/userspace/cpichot/miniconda3/envs/ATACseq/bin/bowtie2-align-s --wrapper basic-0 -x 4-MappingAndCalling_RmDup/indexes/CMiso1.1_genome.fa -X 2000 --very-sensitive --no-mixed --no-discordant -p 12 --passthrough -1 /K/FLOCAD/Bioinfo/Workspace/ClementP/DavidL/ATACseq/ATACseq_cytometrie_dec2019_flowers_FMH_rep4_100K/Analysis_sequencingTest/2-Trimming/output_test_sequencing/trimmomatic/trimmed_H3Rep4_S3_L001_R1_001.fastq.gz -2 /K/FLOCAD/Bioinfo/Workspace/ClementP/DavidL/ATACseq/ATACseq_cytometrie_dec2019_flowers_FMH_rep4_100K/Analysis_sequencingTest/2-Trimming/output_test_sequencing/trimmomatic/trimmed_H3Rep4_S3_L001_R2_001.fastq.gz" @pg https://github.com/pg ID:MarkDuplicates VN:2.17.10-SNAPSHOT CL:MarkDuplicates INPUT=[4-MappingAndCalling_RmDup/mapping/filtered/trimmed_H3Rep4_S3_L001_001_filtered.bam.tmp.qMapFiltered.bam] OUTPUT=4-MappingAndCalling_RmDup/mapping/filtered/trimmed_H3Rep4_S3_L001_001_filtered.bam.tmp.qMapFiltered.duplicatesRemoved.bam METRICS_FILE=4-MappingAndCalling_RmDup/mapping/filtered/log_trimmed_H3Rep4_S3_L001_001_stats.txt.marked_dup_metrics.txt REMOVE_DUPLICATES=true MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true ADD_PG_TAG_TO_READS=true ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false PN:MarkDuplicates

but I don't thinks that it's the probleme because when I try your code with the data that you have on your package I have the same error.

More over my bam file are pair-end

thanks a lot

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jianhong/ATACseqQC/issues/21?email_source=notifications&email_token=ABLBEAZVGN2OBIGRM2UQUBLRALOGRA5CNFSM4KNI32JKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEKLEDQA#issuecomment-580272576, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLBEA6RKOFZFSB6UZMSVH3RALOGRANCNFSM4KNI32JA .

-- Yours sincerely, Jianhong Ou

El-Castor commented 4 years ago

hi Jianhong,

I understand that it's difficult in this way to debug but I have do exactly the same things as advocate in the vignette and I have load the data avaible in the package. That's why I don't understand where is the issue. I will check all my dependency comparing to your session info. Tell me if you have any idea please.

Thanks

El-Castor commented 4 years ago

hi Jianhong, I've resolve the issues finally, but now when I run the function shiftGAlignmentsList() I have an error because of the PG tags "Marduplicates".

here you have the error : gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile) [bam_translate] PG tag "MarkDuplicates" on read "M01263:122:000000000-CWJNL:1:2106:23406:17450" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID. [bam_translate] PG tag "MarkDuplicates" on read "M01263:122:000000000-CWJNL:1:2104:12761:8947" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID. [bam_translate] PG tag "MarkDuplicates" on read "M01263:122:000000000-CWJNL:1:1114:16639:1911" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID. The results is that I have a truncated shifted bam, because I think that the shifting stop when it treat these reads.

I don't understand why this causes issues during the shifting, do you have any idea please ? thanks a lot

GiuseppeLeuzzi commented 4 years ago

I have exactly the same issue! This generates unreadable tags. However if I extract single chromosomes from the bam file the tags are fine and it works. But of course the goal will be to run the entire bam!

jianhong commented 4 years ago

Hi, Could you please confirm that you are using most closest released version of ATACseqQC? If it still get the same error, could you share your error? If you are talking about the PG tags, did you tried to read the files without PG tag? I can not repeat this error for lacking the sample bam files. Is it possible to share your bam file to me?

On Wed, Apr 22, 2020 at 10:18 AM resel86 notifications@github.com wrote:

I have exactly the same issue! This generates unreadable tags. However if I extract single chromosomes from the bam file the tags are fine and it works. But of course the goal will be to run the entire bam!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jianhong/ATACseqQC/issues/21#issuecomment-617807582, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLBEA3YZTWSQWR3KQM2XCLRN34EPANCNFSM4KNI32JA .

-- Yours sincerely, Jianhong Ou

GiuseppeLeuzzi commented 4 years ago

Hi,

thank you so much for your help.

Here a link where you can download 3 different BAM: https://drive.google.com 1) A full bam file 2) chr9 extracted from the full BAM file ( it works) 3) chr5 extracted from the full BAM file (it does not work)

Below my R session info. The fact that with one chr it works but not with the other one, as well as with the full BAM, make me feel that there is something wrong. Then, I use R version 3.6.2 and ATACseqQC was built on the 3.6.3 version, but i have doubt this is the problem.

I guess the problems are the TAGS it founds, in the chr5 or in the full bam it is able to find only PG and YT, that are not useful for the downstream analysis.

Thanks again,

Let me know what you find,

All the best, Giuseppe Leuzzi

R version 3.6.2 (2019-12-12) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS Catalina 10.15.3

Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] grid stats4 parallel stats graphics grDevices utils datasets methods
[10] base

other attached packages: [1] CENTIPEDE_1.2 phastCons100way.UCSC.hg38_3.7.1
[3] GenomicScores_1.10.0 TxDb.Hsapiens.UCSC.hg38.knownGene_3.10.0 [5] GenomicFeatures_1.38.2 AnnotationDbi_1.48.0
[7] BSgenome.Hsapiens.UCSC.hg38_1.4.1 MotifDb_1.28.0
[9] ChIPpeakAnno_3.20.1 VennDiagram_1.6.20
[11] futile.logger_1.4.3 ATACseqQC_1.10.4
[13] BSgenome_1.54.0 rtracklayer_1.46.0
[15] GenomicAlignments_1.22.1 Rsamtools_2.2.3
[17] Biostrings_2.54.0 XVector_0.26.0
[19] SummarizedExperiment_1.16.1 DelayedArray_0.12.3
[21] BiocParallel_1.20.1 matrixStats_0.56.0
[23] Biobase_2.46.0 GenomicRanges_1.38.0
[25] GenomeInfoDb_1.22.1 IRanges_2.20.2
[27] S4Vectors_0.24.4 BiocGenerics_0.32.0

loaded via a namespace (and not attached): [1] colorspace_1.4-1 seqinr_3.6-1 grImport2_0.2-0
[4] ellipsis_0.3.0 base64enc_0.1-3 rGADEM_2.34.1
[7] rstudioapi_0.11 bit64_0.9-7 interactiveDisplayBase_1.24.0 [10] splines_3.6.2 motifStack_1.30.0 ade4_1.7-15
[13] splitstackshape_1.4.8 polynom_1.4-0 seqLogo_1.52.0
[16] GO.db_3.10.0 dbplyr_1.4.3 png_0.1-7
[19] graph_1.64.0 shiny_1.4.0.2 BiocManager_1.30.10
[22] compiler_3.6.2 httr_1.4.1 assertthat_0.2.1
[25] Matrix_1.2-18 fastmap_1.0.1 lazyeval_0.2.2
[28] limma_3.42.2 later_1.0.0 formatR_1.7
[31] htmltools_0.4.0 prettyunits_1.1.1 tools_3.6.2
[34] gtable_0.3.0 glue_1.4.0 GenomeInfoDbData_1.2.2
[37] dplyr_0.8.5 rappdirs_0.3.1 Rcpp_1.0.4
[40] vctrs_0.2.4 multtest_2.42.0 stringr_1.4.0
[43] mime_0.9 lifecycle_0.2.0 ensembldb_2.10.2
[46] XML_3.99-0.3 idr_1.2 AnnotationHub_2.18.0
[49] edgeR_3.28.1 zlibbioc_1.32.0 MASS_7.3-51.5
[52] scales_1.1.0 hms_0.5.3 promises_1.1.0
[55] ProtGenerics_1.18.0 RBGL_1.62.1 AnnotationFilter_1.10.0
[58] lambda.r_1.2.4 yaml_2.2.1 curl_4.3
[61] memoise_1.1.0 MotIV_1.42.0 ggplot2_3.3.0
[64] biomaRt_2.42.1 stringi_1.4.6 RSQLite_2.2.0
[67] BiocVersion_3.10.1 randomForest_4.6-14 rlang_0.4.5
[70] pkgconfig_2.0.3 bitops_1.0-6 lattice_0.20-41
[73] purrr_0.3.4 htmlwidgets_1.5.1 bit_1.1-15.2
[76] tidyselect_1.0.0 magrittr_1.5 R6_2.4.1
[79] DBI_1.1.0 preseqR_4.0.0 pillar_1.4.3
[82] survival_3.1-12 RCurl_1.98-1.2 tibble_3.0.1
[85] crayon_1.3.4 futile.options_1.0.1 KernSmooth_2.23-16
[88] BiocFileCache_1.10.2 jpeg_0.1-8.1 progress_1.2.2
[91] locfit_1.5-9.4 data.table_1.12.8 blob_1.2.1
[94] digest_0.6.25 xtable_1.8-4 httpuv_1.5.2
[97] regioneR_1.18.1 openssl_1.4.1 munsell_0.5.0
[100] askpass_1.1

On Apr 23, 2020, at 12:24 PM, JIANHONG OU notifications@github.com wrote:

Hi, Could you please confirm that you are using most closest released version of ATACseqQC? If it still get the same error, could you share your error? If you are talking about the PG tags, did you tried to read the files without PG tag? I can not repeat this error for lacking the sample bam files. Is it possible to share your bam file to me?

On Wed, Apr 22, 2020 at 10:18 AM resel86 notifications@github.com wrote:

I have exactly the same issue! This generates unreadable tags. However if I extract single chromosomes from the bam file the tags are fine and it works. But of course the goal will be to run the entire bam!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jianhong/ATACseqQC/issues/21#issuecomment-617807582, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLBEA3YZTWSQWR3KQM2XCLRN34EPANCNFSM4KNI32JA .

-- Yours sincerely, Jianhong Ou — You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jianhong/ATACseqQC/issues/21#issuecomment-618498431, or unsubscribe https://github.com/notifications/unsubscribe-auth/APJNKBREATWOL4YUUZQCTSTROBTVVANCNFSM4KNI32JA.

jianhong commented 4 years ago

Thank you for your files. The newest version will be release in the coming days. I tested in my side, it should be fixed. The issue is that if there is any NA or empty values, the wrong tag will be generated. Indeed this is a bug in export of upstream package. I tried to avoid this by remove the tags with NA values. In most cases it does not hurt.

GiuseppeLeuzzi commented 4 years ago

Alright. Thanks a lot. So if I well understand in the new version, my BAM files work. Is it right? Could you please inform us when the new version will be released? I think this pipeline will be very useful, and it generates the best footPrint plot across all the pipeline currently available.

I have another question. To change the axis in the footPrint plot do we really need to use CENTIPEDE files as input files or are you also implementing factorFootprints function?

jianhong commented 4 years ago

For footPrint, you can follow the vignette to generate one. You do not need to use CENTIPEPE files as input. You can feed the binding sites predicted by other tools such as FIMO instead of default one. Please see ?factorFootprints for more details.

On Mon, Apr 27, 2020 at 10:36 AM resel86 notifications@github.com wrote:

Alright. Thanks a lot. So if I well understand in the new version, my BAM files work. Is it right? Could you please inform us when the new version will be released? I think this pipeline will be very useful, and it generates the best footPrint plot across all the pipeline currently available.

I have another question. To change the axis in the footPrint plot do we really need to use CENTIPEDE files as input files or are you also implementing factorFootprints function?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jianhong/ATACseqQC/issues/21#issuecomment-620027456, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLBEA3ZOLB2HPVZY2DTAMLROWKADANCNFSM4KNI32JA .

-- Yours sincerely, Jianhong Ou

GiuseppeLeuzzi commented 4 years ago

Hi, I tried a new version, released on first May. I got the same issue. Any suggestions? Thanks Giuseppe

On Apr 27, 2020, at 10:14 AM, JIANHONG OU notifications@github.com wrote:

 Thank you for your files. The newest version will be release in the coming days. I tested in my side, it should be fixed. The issue is that if there is any NA or empty values, the wrong tag will be generated. Indeed this is a bug in export of upstream package. I tried to avoid this by remove the tags with NA values. In most cases it does not hurt.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

jianhong commented 4 years ago

I will double check that. It will still output the warnings but will generate the split bam files.

El-Castor commented 4 years ago

Hi Jianhong,

I have install your latest version of ATACseqQC ( version 1.12.0) because I see that you fix it the issue with PG reads tags. So I have re-test the analysis using this version and it doesn't work I have the same issue, see bellow the error :

gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile.outPath)
[bam_translate] PG tag "MarkDuplicates" on read "M01263:122:000000000-CWJNL:1:2106:23406:17450" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "M01263:122:000000000-CWJNL:1:2104:12761:8947" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "M01263:122:000000000-CWJNL:1:1114:16639:1911" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
> head(gal1)
GAlignments object with 0 alignments and 0 metadata columns:
   seqnames strand       cigar    qwidth     start       end     width
      <Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
       njunc
   <integer>
  -------
  seqinfo: no sequences

Do you have any suggestion @jianhong ? @resel86 do you have resolved the problem in your case?

Thanks in advance

El-Castor commented 4 years ago

Hi JianHong,

I've try to understant the problem by myself but I found nothing.

Here You have the structure of the gal object created with my data :

> str(gal)
Formal class 'GAlignmentsList' [package "GenomicAlignments"] with 5 slots
  ..@ unlistData     :Formal class 'GAlignments' [package "GenomicAlignments"] with 9 slots
  .. .. ..@ seqnames       :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. .. .. ..@ values         : Factor w/ 0 levels: 
  .. .. .. .. ..@ lengths        : int(0) 
  .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. ..@ metadata       : list()
  .. .. ..@ start          : int(0) 
  .. .. ..@ cigar          : chr(0) 
  .. .. ..@ strand         :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. .. .. ..@ values         : Factor w/ 3 levels "+","-","*": 
  .. .. .. .. ..@ lengths        : int(0) 
  .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. ..@ metadata       : list()
  .. .. ..@ NAMES          : NULL
  .. .. ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
  .. .. .. .. ..@ rownames       : NULL
  .. .. .. .. ..@ nrows          : int 0
  .. .. .. .. ..@ listData       : Named list()
  .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. ..@ metadata       : list()
  .. .. ..@ seqinfo        :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
  .. .. .. .. ..@ seqnames   : chr(0) 
  .. .. .. .. ..@ seqlengths : int(0) 
  .. .. .. .. ..@ is_circular: logi(0) 
  .. .. .. .. ..@ genome     : chr(0) 
  .. .. ..@ elementType    : chr "ANY"
  .. .. ..@ metadata       : list()
  ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
  .. .. ..@ rownames       : NULL
  .. .. ..@ nrows          : int 0
  .. .. ..@ listData       : Named list()
  .. .. ..@ elementType    : chr "ANY"
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  ..@ elementType    : chr "GAlignments"
  ..@ metadata       :List of 4
  .. ..$ file   : chr "/K/FLOCAD/Bioinfo/Workspace/ClementP/DavidL/ATACseq/ATACseq_cytometrie_dec2019_flowers_FMH_rep4_100K/Analysis_s"| __truncated__
  .. ..$ param  :Formal class 'ScanBamParam' [package "Rsamtools"] with 8 slots
  .. .. .. ..@ flag             : Named int [1:2] 4095 1275
  .. .. .. .. ..- attr(*, "names")= chr [1:2] "keep0" "keep1"
  .. .. .. ..@ simpleCigar      : logi FALSE
  .. .. .. ..@ reverseComplement: logi FALSE
  .. .. .. ..@ tag              : chr [1:11] "AS" "MD" "PG" "XG" ...
  .. .. .. ..@ tagFilter        : list()
  .. .. .. ..@ what             : chr [1:7] "qname" "flag" "mapq" "isize" ...
  .. .. .. ..@ which            :Formal class 'CompressedIRangesList' [package "IRanges"] with 5 slots
  .. .. .. .. .. ..@ unlistData     :Formal class 'IRanges' [package "IRanges"] with 6 slots
  .. .. .. .. .. .. .. ..@ start          : int 1
  .. .. .. .. .. .. .. ..@ width          : int 35819275
  .. .. .. .. .. .. .. ..@ NAMES          : chr "CMiso1.1chr01"
  .. .. .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. .. .. ..@ metadata       : list()
  .. .. .. .. .. ..@ elementType    : chr "IRanges"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. .. .. ..@ partitioning   :Formal class 'PartitioningByEnd' [package "IRanges"] with 5 slots
  .. .. .. .. .. .. .. ..@ end            : int 1
  .. .. .. .. .. .. .. ..@ NAMES          : chr "CMiso1.1chr01"
  .. .. .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ mapqFilter       : int NA
  .. ..$ asMates: logi TRUE
  .. ..$ which  :Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
  .. .. .. ..@ seqnames       :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. .. .. .. ..@ values         : Factor w/ 1 level "CMiso1.1chr01": 1
  .. .. .. .. .. ..@ lengths        : int 1
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ ranges         :Formal class 'IRanges' [package "IRanges"] with 6 slots
  .. .. .. .. .. ..@ start          : int 1
  .. .. .. .. .. ..@ width          : int 35819275
  .. .. .. .. .. ..@ NAMES          : chr "CMiso1.1chr01"
  .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ strand         :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. .. .. .. ..@ values         : Factor w/ 3 levels "+","-","*": 3
  .. .. .. .. .. ..@ lengths        : int 1
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ seqinfo        :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
  .. .. .. .. .. ..@ seqnames   : chr "CMiso1.1chr01"
  .. .. .. .. .. ..@ seqlengths : int 35819275
  .. .. .. .. .. ..@ is_circular: logi FALSE
  .. .. .. .. .. ..@ genome     : chr "CMiso1.1"
  .. .. .. ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
  .. .. .. .. .. ..@ rownames       : NULL
  .. .. .. .. .. ..@ nrows          : int 1
  .. .. .. .. .. ..@ listData       : Named list()
  .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. ..@ metadata       : list()
  ..@ partitioning   :Formal class 'PartitioningByEnd' [package "IRanges"] with 5 slots
  .. .. ..@ end            : int(0) 
  .. .. ..@ NAMES          : NULL
  .. .. ..@ elementType    : chr "ANY"
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()

Here your have the structure of the gal1 object :


> str(gal1)
Formal class 'GAlignments' [package "GenomicAlignments"] with 9 slots
  ..@ seqnames       :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. ..@ values         : Factor w/ 0 levels: 
  .. .. ..@ lengths        : int(0) 
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  ..@ start          : int(0) 
  ..@ cigar          : chr(0) 
  ..@ strand         :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. ..@ values         : Factor w/ 3 levels "+","-","*": 
  .. .. ..@ lengths        : int(0) 
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  ..@ NAMES          : NULL
  ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
  .. .. ..@ rownames       : NULL
  .. .. ..@ nrows          : int 0
  .. .. ..@ listData       : Named list()
  .. .. ..@ elementType    : chr "ANY"
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  ..@ seqinfo        :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
  .. .. ..@ seqnames   : chr(0) 
  .. .. ..@ seqlengths : int(0) 
  .. .. ..@ is_circular: logi(0) 
  .. .. ..@ genome     : chr(0) 
  ..@ elementType    : chr "ANY"
  ..@ metadata       :List of 5
  .. ..$ file   : chr "/K/FLOCAD/Bioinfo/Workspace/ClementP/DavidL/ATACseq/ATACseq_cytometrie_dec2019_flowers_FMH_rep4_100K/ATACseqQC/"| __truncated__
  .. ..$ param  :Formal class 'ScanBamParam' [package "Rsamtools"] with 8 slots
  .. .. .. ..@ flag             : Named int [1:2] 4095 1275
  .. .. .. .. ..- attr(*, "names")= chr [1:2] "keep0" "keep1"
  .. .. .. ..@ simpleCigar      : logi FALSE
  .. .. .. ..@ reverseComplement: logi FALSE
  .. .. .. ..@ tag              : chr [1:11] "AS" "MD" "PG" "XG" ...
  .. .. .. ..@ tagFilter        : list()
  .. .. .. ..@ what             : chr [1:7] "qname" "flag" "mapq" "isize" ...
  .. .. .. ..@ which            :Formal class 'CompressedIRangesList' [package "IRanges"] with 5 slots
  .. .. .. .. .. ..@ unlistData     :Formal class 'IRanges' [package "IRanges"] with 6 slots
  .. .. .. .. .. .. .. ..@ start          : int 1
  .. .. .. .. .. .. .. ..@ width          : int 35819275
  .. .. .. .. .. .. .. ..@ NAMES          : chr "CMiso1.1chr01"
  .. .. .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. .. .. ..@ metadata       : list()
  .. .. .. .. .. ..@ elementType    : chr "IRanges"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. .. .. ..@ partitioning   :Formal class 'PartitioningByEnd' [package "IRanges"] with 5 slots
  .. .. .. .. .. .. .. ..@ end            : int 1
  .. .. .. .. .. .. .. ..@ NAMES          : chr "CMiso1.1chr01"
  .. .. .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ mapqFilter       : int NA
  .. ..$ asMates: logi FALSE
  .. ..$ which  :Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
  .. .. .. ..@ seqnames       :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. .. .. .. ..@ values         : Factor w/ 1 level "CMiso1.1chr01": 1
  .. .. .. .. .. ..@ lengths        : int 1
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ ranges         :Formal class 'IRanges' [package "IRanges"] with 6 slots
  .. .. .. .. .. ..@ start          : int 1
  .. .. .. .. .. ..@ width          : int 35819275
  .. .. .. .. .. ..@ NAMES          : chr "CMiso1.1chr01"
  .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ strand         :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. .. .. .. ..@ values         : Factor w/ 3 levels "+","-","*": 3
  .. .. .. .. .. ..@ lengths        : int 1
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ seqinfo        :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
  .. .. .. .. .. ..@ seqnames   : chr "CMiso1.1chr01"
  .. .. .. .. .. ..@ seqlengths : int 35819275
  .. .. .. .. .. ..@ is_circular: logi FALSE
  .. .. .. .. .. ..@ genome     : chr "CMiso1.1"
  .. .. .. ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
  .. .. .. .. .. ..@ rownames       : NULL
  .. .. .. .. .. ..@ nrows          : int 1
  .. .. .. .. .. ..@ listData       : Named list()
  .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. ..@ metadata       : list()
  .. ..$ mpos   : Named int [1:14648] 3206665 3206661 34755542 34755546 1146226 1146230 21652570 21652566 28881975 28881971 ...
  .. .. ..- attr(*, "names")= chr [1:14648] "M01263:122:000000000-CWJNL:1:1101:10024:18646 3206661" "M01263:122:000000000-CWJNL:1:1101:10024:18646 3206665" "M01263:122:000000000-CWJNL:1:1101:10027:3624 34755546" "M01263:122:000000000-CWJNL:1:1101:10027:3624 34755542" ...
>

So to resume, I have these object when I select just one chr but If I run for the whole genome my gal structur is :

> str(gal)
Formal class 'GAlignmentsList' [package "GenomicAlignments"] with 5 slots
  ..@ unlistData     :Formal class 'GAlignments' [package "GenomicAlignments"] with 9 slots
  .. .. ..@ seqnames       :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. .. .. ..@ values         : Factor w/ 0 levels: 
  .. .. .. .. ..@ lengths        : int(0) 
  .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. ..@ metadata       : list()
  .. .. ..@ start          : int(0) 
  .. .. ..@ cigar          : chr(0) 
  .. .. ..@ strand         :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. .. .. ..@ values         : Factor w/ 3 levels "+","-","*": 
  .. .. .. .. ..@ lengths        : int(0) 
  .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. ..@ metadata       : list()
  .. .. ..@ NAMES          : NULL
  .. .. ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
  .. .. .. .. ..@ rownames       : NULL
  .. .. .. .. ..@ nrows          : int 0
  .. .. .. .. ..@ listData       : Named list()
  .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. ..@ metadata       : list()
  .. .. ..@ seqinfo        :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
  .. .. .. .. ..@ seqnames   : chr(0) 
  .. .. .. .. ..@ seqlengths : int(0) 
  .. .. .. .. ..@ is_circular: logi(0) 
  .. .. .. .. ..@ genome     : chr(0) 
  .. .. ..@ elementType    : chr "ANY"
  .. .. ..@ metadata       : list()
  ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
  .. .. ..@ rownames       : NULL
  .. .. ..@ nrows          : int 0
  .. .. ..@ listData       : Named list()
  .. .. ..@ elementType    : chr "ANY"
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  ..@ elementType    : chr "GAlignments"
  ..@ metadata       :List of 4
  .. ..$ file   : chr "/K/FLOCAD/Bioinfo/Workspace/ClementP/DavidL/ATACseq/ATACseq_cytometrie_dec2019_flowers_FMH_rep4_100K/Analysis_s"| __truncated__
  .. ..$ param  :Formal class 'ScanBamParam' [package "Rsamtools"] with 8 slots
  .. .. .. ..@ flag             : Named int [1:2] 4095 1275
  .. .. .. .. ..- attr(*, "names")= chr [1:2] "keep0" "keep1"
  .. .. .. ..@ simpleCigar      : logi FALSE
  .. .. .. ..@ reverseComplement: logi FALSE
  .. .. .. ..@ tag              : chr [1:11] "AS" "MD" "PG" "XG" ...
  .. .. .. ..@ tagFilter        : list()
  .. .. .. ..@ what             : chr [1:7] "qname" "flag" "mapq" "isize" ...
  .. .. .. ..@ which            :Formal class 'CompressedIRangesList' [package "IRanges"] with 5 slots
  .. .. .. .. .. ..@ unlistData     :Formal class 'IRanges' [package "IRanges"] with 6 slots
  .. .. .. .. .. .. .. ..@ start          : int [1:13] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. .. .. .. .. ..@ width          : int [1:13] 3219671 35819275 24717502 29901275 36545425 29172381 36200319 26767028 33813569 23958596 ...
  .. .. .. .. .. .. .. ..@ NAMES          : NULL
  .. .. .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. .. .. ..@ metadata       : list()
  .. .. .. .. .. ..@ elementType    : chr "IRanges"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. .. .. ..@ partitioning   :Formal class 'PartitioningByEnd' [package "IRanges"] with 5 slots
  .. .. .. .. .. .. .. ..@ end            : int [1:13] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. .. .. ..@ NAMES          : chr [1:13] "CMiso1.1chr00" "CMiso1.1chr01" "CMiso1.1chr02" "CMiso1.1chr03" ...
  .. .. .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ mapqFilter       : int NA
  .. ..$ asMates: logi TRUE
  .. ..$ which  :Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
  .. .. .. ..@ seqnames       :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. .. .. .. ..@ values         : Factor w/ 13 levels "CMiso1.1chr00",..: 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. ..@ lengths        : int [1:13] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ ranges         :Formal class 'IRanges' [package "IRanges"] with 6 slots
  .. .. .. .. .. ..@ start          : int [1:13] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. .. .. ..@ width          : int [1:13] 3219671 35819275 24717502 29901275 36545425 29172381 36200319 26767028 33813569 23958596 ...
  .. .. .. .. .. ..@ NAMES          : NULL
  .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ strand         :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. .. .. .. ..@ values         : Factor w/ 3 levels "+","-","*": 3
  .. .. .. .. .. ..@ lengths        : int 13
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ seqinfo        :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
  .. .. .. .. .. ..@ seqnames   : chr [1:13] "CMiso1.1chr00" "CMiso1.1chr01" "CMiso1.1chr02" "CMiso1.1chr03" ...
  .. .. .. .. .. ..@ seqlengths : int [1:13] NA NA NA NA NA NA NA NA NA NA ...
  .. .. .. .. .. ..@ is_circular: logi [1:13] NA NA NA NA NA NA ...
  .. .. .. .. .. ..@ genome     : chr [1:13] NA NA NA NA ...
  .. .. .. ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
  .. .. .. .. .. ..@ rownames       : NULL
  .. .. .. .. .. ..@ nrows          : int 13
  .. .. .. .. .. ..@ listData       : Named list()
  .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. ..@ metadata       : list()
  ..@ partitioning   :Formal class 'PartitioningByEnd' [package "IRanges"] with 5 slots
  .. .. ..@ end            : int(0) 
  .. .. ..@ NAMES          : NULL
  .. .. ..@ elementType    : chr "ANY"
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
>

It seems that is read good the bam file and take in count all the chromosomes. Then When I try to shiffted theses sequences the gal1 structure is :

> str(gal1)
Formal class 'GAlignments' [package "GenomicAlignments"] with 9 slots
  ..@ seqnames       :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. ..@ values         : Factor w/ 0 levels: 
  .. .. ..@ lengths        : int(0) 
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  ..@ start          : int(0) 
  ..@ cigar          : chr(0) 
  ..@ strand         :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. ..@ values         : Factor w/ 3 levels "+","-","*": 
  .. .. ..@ lengths        : int(0) 
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  ..@ NAMES          : NULL
  ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
  .. .. ..@ rownames       : NULL
  .. .. ..@ nrows          : int 0
  .. .. ..@ listData       : Named list()
  .. .. ..@ elementType    : chr "ANY"
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  ..@ seqinfo        :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
  .. .. ..@ seqnames   : chr(0) 
  .. .. ..@ seqlengths : int(0) 
  .. .. ..@ is_circular: logi(0) 
  .. .. ..@ genome     : chr(0) 
  ..@ elementType    : chr "ANY"
  ..@ metadata       :List of 5
  .. ..$ file   : chr "/K/FLOCAD/Bioinfo/Workspace/ClementP/DavidL/ATACseq/ATACseq_cytometrie_dec2019_flowers_FMH_rep4_100K/ATACseqQC/"| __truncated__
  .. ..$ param  :Formal class 'ScanBamParam' [package "Rsamtools"] with 8 slots
  .. .. .. ..@ flag             : Named int [1:2] 4095 1275
  .. .. .. .. ..- attr(*, "names")= chr [1:2] "keep0" "keep1"
  .. .. .. ..@ simpleCigar      : logi FALSE
  .. .. .. ..@ reverseComplement: logi FALSE
  .. .. .. ..@ tag              : chr [1:11] "AS" "MD" "PG" "XG" ...
  .. .. .. ..@ tagFilter        : list()
  .. .. .. ..@ what             : chr [1:7] "qname" "flag" "mapq" "isize" ...
  .. .. .. ..@ which            :Formal class 'CompressedIRangesList' [package "IRanges"] with 5 slots
  .. .. .. .. .. ..@ unlistData     :Formal class 'IRanges' [package "IRanges"] with 6 slots
  .. .. .. .. .. .. .. ..@ start          : int [1:13] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. .. .. .. .. ..@ width          : int [1:13] 3219671 35819275 24717502 29901275 36545425 29172381 36200319 26767028 33813569 23958596 ...
  .. .. .. .. .. .. .. ..@ NAMES          : NULL
  .. .. .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. .. .. ..@ metadata       : list()
  .. .. .. .. .. ..@ elementType    : chr "IRanges"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. .. .. ..@ partitioning   :Formal class 'PartitioningByEnd' [package "IRanges"] with 5 slots
  .. .. .. .. .. .. .. ..@ end            : int [1:13] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. .. .. ..@ NAMES          : chr [1:13] "CMiso1.1chr00" "CMiso1.1chr01" "CMiso1.1chr02" "CMiso1.1chr03" ...
  .. .. .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ mapqFilter       : int NA
  .. ..$ asMates: logi FALSE
  .. ..$ which  :Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
  .. .. .. ..@ seqnames       :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. .. .. .. ..@ values         : Factor w/ 13 levels "CMiso1.1chr00",..: 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. ..@ lengths        : int [1:13] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ ranges         :Formal class 'IRanges' [package "IRanges"] with 6 slots
  .. .. .. .. .. ..@ start          : int [1:13] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. .. .. ..@ width          : int [1:13] 3219671 35819275 24717502 29901275 36545425 29172381 36200319 26767028 33813569 23958596 ...
  .. .. .. .. .. ..@ NAMES          : NULL
  .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ strand         :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. .. .. .. ..@ values         : Factor w/ 3 levels "+","-","*": 3
  .. .. .. .. .. ..@ lengths        : int 13
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ seqinfo        :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
  .. .. .. .. .. ..@ seqnames   : chr [1:13] "CMiso1.1chr00" "CMiso1.1chr01" "CMiso1.1chr02" "CMiso1.1chr03" ...
  .. .. .. .. .. ..@ seqlengths : int [1:13] NA NA NA NA NA NA NA NA NA NA ...
  .. .. .. .. .. ..@ is_circular: logi [1:13] NA NA NA NA NA NA ...
  .. .. .. .. .. ..@ genome     : chr [1:13] NA NA NA NA ...
  .. .. .. ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
  .. .. .. .. .. ..@ rownames       : NULL
  .. .. .. .. .. ..@ nrows          : int 13
  .. .. .. .. .. ..@ listData       : Named list()
  .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. ..@ metadata       : list()
  .. ..$ mpos   : Named int [1:1647080] 1975990 1975986 242840 242844 1446805 1446709 1980200 1980204 2253044 2253040 ...
  .. .. ..- attr(*, "names")= chr [1:1647080] "M01263:122:000000000-CWJNL:1:1101:10001:5248 1975986" "M01263:122:000000000-CWJNL:1:1101:10001:5248 1975990" "M01263:122:000000000-CWJNL:1:1101:10003:13415 242844" "M01263:122:000000000-CWJNL:1:1101:10003:13415 242840" ...

Unlike when I shifft the splitted bam file I have an error (the same error that I explain in my previous post) :

> gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile.outPath)
[bam_translate] PG tag "MarkDuplicates" on read "M01263:122:000000000-CWJNL:1:2106:23406:17450" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "M01263:122:000000000-CWJNL:1:2104:12761:8947" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "M01263:122:000000000-CWJNL:1:1114:16639:1911" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "M01263:122:000000000-CWJNL:1:1101:11065:6255" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.

If I check my bam (concerning the whole bam file) It is truncatated.

When I check the bam file I can see a big differences in size between the non and the shiffted bam file. If I check I can see that all the chromosome has reads. Does the function produce a bam file with only the reads that is take in count for the downstream analysis, for exemple the one that are not PCR duplicate as "PG" reads tagged ?

Do you have any idea what I can do to help you to resolved this issue?

Thanks in advances

jianhong commented 4 years ago

Thank you for letting me know. I am happy to know you fixed the problem. I also updated the development version of the package. It may provide more useful information in warning or error messages. Hope it helps.

Let me know if you still have questions.

Jianhong.

On Mon, Jun 8, 2020 at 11:05 AM pichot notifications@github.com wrote:

Hi,

I have found the issue. In fat I'm working on cluster build in windows base server, so it construct path with "()" as you can see in this part of the log bismark mapper file :

Input files to be analysed (in current folder '/ips2/IPS2 (7864)/DDEVE (8095)/FLOCAD (8099)/Bioinfo/Workspace/ClementP/DavidL/bisulfite-seq/pipe_building/methylstar/Zebularine_treatment_out/bismark-mappers'): /K/FLOCAD/Bioinfo/Workspace/ClementP/DavidL/bisulfite-seq/pipe_building/methylstar/Zebularine_treatment_out/trimmomatic-files/Mock_FDLM202341331-1a_H3V2NDSXY_L4_paired_1.fq.gz /K/FLOCAD/Bioinfo/Workspace/ClementP/DavidL/bisulfite-seq/pipe_building/methylstar/Zebularine_treatment_out/trimmomatic-files/Mock_FDLM202341331-1a_H3V2NDSXY_L4_paired_2.fq.gz Library is assumed to be strand-specific (directional), alignments to strands complementary to the original top or bottom strands will be ignored (i.e. not performed!) Output will be written into the directory: /ips2/IPS2 (7864)/DDEVE (8095)/FLOCAD (8099)/Bioinfo/Workspace/ClementP/DavidL/bisulfite-seq/pipe_building/methylstar/Zebularine_treatment_out/bismark-mappers/ Summary of all aligner options: -q -N 0 -L 20 --score-min L,0,-0.2 -p 2 --reorder --ignore-quals --no-mixed --no-discordant --dovetail --maxins 500 Running Bismark Parallel version. Number of parallel instances to be spawned: 2

this path more precisly: Output will be written into the directory: /ips2/IPS2 (7864)/DDEVE (8095)/FLOCAD (8099)/Bioinfo/Workspace/ClementP/DavidL/bisulfite-seq/pipe_building/methylstar/Zebularine_treatment_out/bismark-mappers/

I think, in your pipeline code you take the path to "bismark-mappers" with an ls or something like that right? Do you know how I can custom this part of your code to bypass this issue ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jianhong/ATACseqQC/issues/21#issuecomment-640688085, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLBEA4XXSILAZ7TB43ANQLRVT43JANCNFSM4KNI32JA .

-- Yours sincerely, Jianhong Ou