wzthu / esATAC

Bioconductor package esATAC: an Easy-to-use Systematic pipeline for ATAC-seq data analysis
https://www.bioconductor.org/packages/release/bioc/html/esATAC.html
GNU General Public License v3.0
23 stars 11 forks source link

Error in RmotifScan, character not in lookup table #150

Closed rcavalcante closed 5 years ago

rcavalcante commented 5 years ago

Hello,

Thanks for developing esATAC, I'm having an issue at the motif scanning part in the atacPipe2() function. Below is my call:

D7_MSC_IMMOB_v_D7_MSC_MOB = atacPipe2(
    case = list(
        fastqInput1 = file.path(run_dir, 'Sample_130301/130301_CGTACTAG-ACTGCATA_S5_R1_001.fastq.gz'),
        fastqInput2 = file.path(run_dir, 'Sample_130301/130301_CGTACTAG-ACTGCATA_S5_R2_001.fastq.gz')),
    control = list(
        fastqInput1 = file.path(run_dir, 'Sample_130299/130299_AGGCAGAA-GTAAGGAG_S3_R1_001.fastq.gz'),
        fastqInput2 = file.path(run_dir, 'Sample_130299/130299_AGGCAGAA-GTAAGGAG_S3_R2_001.fastq.gz')),
    refdir = '/nfs/med-bfx-common/iGenomes/Mus_musculus/UCSC/mm10_esatac',
    motifs = getMotifInfo(motif.file = file.path(project_dir, 'Huber_2860_ATAC01_AH-205', 'MSC_jaspar.txt')),
    tmpdir = 'tmp',
    threads = 2,
    interleave = FALSE,
    genome = 'mm10')

When the RMotifScan part of the pipeline begins I get the following:

...
>>>>>>========================================
2019-05-23 11:34:30
start processing(paired end data): RMotifScan
processing file:
peak file:/nfs/turbo/epicore-active/Huber_2860_ATAC01_AH-205/esATAC/D7_MSC_IMMOB_v_D7_MSC_MOB/tmp/130301_CGTACTAG-ACTGCATA_S5_R1_001.PeakCallingFseq.bed
Output destination:/nfs/turbo/epicore-active/Huber_2860_ATAC01_AH-205/esATAC/D7_MSC_IMMOB_v_D7_MSC_MOB/tmp/130301_CGTACTAG-ACTGCATA_S5_R1_001.PeakCallingFseq_Case_data_MotifScanOutput
Error in .Call2("new_CHARACTER_from_XStringSet", x, xs_dec_lkup(x), PACKAGE = "Biostrings") :
  key 48 (char '0') not in lookup table

I feel like this error comes from a non-IUPAC letter as it relates to XStringSets. I looked through my JASPAR formatted motif file and didn't see anything amiss. So I tried using the file that you've included with the package:

motifs = getMotifInfo(motif.file = system.file("extdata", "CustomizedMotif.txt", package="esATAC"))

And I actually get a similar error:

>>>>>>========================================
2019-05-23 12:23:45
start processing(paired end data): RMotifScan
processing file:
peak file:/nfs/turbo/epicore-active/Huber_2860_ATAC01_AH-205/esATAC/D7_MSC_IMMOB_v_D7_MSC_MOB/tmp/130301_CGTACTAG-ACTGCATA_S5_R1_001.PeakCallingFseq.bed
Output destination:/nfs/turbo/epicore-active/Huber_2860_ATAC01_AH-205/esATAC/D7_MSC_IMMOB_v_D7_MSC_MOB/tmp/130301_CGTACTAG-ACTGCATA_S5_R1_001.PeakCallingFseq_Case_data_MotifScanOutput
Error in .Call2("new_CHARACTER_from_XStringSet", x, xs_dec_lkup(x), PACKAGE = "Biostrings") : 
  key 48 (char '0') not in lookup table

I made sure that my file and your file are very similar:

> readLines(system.file("extdata", "CustomizedMotif.txt", package="esATAC"))
 [1] ">MA0139.1\tCTCF"                                                                                                                           
 [2] "A  [    87    167    281     56      8    744     40    107    851      5    333     54     12     56    104    372     82    117    402 ]"
 [3] "C  [   291    145     49    800    903     13    528    433     11      0      3     12      0      8    733     13    482    322    181 ]"
 [4] "G  [    76    414    449     21      0     65    334     48     32    903    566    504    890    775      5    507    307     73    266 ]"
 [5] "T  [   459    187    134     36      2     91     11    324     18      3      9    341      8     71     67     17     37    396     59 ]"
 [6] ""                                                                                                                                          
 [7] ">MA0835.1\tBATF3"                                                                                                                          
 [8] "A  [    40     23    396      1      0    429      1      8      0      6    420      5     18    357 ]"                                   
 [9] "C  [    98     11     11      0      0      0    497      0      0    513      0     39    520     24 ]"                                   
[10] "G  [    30    777     20      0    776      0      0    746      2      2      0     27      7    147 ]"                                   
[11] "T  [   625     17      1    732      0      0      0      0    692      3      2    684     16     49 ]"                                   
> head(readLines(file.path(project_dir, 'Huber_2860_ATAC01_AH-205', 'MSC_jaspar.txt')), n = 11)
 [1] ">PH0024.1\tDlx5"                                                                                                      
 [2] "A  [    11     17     28     18      1     97     98      1      1     61     19      5      7     26     15     19 ]"
 [3] "C  [    29     29     17     14     38      2      0      1      0      0     25     49     17     34     22     27 ]"
 [4] "G  [    37     37     49     59      0      0      1      0      2     38     37     17     37     23     22     35 ]"
 [5] "T  [    22     17      5      9     61      1      1     98     97      1     18     28     39     17     41     19 ]"
 [6] ""                                                                                                                     
 [7] ">MA0482.1\tGata4"                                                                                                     
 [8] "A  [    10      0      0      0   2707      0      0    547      0    601    386 ]"                                   
 [9] "C  [  1039   2165     89      0     39      0   2746      0   1240   1004    929 ]"                                   
[10] "G  [   374    306      0      0      0      0      0      0    940    157    682 ]"                                   
[11] "T  [  1323    275   2657   2746      0   2746      0   2199    566    984    749 ]"   

Have you encountered this problem before? I can't seem to find anything via Google.

Thanks in advance, Raymond

And here is my session info:

> devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 3.5.0 (2018-04-23)
 os       Red Hat Enterprise Linux Server 7.6 (Maipo)
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Detroit
 date     2019-05-23

─ Packages ───────────────────────────────────────────────────────────────────
 package                            * version   date       lib source
 annotate                             1.58.0    2018-06-27 [2] Bioconductor
 AnnotationDbi                      * 1.44.0    2018-10-30 [1] Bioconductor
 assertthat                           0.2.0     2017-04-11 [2] CRAN (R 3.5.0)
 backports                            1.1.2     2017-12-13 [2] CRAN (R 3.5.0)
 Biobase                            * 2.40.0    2018-06-27 [2] Bioconductor
 BiocGenerics                       * 0.28.0    2018-10-30 [1] Bioconductor
 BiocInstaller                        1.30.0    2018-06-27 [2] Bioconductor
 BiocParallel                       * 1.14.2    2018-07-08 [2] Bioconductor
 biomaRt                              2.36.1    2018-06-27 [2] Bioconductor
 Biostrings                         * 2.48.0    2018-06-27 [2] Bioconductor
 bit                                  1.1-14    2018-05-29 [2] CRAN (R 3.5.0)
 bit64                                0.9-7     2017-05-08 [2] CRAN (R 3.5.0)
 bitops                               1.0-6     2013-08-17 [2] CRAN (R 3.5.0)
 blob                                 1.1.1     2018-03-25 [2] CRAN (R 3.5.0)
 boot                                 1.3-20    2017-08-06 [2] CRAN (R 3.5.0)
 brew                                 1.0-6     2011-04-13 [2] CRAN (R 3.5.0)
 BSgenome                           * 1.48.0    2018-06-27 [2] Bioconductor
 BSgenome.Mmusculus.UCSC.mm10       * 1.4.0     2019-05-22 [1] Bioconductor
 callr                                3.1.1     2018-12-21 [1] CRAN (R 3.5.0)
 caTools                              1.17.1.1  2018-07-20 [2] CRAN (R 3.5.0)
 ChIPseeker                           1.16.1    2018-07-21 [2] Bioconductor
 cli                                  1.0.1     2018-09-25 [1] CRAN (R 3.5.0)
 clusterProfiler                      3.8.1     2018-06-27 [2] Bioconductor
 CNEr                                 1.16.1    2018-06-27 [2] Bioconductor
 colorspace                           1.3-2     2016-12-14 [2] CRAN (R 3.5.0)
 corrplot                             0.84      2017-10-16 [2] CRAN (R 3.5.0)
 cowplot                              0.9.3     2018-07-15 [2] CRAN (R 3.5.0)
 crayon                               1.3.4     2017-09-16 [2] CRAN (R 3.5.0)
 data.table                           1.11.4    2018-05-27 [2] CRAN (R 3.5.0)
 DBI                                  1.0.0     2018-05-02 [2] CRAN (R 3.5.0)
 DelayedArray                       * 0.8.0     2018-10-30 [1] Bioconductor
 desc                                 1.2.0     2018-05-01 [1] CRAN (R 3.5.0)
 devtools                             2.0.1     2018-10-26 [1] CRAN (R 3.5.0)
DiagrammeR                           1.0.0     2018-03-01 [2] CRAN (R 3.5.0)
 digest                               0.6.16    2018-08-22 [2] CRAN (R 3.5.0)
 DirichletMultinomial                 1.22.0    2018-06-27 [2] Bioconductor
 DO.db                                2.9       2018-06-27 [2] Bioconductor
 DOSE                                 3.6.1     2018-06-20 [2] Bioconductor
 downloader                           0.4       2015-07-09 [2] CRAN (R 3.5.0)
 dplyr                                0.8.0.1   2019-02-15 [1] CRAN (R 3.5.0)
 enrichplot                           1.0.2     2018-06-27 [2] Bioconductor
 esATAC                             * 1.2.1     2018-08-07 [1] Bioconductor
 evaluate                             0.11      2018-07-17 [2] CRAN (R 3.5.0)
 fastmatch                            1.1-0     2017-01-28 [2] CRAN (R 3.5.0)
 fgsea                                1.6.0     2018-06-27 [2] Bioconductor
 formatR                              1.5       2017-04-25 [2] CRAN (R 3.5.0)
 fs                                   1.2.6     2018-08-23 [1] CRAN (R 3.5.0)
 futile.logger                        1.4.3     2016-07-10 [2] CRAN (R 3.5.0)
 futile.options                       1.0.1     2018-04-20 [2] CRAN (R 3.5.0)
 gdata                                2.18.0    2017-06-06 [2] CRAN (R 3.5.0)
 GenomeInfoDb                       * 1.16.0    2018-06-27 [2] Bioconductor
 GenomeInfoDbData                     1.1.0     2018-06-27 [2] Bioconductor
 GenomicAlignments                  * 1.16.0    2018-06-27 [2] Bioconductor
 GenomicFeatures                    * 1.32.3    2018-10-05 [1] Bioconductor
 GenomicRanges                      * 1.32.7    2018-09-20 [1] Bioconductor
 ggforce                              0.1.3     2018-07-07 [2] CRAN (R 3.5.0)
 ggplot2                              3.1.0     2018-10-25 [1] CRAN (R 3.5.0)
 ggraph                               1.0.2     2018-07-07 [2] CRAN (R 3.5.0)
 ggrepel                              0.8.0     2018-05-09 [2] CRAN (R 3.5.0)
 ggridges                             0.5.0     2018-04-05 [2] CRAN (R 3.5.0)
 glue                                 1.3.0     2018-07-17 [2] CRAN (R 3.5.0)
 GO.db                                3.6.0     2018-06-27 [2] Bioconductor
 GOSemSim                             2.6.2     2018-08-08 [2] Bioconductor
 gplots                               3.0.1     2016-03-30 [2] CRAN (R 3.5.0)
 gridBase                             0.4-7     2014-02-24 [2] CRAN (R 3.5.0)
 gridExtra                            2.3       2017-09-09 [2] CRAN (R 3.5.0)
 gtable                               0.2.0     2016-02-26 [2] CRAN (R 3.5.0)
 gtools                               3.8.1     2018-06-26 [2] CRAN (R 3.5.0)
 hms                                  0.4.2     2018-03-10 [2] CRAN (R 3.5.0)
 htmltools                            0.3.6     2017-04-28 [2] CRAN (R 3.5.0)
 htmlwidgets                          1.2       2018-04-19 [2] CRAN (R 3.5.0)
 httr                                 1.4.0     2018-12-11 [1] CRAN (R 3.5.0)
 hwriter                              1.3.2     2014-09-10 [2] CRAN (R 3.5.0)
 igraph                               1.2.2     2018-07-27 [2] CRAN (R 3.5.0)
 influenceR                           0.1.0     2015-09-03 [2] CRAN (R 3.5.0)
 IRanges                            * 2.16.0    2018-10-30 [1] Bioconductor
 JASPAR2016                           1.8.0     2018-06-27 [2] Bioconductor
 jsonlite                           * 1.5       2017-06-01 [2] CRAN (R 3.5.0)
 KEGGREST                             1.20.1    2018-06-27 [2] Bioconductor
 KernSmooth                           2.23-15   2015-06-29 [2] CRAN (R 3.5.0)
 knitr                                1.20      2018-02-20 [2] CRAN (R 3.5.0)
 lambda.r                             1.2.3     2018-05-17 [2] CRAN (R 3.5.0)
 lattice                              0.20-35   2017-03-25 [2] CRAN (R 3.5.0)
 latticeExtra                         0.6-28    2016-02-09 [2] CRAN (R 3.5.0)
 lazyeval                             0.2.1     2017-10-29 [2] CRAN (R 3.5.0)
 magrittr                           * 1.5       2014-11-22 [2] CRAN (R 3.5.0)
 MASS                                 7.3-50    2018-04-30 [2] CRAN (R 3.5.0)
 Matrix                               1.2-14    2018-04-13 [2] CRAN (R 3.5.0)
 matrixStats                        * 0.54.0    2018-07-23 [2] CRAN (R 3.5.0)
 memoise                              1.1.0     2017-04-21 [2] CRAN (R 3.5.0)
 motifmatchr                          1.2.0     2018-06-27 [2] Bioconductor
 munsell                              0.5.0     2018-06-12 [2] CRAN (R 3.5.0)
 org.Mm.eg.db                       * 3.7.0     2019-03-07 [1] Bioconductor
 pillar                               1.3.1     2018-12-15 [1] CRAN (R 3.5.0)
 pkgbuild                             1.0.2     2018-10-16 [1] CRAN (R 3.5.0)
 pkgconfig                            2.0.2     2018-08-16 [2] CRAN (R 3.5.0)
 pkgload                              1.0.2     2018-10-29 [1] CRAN (R 3.5.0)
 plotrix                              3.7-2     2018-05-27 [2] CRAN (R 3.5.0)
 plyr                                 1.8.4     2016-06-08 [2] CRAN (R 3.5.0)
 png                                  0.1-7     2013-12-03 [2] CRAN (R 3.5.0)
 poweRlaw                             0.70.1    2017-08-29 [2] CRAN (R 3.5.0)
 prettyunits                          1.0.2     2015-07-13 [2] CRAN (R 3.5.0)
 processx                             3.2.1     2018-12-05 [1] CRAN (R 3.5.0)
 progress                             1.2.0     2018-06-14 [2] CRAN (R 3.5.0)
 ps                                   1.3.0     2018-12-21 [1] CRAN (R 3.5.0)
 purrr                                0.2.5     2018-05-29 [2] CRAN (R 3.5.0)
qvalue                               2.12.0    2018-06-27 [2] Bioconductor
 R.methodsS3                        * 1.7.1     2016-02-16 [2] CRAN (R 3.5.0)
 R.oo                               * 1.22.0    2018-04-22 [2] CRAN (R 3.5.0)
 R.utils                            * 2.6.0     2017-11-05 [2] CRAN (R 3.5.0)
 R6                                   2.4.0     2019-02-14 [1] CRAN (R 3.5.0)
 Rbowtie2                             1.2.0     2018-06-27 [2] Bioconductor
 RColorBrewer                         1.1-2     2014-12-07 [1] CRAN (R 3.5.0)
 Rcpp                                 1.0.0     2018-11-07 [1] CRAN (R 3.5.0)
 RCurl                                1.95-4.11 2018-07-15 [2] CRAN (R 3.5.0)
 readr                                1.1.1     2017-05-16 [2] CRAN (R 3.5.0)
 remotes                              2.0.2     2018-10-30 [1] CRAN (R 3.5.0)
 reshape2                             1.4.3     2017-12-11 [2] CRAN (R 3.5.0)
 rgexf                                0.15.3    2015-03-24 [2] CRAN (R 3.5.0)
 rJava                                0.9-10    2018-05-29 [2] CRAN (R 3.5.0)
 rlang                                0.3.1     2019-01-08 [1] CRAN (R 3.5.0)
 rmarkdown                            1.10      2018-06-11 [2] CRAN (R 3.5.0)
 Rook                                 1.1-1     2014-10-20 [2] CRAN (R 3.5.0)
 rprojroot                            1.3-2     2018-01-03 [2] CRAN (R 3.5.0)
 Rsamtools                          * 1.34.1    2019-01-31 [1] Bioconductor
 RSQLite                              2.1.1     2018-05-06 [2] CRAN (R 3.5.0)
 rstudioapi                           0.7       2017-09-07 [2] CRAN (R 3.5.0)
 rtracklayer                        * 1.42.2    2019-03-01 [1] Bioconductor
 rvcheck                              0.1.0     2018-05-23 [2] CRAN (R 3.5.0)
 S4Vectors                          * 0.20.1    2018-11-09 [1] Bioconductor
 scales                               1.0.0     2018-08-09 [2] CRAN (R 3.5.0)
 seqLogo                              1.46.0    2018-06-27 [2] Bioconductor
 sessioninfo                          1.1.1     2018-11-05 [1] CRAN (R 3.5.0)
 ShortRead                          * 1.38.0    2018-06-27 [2] Bioconductor
 stringi                              1.2.4     2018-07-20 [2] CRAN (R 3.5.0)
 stringr                            * 1.3.1     2018-05-10 [2] CRAN (R 3.5.0)
 SummarizedExperiment               * 1.10.1    2018-06-27 [2] Bioconductor
 TFBSTools                            1.18.0    2018-06-27 [2] Bioconductor
 TFMPvalue                            0.0.8     2018-05-16 [2] CRAN (R 3.5.0)
 tibble                               2.0.1     2019-01-12 [1] CRAN (R 3.5.0)
 tidyr                                0.8.1     2018-05-18 [2] CRAN (R 3.5.0)
 tidyselect                           0.2.5     2018-10-11 [1] CRAN (R 3.5.0)
 tweenr                               0.1.5     2016-10-10 [2] CRAN (R 3.5.0)
 TxDb.Hsapiens.UCSC.hg19.knownGene    3.2.2     2018-06-27 [2] Bioconductor
 TxDb.Mmusculus.UCSC.mm10.knownGene * 3.4.0     2019-05-22 [1] Bioconductor
 units                                0.6-0     2018-06-09 [2] CRAN (R 3.5.0)
 UpSetR                               1.3.3     2017-03-21 [2] CRAN (R 3.5.0)
 usethis                              1.4.0     2018-08-14 [1] CRAN (R 3.5.0)
 VennDiagram                          1.6.20    2018-03-28 [2] CRAN (R 3.5.0)
 VGAM                                 1.0-6     2018-08-18 [2] CRAN (R 3.5.0)
 viridis                              0.5.1     2018-03-29 [2] CRAN (R 3.5.0)
 viridisLite                          0.3.0     2018-02-01 [2] CRAN (R 3.5.0)
 visNetwork                           2.0.4     2018-06-14 [2] CRAN (R 3.5.0)
 withr                                2.1.2     2018-03-15 [2] CRAN (R 3.5.0)
 XML                                  3.98-1.16 2018-08-19 [2] CRAN (R 3.5.0)
 xtable                               1.8-2     2016-02-05 [2] CRAN (R 3.5.0)
 XVector                            * 0.20.0    2018-06-27 [2] Bioconductor
 zlibbioc                             1.26.0    2018-06-27 [2] Bioconductor

[1] /home/rcavalca/R/x86_64-redhat-linux-gnu-library/3.5
[2] /app/med/centos7/R/3.5.0/lib64/R/library
rcavalcante commented 5 years ago

I should also add that when I don't include a motif parameter at all, so I'm guessing this looks through the entire JASPAR database, I get the same error.

rcavalcante commented 5 years ago

And one more update, the problem seems to occur for the test data provided in the vignette as well:

conclusion2 <- 
    atacPipe2(
        case=list(fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"),
                  fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz")), 
        control=list(fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2"),
                     fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")),
        refdir = '/ccmb/BioinfCore/Common/iGenomes/Homo_sapiens/UCSC/hg38_esatac',
        genome = "hg38",
        motifs = getMotifInfo(motif.file = system.file("extdata", "CustomizedMotif.txt", package="esATAC")))

I used an existing bowtie2 index on our servers to speed things up a bit.

Honchkrow commented 5 years ago

Hi, I re-ran the pipeline using test data, but cannot reproduce your problem. I think the question is from the function motifmatchr::matchMotifs because the error was from XStringSet-class and I did not use it in RMotifScan. Therefore, I guess that the error was caused by motifmatchr::matchMotifs which I used in RMotifScan (actually I just wrap this function in our pipeline). I recommend you try matchMotifs and test whether this problem occurs again. What's more, I ran the pipeline in the latest R and esATAC. Maybe you can update R and esATAC to avoid this error.

Wei

rcavalcante commented 5 years ago

Hi Wei,

Thanks for the response. I took some time today to build a docker image using Bioc 3.9 and the test data seems to be working now. I have confidence that the actual data I'm interested in will run fine.

Thanks, Raymond