ivanek / Gviz

This is the Gviz development repository. Gviz plots data and annotation information along genomic coordinates.
https://bioconductor.org/packages/Gviz/
Artistic License 2.0

alignmentsTrack: SNP histogram does not match pileup #61

Open nick-youngblut opened 2 years ago

nick-youngblut commented 2 years ago

When I view a particular contig from 1-1000 bp, the alignmentsTrack histogram shows many SNPs at the start of the contig. That high number does not match the far fewer SNPs visible in the pileup part of the track. When I zoom in to that region of the contig (1-300 bp), most of the SNPs in the histogram disappear, and the histogram then matches the pileup. See the attached pics.

[Attached images: b1-b1k, b1-b300]
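
One way to check which view is right, independently of the plot, is to tally mismatches per reference position directly and compare the tally for the wide window with the tally for the zoomed window. The sketch below is a toy illustration in plain Python and is not how Gviz computes its histogram: the function name `mismatch_counts`, the reference string, and the reads are all made up for illustration, and reads are assumed to align without gaps (no indels or soft clipping).

```python
from collections import Counter


def mismatch_counts(reference, reads, start, end):
    """Count mismatching read bases per 1-based reference position in [start, end].

    `reads` is an iterable of (pos, seq) pairs, where `pos` is the 1-based
    leftmost reference position of an ungapped alignment (an assumption of
    this sketch; real alignments would need CIGAR handling).
    """
    counts = Counter()
    for pos, seq in reads:
        for offset, base in enumerate(seq):
            ref_pos = pos + offset
            if ref_pos > len(reference):
                break  # read runs past the end of the reference
            if ref_pos < start or ref_pos > end:
                continue  # outside the requested window
            if base != reference[ref_pos - 1]:
                counts[ref_pos] += 1
    return dict(counts)


# Hypothetical toy data: a 20 bp reference and four ungapped reads.
reference = "ACGTACGTACGTACGTACGT"
reads = [
    (1, "ACGTTCGT"),     # mismatch at position 5
    (3, "GTACGAAC"),     # mismatch at position 8
    (5, "TCGTACGT"),     # mismatch at position 5
    (11, "GTACGTACGT"),  # perfect match
]

wide = mismatch_counts(reference, reads, 1, 20)
zoom = mismatch_counts(reference, reads, 1, 6)

# The zoomed tally should equal the wide tally restricted to the same window.
wide_restricted = {p: c for p, c in wide.items() if 1 <= p <= 6}
print(wide, zoom, wide_restricted == zoom)
```

If the per-position counts are computed this way, changing the plotted window should only crop the histogram, never change the counts at positions that remain in view. That is the behaviour the pileup suggests and the wide-window histogram seems to violate.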

sessionInfo

R version 4.1.2 (2021-11-01)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS

Matrix products: default
BLAS/LAPACK: /tmp/global2/nyoungblut/code/DeepMAsED/conda_envs/dm-genome/lib/libopenblasp-r0.3.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] Gviz_1.38.0          GenomicRanges_1.46.0 GenomeInfoDb_1.30.0 
[4] IRanges_2.28.0       S4Vectors_0.32.0     BiocGenerics_0.40.0 

loaded via a namespace (and not attached):
  [1] ProtGenerics_1.26.0         bitops_1.0-7               
  [3] matrixStats_0.61.0          bit64_4.0.5                
  [5] filelock_1.0.2              RColorBrewer_1.1-2         
  [7] progress_1.2.2              httr_1.4.2                 
  [9] repr_1.1.4                  backports_1.4.1            
 [11] tools_4.1.2                 utf8_1.2.2                 
 [13] R6_2.5.1                    rpart_4.1.16               
 [15] lazyeval_0.2.2              Hmisc_4.6-0                
 [17] DBI_1.1.2                   colorspace_2.0-2           
 [19] nnet_7.3-17                 gridExtra_2.3              
 [21] tidyselect_1.1.1            prettyunits_1.1.1          
 [23] bit_4.0.4                   curl_4.3.2                 
 [25] compiler_4.1.2              Biobase_2.54.0             
 [27] htmlTable_2.4.0             xml2_1.3.3                 
 [29] DelayedArray_0.20.0         rtracklayer_1.54.0         
 [31] checkmate_2.0.0             scales_1.1.1               
 [33] rappdirs_0.3.3              pbdZMQ_0.3-7               
 [35] stringr_1.4.0               digest_0.6.29              
 [37] Rsamtools_2.10.0            foreign_0.8-82             
 [39] XVector_0.34.0              dichromat_2.0-0            
 [41] base64enc_0.1-3             jpeg_0.1-9                 
 [43] pkgconfig_2.0.3             htmltools_0.5.2            
 [45] MatrixGenerics_1.6.0        ensembldb_2.18.1           
 [47] dbplyr_2.1.1                fastmap_1.1.0              
 [49] BSgenome_1.62.0             htmlwidgets_1.5.4          
 [51] rlang_0.4.12                rstudioapi_0.13            
 [53] RSQLite_2.2.8               BiocIO_1.4.0               
 [55] generics_0.1.2              jsonlite_1.7.3             
 [57] BiocParallel_1.28.0         dplyr_1.0.7                
 [59] VariantAnnotation_1.40.0    RCurl_1.98-1.5             
 [61] magrittr_2.0.2              GenomeInfoDbData_1.2.7     
 [63] Formula_1.2-4               Matrix_1.4-0               
 [65] Rcpp_1.0.8                  IRkernel_1.3               
 [67] munsell_0.5.0               fansi_1.0.2                
 [69] lifecycle_1.0.1             stringi_1.7.6              
 [71] yaml_2.2.2                  SummarizedExperiment_1.24.0
 [73] zlibbioc_1.40.0             BiocFileCache_2.2.0        
 [75] blob_1.2.2                  parallel_4.1.2             
 [77] crayon_1.4.2                lattice_0.20-45            
 [79] IRdisplay_1.1               Biostrings_2.62.0          
 [81] splines_4.1.2               GenomicFeatures_1.46.1     
 [83] hms_1.1.1                   KEGGREST_1.34.0            
 [85] knitr_1.37                  pillar_1.7.0               
 [87] uuid_1.0-3                  rjson_0.2.21               
 [89] biomaRt_2.50.0              XML_3.99-0.8               
 [91] glue_1.6.1                  evaluate_0.14              
 [93] biovizBase_1.42.0           latticeExtra_0.6-29        
 [95] data.table_1.14.2           png_0.1-7                  
 [97] vctrs_0.3.8                 gtable_0.3.0               
 [99] purrr_0.3.4                 assertthat_0.2.1           
[101] cachem_1.0.6                ggplot2_3.3.5              
[103] xfun_0.29                   AnnotationFilter_1.18.0    
[105] restfulr_0.0.13             survival_3.2-13            
[107] tibble_3.1.6                GenomicAlignments_1.30.0   
[109] AnnotationDbi_1.56.1        memoise_2.0.1              
[111] cluster_2.1.2               ellipsis_0.3.2             

nick-youngblut commented 2 years ago

An example from another contig, in which the SNPs in the histogram don't match the pileup, even when zoomed in to a small region of the contig:

[Attached image: SNP_histo]

I'm just using bowtie2 to map reads to the contigs assembled from those reads. The alignmentsTrack is built from an indexed BAM, and the sequenceTrack from an indexed FASTA.

ivanek commented 2 years ago

Hi @nick-youngblut,

Would you mind sharing part of your data (the filtered reads for this region as a BAM file)?

Best, Robert

nick-youngblut commented 2 years ago

The files are available at http://ftp.tue.mpg.de/ebio/nyoungblut/. They include all contigs, in case you want to assess contigs other than the ones shown above.