joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
584 stars 187 forks source link

`plot_net` possible problem (bug) with graph plotting #1418

Open missuse opened 3 years ago

missuse commented 3 years ago

Hi,

first I'd like to thank you for a great package. I have been using it for years.

To the problem:

A simple dataset:

library(phyloseq)
data(enterotype)
sub_enter <- subset_samples(enterotype,
                            SampleID %in% c("DA.AD.4",
                                            "ES.AD.1",
                                            "ES.AD.2",
                                            "ES.AD.3", 
                                            "FR.AD.1",
                                            "FR.AD.2"))

distance(sub_enter, "jaccard")

output

          DA.AD.4   ES.AD.1   ES.AD.2   ES.AD.3   FR.AD.1
ES.AD.1 0.7697417                                        
ES.AD.2 0.5515371 0.7112008                              
ES.AD.3 0.5972919 0.5868003 0.3124345                    
FR.AD.1 0.4585093 0.6707855 0.5268626 0.5223181          
FR.AD.2 0.4301436 0.6708381 0.5342431 0.5480318 0.2894279

Here we see that the most similar samples are FR.AD.1 and FR.AD.2 ( 0.2894279) as well as ES.AD.2 and ES.AD.3 (0.3124345)

set.seed(1)
plot_net(sub_enter, "jaccard", point_label = "SampleID")

Rplot

On the graph ES.AD.2 and ES.AD.3 are not connected at all. On the other hand FR.AD.1 is connected to ES.AD.2 (distance 0.5268626)

Could you explain why?

EDIT:

I trust this would be the correct graph:

Rplot012

The problem I believe is in lines:


link_layout = function(LinksData, vertexDT) {
  linkstart = copy(vertexDT[LinksData$v1, x, y])
  linkend = copy(vertexDT[LinksData$v2, x, y])
  setnames(linkend, old = c("y", "x"), new = c("yend", 
                                               "xend"))
  LinksData <- copy(cbind(LinksData, linkstart, linkend))
  return(LinksData)
}

These should be changed to

link_layout = function(LinksData, vertexDT) {
  linkstart = copy(vertexDT[LinksData$v1, 1:2])
  linkend = copy(vertexDT[LinksData$v2, 1:2])
  setnames(linkend, old = c("y", "x"), new = c("yend", 
                                               "xend"))
  LinksData <- copy(cbind(LinksData, linkstart, linkend))
  return(LinksData)
}

current function example

plot_net(sub_enter, "jaccard", point_label = "SampleID", maxdist = 0.5)`

Rplot01

changed function with the above mentioned lines:

Rplot02

I think the latter makes much more sense in the context of the distance matrix used

distance(sub_enter, "jaccard")

          DA.AD.4   ES.AD.1   ES.AD.2   ES.AD.3   FR.AD.1
ES.AD.1 0.7697417                                        
ES.AD.2 0.5515371 0.7112008                              
ES.AD.3 0.5972919 0.5868003 0.3124345                    
FR.AD.1 0.4585093 0.6707855 0.5268626 0.5223181          
FR.AD.2 0.4301436 0.6708381 0.5342431 0.5480318 0.2894279

also

plot_network(
make_network(sub_enter, type = "samples", "jaccard", max.dist = 0.5))

Rplot03

Seems to be coherent with the output of the modified plot_net() function.

sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggraph_2.0.4        gdtools_0.2.2       ggiraph_0.7.8       data.table_1.13.0   igraph_1.2.5        ggalt_0.4.0         plotly_4.9.2.1      textclean_0.9.3     ggrepel_0.8.2      
[10] cowplot_1.1.0       phyloseq_1.32.0     vegan_2.5-6         lattice_0.20-41     permute_0.9-5       kableExtra_1.3.1    dada2_1.16.0        Rcpp_1.0.5          forcats_0.5.0      
[19] stringr_1.4.0       dplyr_1.0.2         purrr_0.3.4         readr_1.3.1         tidyr_1.1.2         tibble_3.0.3        ggplot2_3.3.2       tidyverse_1.3.0     msa_1.20.1         
[28] Biostrings_2.56.0   XVector_0.28.0      IRanges_2.22.2      S4Vectors_0.26.1    BiocGenerics_0.34.0

loaded via a namespace (and not attached):
  [1] uuid_0.1-4                  readxl_1.3.1                backports_1.1.10            fastmatch_1.1-0             systemfonts_0.3.2           BiocFileCache_1.12.1       
  [7] plyr_1.8.6                  lazyeval_0.2.2              splines_4.0.2               BiocParallel_1.22.0         GenomeInfoDb_1.24.2         digest_0.6.25              
 [13] foreach_1.5.0               htmltools_0.5.0             viridis_0.5.1               fansi_0.4.1                 magrittr_1.5                memoise_1.1.0              
 [19] cluster_2.1.0               DECIPHER_2.16.1             graphlayouts_0.7.1          modelr_0.1.8                extrafont_0.17              RcppParallel_5.0.2         
 [25] matrixStats_0.57.0          extrafontdb_1.0             askpass_1.1                 prettyunits_1.1.1           jpeg_0.1-8.1                colorspace_1.4-1           
 [31] rvest_0.3.6                 blob_1.2.1                  rappdirs_0.3.1              haven_2.3.1                 xfun_0.18                   crayon_1.3.4               
 [37] RCurl_1.98-1.2              jsonlite_1.7.1              survival_3.1-12             phangorn_2.5.5              iterators_1.0.12            ape_5.4-1                  
 [43] glue_1.4.2                  polyclip_1.10-0             gtable_0.3.0                zlibbioc_1.34.0             webshot_0.5.2               DelayedArray_0.14.1        
 [49] proj4_1.0-10                Rhdf5lib_1.10.1             Rttf2pt1_1.3.8              biomartr_0.9.2              maps_3.3.0                  scales_1.1.1               
 [55] qdapRegex_0.7.2             DBI_1.1.0                   viridisLite_0.3.0           progress_1.2.2              bit_4.0.4                   htmlwidgets_1.5.2          
 [61] httr_1.4.2                  RColorBrewer_1.1-2          ellipsis_0.3.1              farver_2.0.3                pkgconfig_2.0.3             XML_3.99-0.5               
 [67] dbplyr_1.4.4                utf8_1.1.4                  labeling_0.3                tidyselect_1.1.0            rlang_0.4.7                 reshape2_1.4.4             
 [73] AnnotationDbi_1.50.3        munsell_0.5.0               cellranger_1.1.0            tools_4.0.2                 cli_2.0.2                   generics_0.0.2             
 [79] RSQLite_2.2.1               ade4_1.7-15                 broom_0.7.1                 evaluate_0.14               biomformat_1.16.0           yaml_2.2.1                 
 [85] fs_1.5.0                    knitr_1.30                  bit64_4.0.5                 tidygraph_1.2.0             nlme_3.1-148                ash_1.0-15                 
 [91] xml2_1.3.2                  biomaRt_2.44.1              compiler_4.0.2              rstudioapi_0.11             curl_4.3                    png_0.1-7                  
 [97] reprex_0.3.0                tweenr_1.0.1                stringi_1.5.3               Matrix_1.2-18               multtest_2.44.0             vctrs_0.3.4                
[103] pillar_1.4.6                lifecycle_0.2.0             bitops_1.0-6                GenomicRanges_1.40.0        R6_2.4.1                    latticeExtra_0.6-29        
[109] hwriter_1.3.2               ShortRead_1.46.0            gridExtra_2.3               KernSmooth_2.23-17          writexl_1.3.1               codetools_0.2-16           
[115] MASS_7.3-51.6               assertthat_0.2.1            rhdf5_2.32.4                SummarizedExperiment_1.18.2 openssl_1.4.3               withr_2.3.0                
[121] GenomicAlignments_1.24.0    Rsamtools_2.4.0             GenomeInfoDbData_1.2.3      mgcv_1.8-31                 hms_0.5.3                   quadprog_1.5-8             
[127] grid_4.0.2                  rmarkdown_2.4               ggforce_0.3.2               Biobase_2.48.0              lubridate_1.7.9             tinytex_0.26               

All the best,

Milan

EWisely commented 2 months ago

Thank you Milan! I was noticing a similar effect when using the plot_net function on my metabarcoding data, and your code fixed it!