mcalgaro93 / sc2meta

Methods used in the article "Assessment of statistical methods from single cell, bulk RNA-seq and metagenomics applied to microbiome data"
MIT License

sc2meta

Here we present several aspects of the microbiome data analysis, evaluating:

The microbiome data used in this analysis come from the HMP16SData (16S) and curatedMetagenomicData (WMS) Bioconductor packages.

Goodness of Fit (GOF) evaluation

The directory ./goodness_of_fit/ contains the GOF.Rmd file, which loads the microbiome data, fits several parametric models to the real datasets, and evaluates the goodness of fit for each dataset.
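As an illustration of what a goodness of fit check on count data can look like, the sketch below fits a negative binomial model to one simulated taxon and compares the observed proportion of zeros with the proportion implied by the fitted model. It is not taken from GOF.Rmd: the simulated counts, the helper name fit_nb_mom, and the method-of-moments estimator are all assumptions made for this example.

```r
# Minimal sketch: goodness of fit of a negative binomial (NB) model for one taxon.
# All data are simulated and fit_nb_mom is a hypothetical helper;
# nothing here is copied from GOF.Rmd.
set.seed(1)

# Hypothetical counts for one taxon across 200 samples
counts <- rnbinom(200, size = 0.5, mu = 20)

# Method-of-moments NB estimates, using var = mu + mu^2 / size
fit_nb_mom <- function(x) {
  mu <- mean(x)
  v <- var(x)
  # Fall back to a near-Poisson fit when there is no overdispersion
  size <- if (v > mu) mu^2 / (v - mu) else 1e6
  list(mu = mu, size = size)
}

est <- fit_nb_mom(counts)

# Compare the observed and the model-implied proportion of zeros
gof <- c(
  observed_zero_prob = mean(counts == 0),
  expected_zero_prob = dnbinom(0, size = est$size, mu = est$mu)
)
gof["zero_prob_difference"] <- gof["expected_zero_prob"] - gof["observed_zero_prob"]
print(round(gof, 3))
```

A difference close to zero suggests the model reproduces the sparsity of the data; repeating the comparison over all taxa and over several candidate models gives a per-dataset summary.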

Type I Error Control

The directory ./type_I_error_control/ contains the TIEC.Rmd file, which loads the same biological samples from the Human Microbiome Project (stool) for both 16S and WMS. Mock datasets without differentially abundant features are then generated to compare differential abundance detection across methods. Because no feature is truly differentially abundant in these datasets, every detection is a false positive.
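The idea of a mock comparison can be sketched in a few lines of base R. The example below is only an assumption about how such a check may be set up, not the code in TIEC.Rmd: the counts are simulated, the group labels are assigned at random, and a Wilcoxon rank-sum test stands in for the differential abundance methods under evaluation.

```r
# Minimal sketch of a type I error check on a mock dataset.
# Simulated data and a stand-in test; not the procedure from TIEC.Rmd.
set.seed(42)
n_samples <- 40
n_features <- 300

# Hypothetical count matrix (features x samples) drawn from one distribution
counts <- matrix(rnbinom(n_features * n_samples, size = 1, mu = 10),
                 nrow = n_features, ncol = n_samples)

# Mock grouping: labels assigned at random, so no feature is truly different
group <- factor(sample(rep(c("grp1", "grp2"), each = n_samples / 2)))

# Stand-in method: Wilcoxon rank-sum test applied to each feature
p_values <- apply(counts, 1, function(x) {
  wilcox.test(x[group == "grp1"], x[group == "grp2"], exact = FALSE)$p.value
})

# Under the null, the fraction of p-values below alpha should be close to alpha
alpha <- 0.05
observed_fpr <- mean(p_values < alpha)
cat("Nominal alpha:", alpha, " observed false positive rate:", observed_fpr, "\n")
```

A method controls the type I error when the observed rate stays at or below the nominal threshold across many mock datasets.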

Power

The power analysis is split into two folders, enrichment and power:

Enrichment

The directory ./enrichment/ contains the real_data_enrichment_16S.Rmd and real_data_enrichment_WMS.Rmd files, in which a microbe set enrichment analysis is performed on the Supragingival vs Subgingival Plaque dataset.
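One generic way to test whether a predefined set of microbes is over-represented among the features flagged as differentially abundant is a one-sided Fisher exact test on a 2x2 table. The sketch below uses that test as a stand-in; it is an assumption for illustration and may differ from what the enrichment Rmd files do. The taxa, the set membership, and the differential abundance calls are all simulated.

```r
# Minimal sketch of a set over-representation test with Fisher's exact test.
# Taxa, set membership and DA calls are hypothetical, not from the repository.
set.seed(7)
n_taxa <- 100

taxa <- data.frame(
  taxon = paste0("taxon_", seq_len(n_taxa)),
  in_set = rep(c(TRUE, FALSE), times = c(30, 70)),
  stringsAsFactors = FALSE
)

# Simulated DA calls: members of the set are flagged more often
taxa$is_da <- rbinom(n_taxa, 1, ifelse(taxa$in_set, 0.6, 0.1)) == 1

# 2x2 contingency table with (in set, DA) in the top-left cell
tab <- table(
  in_set = factor(taxa$in_set, levels = c(TRUE, FALSE)),
  is_da = factor(taxa$is_da, levels = c(TRUE, FALSE))
)
print(tab)

# One-sided test: is the odds ratio greater than 1 (set enriched among DA taxa)?
res <- fisher.test(tab, alternative = "greater")
cat("Enrichment p-value:", signif(res$p.value, 3), "\n")
```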

Power

The directory ./power/ contains several files:

Data

Because producing all the data takes a long time, the ./data/ directory contains several outputs from all the analyses. This should make it easier for the user to replicate the results.

Instructions and R environment

To replicate the analyses, it is strongly suggested to clone or download the entire GitHub repository. Some of the functions used in this paper are adapted from the work "A broken promise: microbiome differential abundance methods do not control the false discovery rate"; the original code is available at https://users.ugent.be/~shawinke/ABrokenPromise/index.html. The analyses were run on several versions of R during development; R 3.5.1 was the final version on which the methods worked. However, it is essential to use specific versions of some CRAN or Bioconductor packages:

Here is the sessionInfo() output:

R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=Italian_Italy.1252  LC_CTYPE=Italian_Italy.1252   
[3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C                  
[5] LC_TIME=Italian_Italy.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] curatedMetagenomicData_1.12.3 bindrcpp_0.2.2               
 [3] ExperimentHub_1.8.0           AnnotationHub_2.14.2         
 [5] HMP16SData_1.2.0              ggdendro_0.1-20              
 [7] scales_1.0.0                  ffpe_1.26.0                  
 [9] TTR_0.23-4                    vegan_2.5-3                  
[11] permute_0.9-4                 ggpubr_0.2                   
[13] magrittr_1.5                  dplyr_0.7.8                  
[15] mixOmics_6.6.1                MASS_7.3-50                  
[17] corncob_0.1.0                 ALDEx2_1.14.1                
[19] crayon_1.3.4                  Seurat_2.3.4                 
[21] cowplot_0.9.4                 ggplot2_3.1.0                
[23] scde_1.99.1                   flexmix_2.3-13               
[25] lattice_0.20-35               MAST_1.8.2                   
[27] genefilter_1.64.0             AUC_0.3.0                    
[29] zinbwave_1.4.1                SingleCellExperiment_1.4.1   
[31] ROCR_1.0-7                    gplots_3.0.1                 
[33] reshape2_1.4.3                plyr_1.8.4                   
[35] phyloseq_1.26.1               metagenomeSeq_1.24.1         
[37] RColorBrewer_1.1-2            glmnet_2.0-16                
[39] foreach_1.4.4                 Matrix_1.2-14                
[41] DESeq2_1.22.2                 SummarizedExperiment_1.12.0  
[43] DelayedArray_0.8.0            BiocParallel_1.16.5          
[45] matrixStats_0.54.0            Biobase_2.42.0               
[47] GenomicRanges_1.34.0          GenomeInfoDb_1.18.1          
[49] IRanges_2.16.0                S4Vectors_0.20.1             
[51] BiocGenerics_0.28.0           edgeR_3.24.3                 
[53] limma_3.38.3                 

loaded via a namespace (and not attached):
  [1] Hmisc_4.1-1                   ica_1.0-2                    
  [3] corpcor_1.6.9                 class_7.3-14                 
  [5] Rsamtools_1.34.0              lmtest_0.9-36                
  [7] nlme_3.1-137                  backports_1.1.3              
  [9] ellipse_0.4.1                 rlang_0.4.5                  
 [11] XVector_0.22.0                readxl_1.2.0                 
 [13] irlba_2.3.3                   SparseM_1.77                 
 [15] minfi_1.28.3                  rjson_0.2.20                 
 [17] bit64_0.9-7                   glue_1.3.0                   
 [19] trimcluster_0.1-2.1           rngtools_1.3.1               
 [21] sfsmisc_1.1-3                 methylumi_2.28.0             
 [23] AnnotationDbi_1.44.0          haven_2.0.0                  
 [25] tidyselect_0.2.5              rio_0.5.16                   
 [27] fitdistrplus_1.0-14           XML_3.98-1.16                
 [29] nleqslv_3.3.2                 tidyr_0.8.2                  
 [31] zoo_1.8-4                     GenomicAlignments_1.18.1     
 [33] xtable_1.8-3                  lars_1.2                     
 [35] MatrixModels_0.4-1            evaluate_0.12                
 [37] bibtex_0.4.2                  Rdpack_0.10-1                
 [39] zlibbioc_1.28.0               rstudioapi_0.9.0             
 [41] doRNG_1.7.1                   rpart_4.1-13                 
 [43] shiny_1.2.0                   xfun_0.4                     
 [45] askpass_1.1                   multtest_2.38.0              
 [47] cluster_2.0.7-1               caTools_1.17.1.1             
 [49] pcaMethods_1.74.0             doSNOW_1.0.16                
 [51] biomformat_1.10.1             interactiveDisplayBase_1.20.0
 [53] tibble_2.0.1                  quantreg_5.38                
 [55] base64_2.0                    ape_5.2                      
 [57] stabledist_0.7-1              Biostrings_2.50.2            
 [59] png_0.1-7                     reshape_0.8.8                
 [61] withr_2.1.2                   lumi_2.34.0                  
 [63] bitops_1.0-6                  cellranger_1.1.0             
 [65] pcaPP_1.9-73                  pillar_1.3.1                 
 [67] bumphunter_1.24.5             GenomicFeatures_1.34.1       
 [69] kernlab_0.9-27                hdf5r_1.0.1                  
 [71] DelayedMatrixStats_1.4.0      xts_0.11-2                   
 [73] metap_1.1                     tools_3.5.1                  
 [75] foreign_0.8-70                munsell_0.5.0                
 [77] distillery_1.0-4              proxy_0.4-22                 
 [79] httpuv_1.4.5.1                compiler_3.5.1               
 [81] abind_1.4-5                   rtracklayer_1.42.1           
 [83] extRemes_2.0-9                segmented_0.5-3.0            
 [85] beanplot_1.2                  pkgmaker_0.27                
 [87] GenomeInfoDbData_1.2.0        gridExtra_2.3                
 [89] snow_0.4-3                    later_0.7.5                  
 [91] jsonlite_1.6                  affy_1.60.0                  
 [93] pbapply_1.4-0                 carData_3.0-2                
 [95] lazyeval_0.2.1                promises_1.0.1               
 [97] car_3.0-2                     latticeExtra_0.6-28          
 [99] R.utils_2.7.0                 reticulate_1.10              
[101] brew_1.0-6                    checkmate_1.9.1              
[103] rmarkdown_1.11                openxlsx_4.1.0               
[105] nor1mix_1.2-3                 rARPACK_0.11-0               
[107] webshot_0.5.1                 siggenes_1.56.0              
[109] Rtsne_0.15                    forcats_0.3.0                
[111] copula_0.999-19               softImpute_1.4               
[113] igraph_1.2.2                  HDF5Array_1.10.1             
[115] Rook_1.1-1                    yaml_2.2.0                   
[117] survival_2.42-3               numDeriv_2016.8-1            
[119] prabclus_2.2-7                htmltools_0.3.6              
[121] memoise_1.1.0                 modeltools_0.2-22            
[123] locfit_1.5-9.1                quadprog_1.5-5               
[125] viridisLite_0.3.0             digest_0.6.18                
[127] assertthat_0.2.0              mime_0.6                     
[129] registry_0.5                  npsurv_0.4-0                 
[131] RSQLite_2.1.1                 lsei_1.2-0                   
[133] RcppArmadillo_0.9.200.7.0     data.table_1.12.0            
[135] blob_1.1.1                    R.oo_1.22.0                  
[137] preprocessCore_1.44.0         splines_3.5.1                
[139] Formula_1.2-3                 Rhdf5lib_1.4.2               
[141] fpc_2.1-11.1                  illuminaio_0.24.0            
[143] Cairo_1.5-9                   mixtools_1.1.0               
[145] RCurl_1.95-4.11               hms_0.4.2                    
[147] rhdf5_2.26.2                  colorspace_1.4-0             
[149] base64enc_0.1-3               BiocManager_1.30.4           
[151] SDMTools_1.1-221              nnet_7.3-12                  
[153] GEOquery_2.50.5               Rcpp_1.0.0                   
[155] ADGofTest_0.3                 mclust_5.4.2                 
[157] RANN_2.6.1                    mvtnorm_1.0-8                
[159] pspline_1.0-18                R6_2.3.0                     
[161] grid_3.5.1                    ggridges_0.5.1               
[163] acepack_1.4.1                 zip_1.0.0                    
[165] curl_3.3                      gdata_2.18.0                 
[167] affyio_1.52.0                 robustbase_0.93-3            
[169] iterators_1.0.10              stringr_1.3.1                
[171] htmlwidgets_1.3               biomaRt_2.38.0               
[173] purrr_0.2.5                   RMTstat_0.3                  
[175] rvest_0.3.2                   mgcv_1.8-24                  
[177] openssl_1.2.1                 htmlTable_1.13.1             
[179] codetools_0.2-15              dtw_1.20-1                   
[181] Lmoments_1.2-3                gtools_3.8.1                 
[183] prettyunits_1.0.2             RSpectra_0.13-1              
[185] R.methodsS3_1.7.1             gtable_0.2.0                 
[187] tsne_0.1-3                    DBI_1.0.0                    
[189] httr_1.4.0                    KernSmooth_2.23-15           
[191] stringi_1.2.4                 progress_1.2.0               
[193] diptest_0.75-7                annotate_1.60.0              
[195] xml2_1.2.0                    kableExtra_1.0.1             
[197] ade4_1.7-13                   readr_1.3.1                  
[199] geneplotter_1.60.0            DEoptimR_1.0-8               
[201] bit_1.1-14                    pkgconfig_2.0.2              
[203] gsl_1.9-10.3                  gbRd_0.4-11                  
[205] bindr_0.1.1                   knitr_1.21
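
The versions listed above can be compared with a local installation before running the analyses. The sketch below is one possible way to do so: check_versions is a hypothetical helper, and the five packages were picked from the listing for illustration only, not because they are known to be the version-critical ones.

```r
# Minimal sketch: compare installed package versions with the sessionInfo() listing.
# The subset of packages is an arbitrary illustration; check_versions is hypothetical.
expected <- c(
  DESeq2 = "1.22.2",
  edgeR = "3.24.3",
  limma = "3.38.3",
  phyloseq = "1.26.1",
  metagenomeSeq = "1.24.1"
)

check_versions <- function(expected) {
  installed <- vapply(names(expected), function(pkg) {
    # system.file() returns "" when the package is not installed
    if (nzchar(system.file(package = pkg))) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1))

  data.frame(
    package = names(expected),
    expected = unname(expected),
    installed = unname(installed),
    matches = !is.na(installed) & installed == expected,
    stringsAsFactors = FALSE
  )
}

report <- check_versions(expected)
print(report)
```

Rows with matches equal to FALSE point to packages that are either missing or installed in a different version from the one recorded above.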