Methods used in the article "Assessment of statistical methods from single cell, bulk RNA-seq and metagenomics applied to microbiome data".
Here we present several aspects of the microbiome data analysis, evaluating:
The microbiome data used in this analysis come from the HMP16SData and curatedMetagenomicData Bioconductor packages, for 16S and WMS respectively.
The directory ./goodness_of_fit/ contains the GOF.Rmd file, which loads the microbiome data, fits several parametric models to the real datasets, and evaluates the goodness of fit for each dataset.
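As a rough illustration of what evaluating goodness of fit for a count model can look like, the sketch below fits a negative binomial distribution to one simulated feature and compares observed and model-implied summaries. It does not reproduce GOF.Rmd: the simulated counts, the choice of the negative binomial, and the two summaries (mean difference and zero-probability difference) are assumptions made for this example.

```r
library(MASS)  # ships with R as a recommended package

set.seed(42)
# Simulated counts for one taxon across 500 samples (a stand-in for a real feature).
counts <- rnbinom(500, size = 0.5, mu = 20)

# Fit a negative binomial model by maximum likelihood.
fit <- suppressWarnings(fitdistr(counts, "negative binomial"))
size_hat <- unname(fit$estimate["size"])
mu_hat <- unname(fit$estimate["mu"])

# Compare observed summaries with the ones implied by the fitted model.
gof <- c(
  mean_diff = mean(counts) - mu_hat,
  zero_prob_diff = mean(counts == 0) - dnbinom(0, size = size_hat, mu = mu_hat)
)
print(round(gof, 4))
```

Values close to zero suggest the model reproduces the mean and the fraction of zeros reasonably well. On real data the same comparison would be repeated for each feature and each candidate model.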
The directory ./type_I_error_control/ contains the TIEC.Rmd file, which loads the same biological samples from the Human Microbiome Project (stool) for both 16S and WMS. Mock datasets without differentially abundant features are then generated, in order to compare the differential abundance detection methods when no true differences exist.
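The idea behind mock datasets can be sketched as follows: samples from a single population are randomly split into two artificial groups, so any feature called differentially abundant is a false positive. This is a minimal sketch under stated assumptions, not the code in TIEC.Rmd: the toy count matrix, the helper `make_mock_groups` (a hypothetical name), and the use of the Wilcoxon test as the example method are all chosen for illustration.

```r
set.seed(1)
# Toy count matrix standing in for real data: 100 taxa (rows) x 40 samples (columns).
counts <- matrix(
  rnbinom(100 * 40, size = 0.5, mu = 10),
  nrow = 100,
  dimnames = list(paste0("taxon", 1:100), paste0("sample", 1:40))
)

# Hypothetical helper: each column is one random, balanced assignment of samples
# to two artificial groups.
make_mock_groups <- function(n_samples, n_mocks) {
  replicate(n_mocks, sample(rep(c("grp1", "grp2"), length.out = n_samples)))
}

mock_labels <- make_mock_groups(ncol(counts), n_mocks = 20)

# For each mock split, test every taxon and record the fraction of p-values
# below 0.05; with no true differences this estimates the type I error rate.
fpr_per_mock <- apply(mock_labels, 2, function(lab) {
  p <- apply(counts, 1, function(x) {
    wilcox.test(x[lab == "grp1"], x[lab == "grp2"], exact = FALSE)$p.value
  })
  mean(p < 0.05, na.rm = TRUE)
})

fpr <- mean(fpr_per_mock)
print(fpr)
```

A method that controls the type I error should give an average rate close to the nominal 0.05 level; rates well above it indicate too many false discoveries.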
For the power analysis, two folders are present, enrichment and power:
The directory ./enrichment/ contains the real_data_enrichment_16S.Rmd and real_data_enrichment_WMS.Rmd files, where a microbe set enrichment analysis is performed on the Supragingival vs Subgingival Plaque dataset.
The directory ./power/ contains several files:
Since the entire data production took a long time, the ./data/ directory contains several outputs from all the analyses. This should make it easier for the user to replicate the results.
To replicate the analyses, it is strongly suggested to clone or download the entire GitHub repository. Some of the functions used in this paper are adapted from the work "A broken promise: microbiome differential abundance methods do not control the false discovery rate"; the original code is available at https://users.ugent.be/~shawinke/ABrokenPromise/index.html. The analyses were run under several versions of R during development; R 3.5.1 was the final R version on which the methods worked. However, it is essential to use specific versions of some CRAN or Bioconductor packages.
Here is the sessionInfo() output:
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
locale:
[1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252
[3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C
[5] LC_TIME=Italian_Italy.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] curatedMetagenomicData_1.12.3 bindrcpp_0.2.2
[3] ExperimentHub_1.8.0 AnnotationHub_2.14.2
[5] HMP16SData_1.2.0 ggdendro_0.1-20
[7] scales_1.0.0 ffpe_1.26.0
[9] TTR_0.23-4 vegan_2.5-3
[11] permute_0.9-4 ggpubr_0.2
[13] magrittr_1.5 dplyr_0.7.8
[15] mixOmics_6.6.1 MASS_7.3-50
[17] corncob_0.1.0 ALDEx2_1.14.1
[19] crayon_1.3.4 Seurat_2.3.4
[21] cowplot_0.9.4 ggplot2_3.1.0
[23] scde_1.99.1 flexmix_2.3-13
[25] lattice_0.20-35 MAST_1.8.2
[27] genefilter_1.64.0 AUC_0.3.0
[29] zinbwave_1.4.1 SingleCellExperiment_1.4.1
[31] ROCR_1.0-7 gplots_3.0.1
[33] reshape2_1.4.3 plyr_1.8.4
[35] phyloseq_1.26.1 metagenomeSeq_1.24.1
[37] RColorBrewer_1.1-2 glmnet_2.0-16
[39] foreach_1.4.4 Matrix_1.2-14
[41] DESeq2_1.22.2 SummarizedExperiment_1.12.0
[43] DelayedArray_0.8.0 BiocParallel_1.16.5
[45] matrixStats_0.54.0 Biobase_2.42.0
[47] GenomicRanges_1.34.0 GenomeInfoDb_1.18.1
[49] IRanges_2.16.0 S4Vectors_0.20.1
[51] BiocGenerics_0.28.0 edgeR_3.24.3
[53] limma_3.38.3
loaded via a namespace (and not attached):
[1] Hmisc_4.1-1 ica_1.0-2
[3] corpcor_1.6.9 class_7.3-14
[5] Rsamtools_1.34.0 lmtest_0.9-36
[7] nlme_3.1-137 backports_1.1.3
[9] ellipse_0.4.1 rlang_0.4.5
[11] XVector_0.22.0 readxl_1.2.0
[13] irlba_2.3.3 SparseM_1.77
[15] minfi_1.28.3 rjson_0.2.20
[17] bit64_0.9-7 glue_1.3.0
[19] trimcluster_0.1-2.1 rngtools_1.3.1
[21] sfsmisc_1.1-3 methylumi_2.28.0
[23] AnnotationDbi_1.44.0 haven_2.0.0
[25] tidyselect_0.2.5 rio_0.5.16
[27] fitdistrplus_1.0-14 XML_3.98-1.16
[29] nleqslv_3.3.2 tidyr_0.8.2
[31] zoo_1.8-4 GenomicAlignments_1.18.1
[33] xtable_1.8-3 lars_1.2
[35] MatrixModels_0.4-1 evaluate_0.12
[37] bibtex_0.4.2 Rdpack_0.10-1
[39] zlibbioc_1.28.0 rstudioapi_0.9.0
[41] doRNG_1.7.1 rpart_4.1-13
[43] shiny_1.2.0 xfun_0.4
[45] askpass_1.1 multtest_2.38.0
[47] cluster_2.0.7-1 caTools_1.17.1.1
[49] pcaMethods_1.74.0 doSNOW_1.0.16
[51] biomformat_1.10.1 interactiveDisplayBase_1.20.0
[53] tibble_2.0.1 quantreg_5.38
[55] base64_2.0 ape_5.2
[57] stabledist_0.7-1 Biostrings_2.50.2
[59] png_0.1-7 reshape_0.8.8
[61] withr_2.1.2 lumi_2.34.0
[63] bitops_1.0-6 cellranger_1.1.0
[65] pcaPP_1.9-73 pillar_1.3.1
[67] bumphunter_1.24.5 GenomicFeatures_1.34.1
[69] kernlab_0.9-27 hdf5r_1.0.1
[71] DelayedMatrixStats_1.4.0 xts_0.11-2
[73] metap_1.1 tools_3.5.1
[75] foreign_0.8-70 munsell_0.5.0
[77] distillery_1.0-4 proxy_0.4-22
[79] httpuv_1.4.5.1 compiler_3.5.1
[81] abind_1.4-5 rtracklayer_1.42.1
[83] extRemes_2.0-9 segmented_0.5-3.0
[85] beanplot_1.2 pkgmaker_0.27
[87] GenomeInfoDbData_1.2.0 gridExtra_2.3
[89] snow_0.4-3 later_0.7.5
[91] jsonlite_1.6 affy_1.60.0
[93] pbapply_1.4-0 carData_3.0-2
[95] lazyeval_0.2.1 promises_1.0.1
[97] car_3.0-2 latticeExtra_0.6-28
[99] R.utils_2.7.0 reticulate_1.10
[101] brew_1.0-6 checkmate_1.9.1
[103] rmarkdown_1.11 openxlsx_4.1.0
[105] nor1mix_1.2-3 rARPACK_0.11-0
[107] webshot_0.5.1 siggenes_1.56.0
[109] Rtsne_0.15 forcats_0.3.0
[111] copula_0.999-19 softImpute_1.4
[113] igraph_1.2.2 HDF5Array_1.10.1
[115] Rook_1.1-1 yaml_2.2.0
[117] survival_2.42-3 numDeriv_2016.8-1
[119] prabclus_2.2-7 htmltools_0.3.6
[121] memoise_1.1.0 modeltools_0.2-22
[123] locfit_1.5-9.1 quadprog_1.5-5
[125] viridisLite_0.3.0 digest_0.6.18
[127] assertthat_0.2.0 mime_0.6
[129] registry_0.5 npsurv_0.4-0
[131] RSQLite_2.1.1 lsei_1.2-0
[133] RcppArmadillo_0.9.200.7.0 data.table_1.12.0
[135] blob_1.1.1 R.oo_1.22.0
[137] preprocessCore_1.44.0 splines_3.5.1
[139] Formula_1.2-3 Rhdf5lib_1.4.2
[141] fpc_2.1-11.1 illuminaio_0.24.0
[143] Cairo_1.5-9 mixtools_1.1.0
[145] RCurl_1.95-4.11 hms_0.4.2
[147] rhdf5_2.26.2 colorspace_1.4-0
[149] base64enc_0.1-3 BiocManager_1.30.4
[151] SDMTools_1.1-221 nnet_7.3-12
[153] GEOquery_2.50.5 Rcpp_1.0.0
[155] ADGofTest_0.3 mclust_5.4.2
[157] RANN_2.6.1 mvtnorm_1.0-8
[159] pspline_1.0-18 R6_2.3.0
[161] grid_3.5.1 ggridges_0.5.1
[163] acepack_1.4.1 zip_1.0.0
[165] curl_3.3 gdata_2.18.0
[167] affyio_1.52.0 robustbase_0.93-3
[169] iterators_1.0.10 stringr_1.3.1
[171] htmlwidgets_1.3 biomaRt_2.38.0
[173] purrr_0.2.5 RMTstat_0.3
[175] rvest_0.3.2 mgcv_1.8-24
[177] openssl_1.2.1 htmlTable_1.13.1
[179] codetools_0.2-15 dtw_1.20-1
[181] Lmoments_1.2-3 gtools_3.8.1
[183] prettyunits_1.0.2 RSpectra_0.13-1
[185] R.methodsS3_1.7.1 gtable_0.2.0
[187] tsne_0.1-3 DBI_1.0.0
[189] httr_1.4.0 KernSmooth_2.23-15
[191] stringi_1.2.4 progress_1.2.0
[193] diptest_0.75-7 annotate_1.60.0
[195] xml2_1.2.0 kableExtra_1.0.1
[197] ade4_1.7-13 readr_1.3.1
[199] geneplotter_1.60.0 DEoptimR_1.0-8
[201] bit_1.1-14 pkgconfig_2.0.2
[203] gsl_1.9-10.3 gbRd_0.4-11
[205] bindr_0.1.1 knitr_1.21
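Since package versions matter for replication, a quick check of the local installation against the listing above may help. The sketch below is only an illustration: the helper `check_versions` is a hypothetical name, and the subset of packages is an arbitrary selection whose version strings are copied from the sessionInfo() output.

```r
# Versions copied from the sessionInfo() listing (an arbitrary subset).
expected <- c(
  DESeq2 = "1.22.2",
  edgeR = "3.24.3",
  limma = "3.38.3",
  phyloseq = "1.26.1",
  metagenomeSeq = "1.24.1"
)

# Hypothetical helper: report the installed version of each package (NA if the
# package is not installed) and whether it matches the expected one.
check_versions <- function(expected) {
  installed <- vapply(names(expected), function(pkg) {
    if (nzchar(system.file(package = pkg))) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1))
  data.frame(
    package = names(expected),
    expected = unname(expected),
    installed = unname(installed),
    match = !is.na(installed) & unname(installed) == unname(expected),
    stringsAsFactors = FALSE
  )
}

version_report <- check_versions(expected)
print(version_report)
```

Rows with `match` equal to `FALSE` point to packages that are missing or installed in a different version than the one used for the original analyses.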