statdivlab / corncob

Count Regression for Correlated Observations with the Beta-binomial
103 stars 22 forks source link

differentialTest seems broken #87

Closed cdiener closed 4 years ago

cdiener commented 4 years ago

Hi,

recently differentialTest always fails for me and wrongly reports that the optimization can not converge. However, the same optimization will work in a bbdml call.

> library(corncob)
> genera <- tax_glom(soil_phylo, "Genus")
> differentialTest(~ 1, ~ 1, ~ 1, ~ 1, data=genera, method="LRT")  # stupid model to illustrate the point
Error in differentialTest(~1, ~1, ~1, ~1, data = genera, method = "LRT") : 
  All models failed to converge! 

           If you are seeing this, it is likely that your model is overspecified. This occurs when your sample size is not large enough to estimate all the parameters of your model. This is most commonly due to categorical variables that include many categories. To confirm this, try running a model for a single taxon with bbdml.
> bbdml(reformulate("1", taxa_names(genera)[1]), ~ 1, data=genera)

Call:
bbdml(formula = reformulate("1", taxa_names(genera)[1]), phi.formula = ~1, 
    data = genera)

Coefficients associated with abundance:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.56278    0.02675  -95.79   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Coefficients associated with dispersion:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -5.1821     0.1318  -39.32   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Log-likelihood: -855.27
> packageVersion("corncob")
[1] ‘0.1.0’

Session info:

``` R version 4.0.2 (2020-06-22) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Debian GNU/Linux bullseye/sid Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3 LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] phyloseq_1.32.0 corncob_0.1.0 loaded via a namespace (and not attached): [1] Rcpp_1.0.5 pillar_1.4.6 compiler_4.0.2 [4] plyr_1.8.6 XVector_0.28.0 iterators_1.0.12 [7] tools_4.0.2 zlibbioc_1.34.0 tibble_3.0.3 [10] jsonlite_1.7.0 lifecycle_0.2.0 nlme_3.1-148 [13] rhdf5_2.32.2 gtable_0.3.0 lattice_0.20-41 [16] mgcv_1.8-31 pkgconfig_2.0.3 rlang_0.4.7 [19] igraph_1.2.5 Matrix_1.2-18 foreach_1.5.0 [22] parallel_4.0.2 dplyr_1.0.2 stringr_1.4.0 [25] cluster_2.1.0 generics_0.0.2 vctrs_0.3.4 [28] Biostrings_2.56.0 S4Vectors_0.26.1 IRanges_2.22.2 [31] trust_0.1-8 multtest_2.44.0 tidyselect_1.1.0 [34] stats4_4.0.2 ade4_1.7-15 grid_4.0.2 [37] glue_1.4.2 Biobase_2.48.0 data.table_1.13.0 [40] R6_2.4.1 survival_3.2-3 VGAM_1.1-3 [43] purrr_0.3.4 reshape2_1.4.4 Rhdf5lib_1.10.1 [46] ggplot2_3.3.2 magrittr_1.5 splines_4.0.2 [49] ellipsis_0.3.1 scales_1.1.1 codetools_0.2-16 [52] MASS_7.3-51.6 BiocGenerics_0.34.0 biomformat_1.16.0 [55] permute_0.9-5 ape_5.4 colorspace_1.4-1 [58] stringi_1.4.6 munsell_0.5.0 vegan_2.5-6 [61] crayon_1.3.4 ```
bryandmartin commented 4 years ago

Hmm... this might just be an issue with a bad error message. I'll need to check this, but my guess is that it shows that error (and shouldn't) when you run a LRT where the two models are identical. Is that the case in your actual use case as well?

cdiener commented 4 years ago

It works as expected when using bbdml and not differentialTest.

> mod <- bbdml(reformulate("1", taxa_names(genera)[1]), ~ 1, data=genera)
> lrtest(mod, mod)
[1] 1

Also this was just for illustration. This also happens with sensible formulas and any data set and design formulae I have tried:

> differentialTest(~ DayAmdmt, ~ 1, ~ 1, ~ 1, data=genera, method="LRT")
Error in differentialTest(~DayAmdmt, ~1, ~1, ~1, data = genera, method = "LRT") : 
  All models failed to converge! 

           If you are seeing this, it is likely that your model is overspecified. This occurs when your sample size is not large enough to estimate all the parameters of your model. This is most commonly due to categorical variables that include many categories. To confirm this, try running a model for a single taxon with bbdml.
bryandmartin commented 4 years ago

@cdiener Thanks, this might require some more specific debugging to figure out what the issue is, because LRT still seems to be working with my testing data sets. Is there a dataset you would be willing to share with me via email at bmartin6@uw.edu so that I can investigate the issue? If not I'll still work on this but it may take a bit longer to pin down.

cdiener commented 4 years ago

The example above is using the the data set that is included in corncob (soil_phylo) summarized on the genus level (genera <- tax_glom(soil_phylo, "Genus")). I ran the commands shown above on two different machines with different R (3.6 vs. 4.0.2) and Bioconductor versions (3.10 vs. 3.11) and it fails on both.

To make it easier here is the standalone code to reproduce the issue:

library(corncob)
genera <- tax_glom(soil_phylo, "Genus")
differentialTest(~ DayAmdmt, ~ 1, ~ 1, ~ 1, data=genera, method="LRT")
bryandmartin commented 4 years ago

Ah thanks @cdiener , I see the issue now! method is a rarely-needed option to specify the optimization algorithm you would like to use. Replace method = "LRT" with test = "LRT" and it should fix your issue!

That being said, this should absolutely be a more informative error message, so I will leave this issue open until I fix that.

cdiener commented 4 years ago

Oh shoot. My bad. Thanks!

bryandmartin commented 4 years ago

No worries! In all seriousness, this is exactly the type of issue that helps me know where I can make the package more user-friendly, so I appreciate it!