ryantibs / quantgen

Tools for generalized quantile modeling
https://ryantibs.github.io/quantgen
14 stars 9 forks source link

Change behavior to only center features if `intercept=TRUE`. #22

Closed huisaddison closed 1 year ago

huisaddison commented 1 year ago

Previous behavior was to center features if standardize=TRUE, regardless of intercept. Doing so introduces an "implicit intercept" when the features are returned to their original scale. This is contrary to user expectations when setting intercept=FALSE.

Fixes #17.

huisaddison commented 1 year ago

@ryantibs I updated vignettes/quantgen.Rmd but I'm not sure if there's a separate build process for the vignette to update the website itself. (The vignette only has a minor change to clarify behavior under different settings of intercept,standardize. Once the vignette has been built (if necessary) I can change this from a draft PR to a PR.

ryantibs commented 1 year ago

Thanks! It can be built by running

pkgdown::build_site()

while in the base directory for the R package. It should rebuild all the stuff in the docs subdirectory. Do you want to give it a shot?

huisaddison commented 1 year ago

Ready to merge.

Re-building the site files caused a lot of spurious diffs, so look at the diff for this commit to see actual code changes: https://github.com/ryantibs/quantgen/pull/22/commits/c980ef2f2fd5651497081bc21fc25ba47627ec2c

I did run devtools::test():

[ FAIL 42 | WARN 0 | SKIP 0 | PASS 45 ]

The failures seem to all come from tests for the ensembling code and appear to be due to mismatch in floating point precision, e.g.,

Failure (test-characterization-quantile_ensemble.R:77:3): quantile_ensemble's A matrix (for Rgplk) encodes identical entries to previous version and that the output is equal
`object` (`actual`) not identical to readRDS("_custom_snaps/trial_config_40111d_glpk_inputs.RDS")[["object"]] (`expected`).

     actual[[4]]           | expected[[4]]
 [1] 0.20926536612213931   | 0.20926536612213931   [1]
 [2] -0.34812838665997897  - -0.34812838665997903  [2]
 [3] 0.04508776127008601   | 0.04508776127008601   [3]
 [4] -0.17575536614454748  | -0.17575536614454748  [4]
 [5] 2.09265366122139307   | 2.09265366122139307   [5]
 [6] -3.48128386659978961  - -3.48128386659979050  [6]
 [7] 0.45087761270086013   | 0.45087761270086013   [7]
 [8] -1.75755366144547498  | -1.75755366144547498  [8]
 [9] 10.46326830610696490  | 10.46326830610696490  [9]
[10] -17.40641933299894717 - -17.40641933299895072 [10]
 ... ...                     ...                   and 31 more ...
Backtrace:
 1. withr::with_rng_version(...)
      at test-characterization-quantile_ensemble.R:77:2
 4. quantgen (local) expect_identical_custom_rds_snapshot(...)
      at test-characterization-quantile_ensemble.R:168:6
 5. testthat::expect_identical(object, readRDS(!!file)[["object"]])

but wanted to point them out in case the failures were not expected.

ryantibs commented 1 year ago

Thanks. I believe @lcbrooks wrote these tests. Logan, can you please take a look?

huisaddison commented 1 year ago

@lcbrooks In case it's helpful, here's the full output of the test suite: https://gist.github.com/huisaddison/06429942aba7cb3a669c722307d210b7

brookslogan commented 1 year ago

Thanks; I'm still having a little trouble reproducing on

my Fedora setup

R version 4.1.3 (2022-03-10)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora Linux 36 (Workstation Edition)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libflexiblas.so.3.2

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] quantgen_1.0.0 testthat_3.1.4

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9        lattice_0.20-45   prettyunits_1.1.1 ps_1.7.1         
 [5] rprojroot_2.0.3   digest_0.6.29     utf8_1.2.2        mime_0.12        
 [9] slam_0.1-50       R6_2.5.1          evaluate_0.15     pillar_1.8.0     
[13] rlang_1.0.4       rstudioapi_0.13   praise_1.0.0      miniUI_0.1.1.1   
[17] callr_3.7.1       urlchecker_1.0.1  Matrix_1.4-1      desc_1.4.1       
[21] devtools_2.4.4    stringr_1.4.0     htmlwidgets_1.5.4 shiny_1.7.2      
[25] compiler_4.1.3    httpuv_1.6.5      xfun_0.31         pkgconfig_2.0.3  
[29] pkgbuild_1.3.1    Rglpk_0.6-4       htmltools_0.5.3   tibble_3.1.8     
[33] roxygen2_7.2.1    fansi_1.0.3       crayon_1.5.1      withr_2.5.0      
[37] later_1.3.0       brio_1.1.3        waldo_0.4.0       grid_4.1.3       
[41] xtable_1.8-4      lifecycle_1.0.1   magrittr_2.0.3    cli_3.3.0        
[45] stringi_1.7.8     cachem_1.0.6      fs_1.5.2          promises_1.2.0.1 
[49] remotes_2.4.2     mockery_0.4.3     xml2_1.3.3        ellipsis_0.3.2   
[53] vctrs_0.4.1       tools_4.1.3       glue_1.6.2        purrr_0.3.4      
[57] processx_3.7.0    pkgload_1.3.0     fastmap_1.1.0     sessioninfo_1.2.2
[61] memoise_2.0.1     knitr_1.39        profvis_0.3.7     usethis_2.1.6    

or on
rocker/verse:4.1

R version 4.1.3 (2022-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] quantgen_1.0.0 testthat_3.1.3

loaded via a namespace (and not attached):
 [1] compiler_4.1.3    pillar_1.7.0      prettyunits_1.1.1 remotes_2.4.2    
 [5] tools_4.1.3       pkgbuild_1.3.1    pkgload_1.2.4     evaluate_0.15    
 [9] memoise_2.0.1     lifecycle_1.0.1   tibble_3.1.6      lattice_0.20-45  
[13] pkgconfig_2.0.3   rlang_1.0.2       Matrix_1.4-0      cli_3.2.0        
[17] rstudioapi_0.13   Rglpk_0.6-4       fastmap_1.1.0     withr_2.5.0      
[21] desc_1.4.1        fs_1.5.2          vctrs_0.4.1       devtools_2.4.3   
[25] rprojroot_2.0.3   grid_4.1.3        glue_1.6.2        R6_2.5.1         
[29] processx_3.5.3    mockery_0.4.3     fansi_1.0.3       waldo_0.4.0      
[33] sessioninfo_1.2.2 callr_3.7.0       purrr_0.3.4       magrittr_2.0.3   
[37] ps_1.6.0          ellipsis_0.3.2    usethis_2.1.5     utf8_1.2.2       
[41] slam_0.1-50       cachem_1.0.6      crayon_1.5.1      brio_1.1.3       

when run off of this branch (commit 22d1be7f1f625ce026ebb8909e085345e94cae37). Is this the right commit?

Some suspects:

Could you compare/share your sessionInfo() and the problematic commit hash?

If it's the version differences that are the problem, then to make these snapshot tests reproducible, we might need to use an renv here (thanks for pointing out this package earlier, Addison!); using GitHub actions to run the tests might also help with consistency. (Aside: to avoid having to manually build the vignettes and site locally or store them in the development branches in epiprocess and epipredict, we've been using a gh-pages setup. This helps speed&convenients&consistency, but does require users that install_github to build the vignettes themselves.)

brookslogan commented 1 year ago

Another question: did you run the test with devtools::test() or by running the file manually? This may impact the testthat edition used.

Aside from my problems reproducing:

huisaddison commented 1 year ago

Here's the output from sessionInfo()

R version 4.2.1 (2022-06-23)
Platform: aarch64-apple-darwin21.6.0 (64-bit)
Running under: macOS Monterey 12.4

Matrix products: default
BLAS:   /opt/homebrew/Cellar/openblas/0.3.21/lib/libopenblasp-r0.3.21.dylib
LAPACK: /opt/homebrew/Cellar/r/4.2.1_4/lib/R/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] quantgen_1.0.0 testthat_3.1.4

loaded via a namespace (and not attached):
 [1] slam_0.1-50       remotes_2.4.2     purrr_0.3.4       rematch2_2.1.2
 [5] diffobj_0.3.5     lattice_0.20-45   vctrs_0.4.1       miniUI_0.1.1.1
 [9] usethis_2.1.6     htmltools_0.5.3   utf8_1.2.2        rlang_1.0.5
[13] pkgbuild_1.3.1    urlchecker_1.0.1  later_1.3.0       pillar_1.8.1
[17] glue_1.6.2        withr_2.5.0       waldo_0.4.0       mockery_0.4.3
[21] sessioninfo_1.2.2 lifecycle_1.0.2   stringr_1.4.1     devtools_2.4.4
[25] htmlwidgets_1.5.4 memoise_2.0.1     evaluate_0.16     callr_3.7.2
[29] fastmap_1.1.0     httpuv_1.6.6      ps_1.7.1          fansi_1.0.3
[33] Rcpp_1.0.9        xtable_1.8-4      promises_1.2.0.1  cachem_1.0.6
[37] desc_1.4.2        pkgload_1.3.0     mime_0.12         fs_1.5.2
[41] brio_1.1.3        digest_0.6.29     stringi_1.7.8     processx_3.7.0
[45] shiny_1.7.2       rprojroot_2.0.3   grid_4.2.1        Rglpk_0.6-4
[49] cli_3.4.0         tools_4.2.1       magrittr_2.0.3    tibble_3.1.8
[53] profvis_0.3.7     crayon_1.5.1      pkgconfig_2.0.3   ellipsis_0.3.2
[57] Matrix_1.4-1      prettyunits_1.1.1 rstudioapi_0.14   R6_2.5.1
[61] compiler_4.2.1

Some other notes that may be relevant:

@lcbrooks if it's easier for us to just schedule a brief Zoom for us to screenshare/resolve this quickly, feel free to Slack me -- I'm free for most of today/tomorrow.

huisaddison commented 1 year ago

Summary from meeting with Logan:

ryantibs commented 1 year ago

Ok will merge! Thank you both