Speed up `quantile_ensemble_flex` by forming `model$A` with one `sparseMatrix` call and specially treating `tau_group` runs

brookslogan commented 3 years ago

This PR builds on #6. It provides two extra efficiency gains for quantile_ensemble_flex:

Speed up model$A formation even further by building up parts of i, j, and x vectors to feed into sparseMatrix beforehand, then calling sparseMatrix only once. This avoids extra checks and conversions between Csparse and Tsparse matrices. An Rsparse-inspired approach might be marginally faster, but harder to read.
Improve performance when tau_groups contains runs (consecutive duplicated values): eliminates "duplicate" optimization variables, and when nonneg is TRUE and q0 doesn't contain crossings, eliminates noncrossing constraints within these runs that appear unnecessary. It may be possible to have a competitive or faster alternative to the current quantile_ensemble_stand by delegating to / postprocessing this version of quantile_ensemble_flex.

Additionally, this adds some characterization tests for quantile_ensemble using testthat.

Exploratory testing & benchmarking code:

Selected benchmark results for a 400 x 20 x 7 q matrix with three tau groups, no crossings within qarr, and default parameters:

Unit: milliseconds
                            expr       min        lq      mean    median
                        prealloc 2541.4974 2548.8273 2562.0451 2555.2931
                           rbind  863.5922  866.2385  875.5932  870.2080
         rbind_listTsparse_parts  186.2303  189.6228  197.0454  191.7732
 rbind_listTsparse_parts_rle_tau  143.2163  145.7374  147.7879  148.1336
          rbind_Rsparse_pairwise  191.3534  194.3831  195.8514  195.7886
      rbind_listRsparse_pairwise  183.8296  185.3404  187.0080  186.4660
        rbind_listRsparse_blocks  181.9815  183.4725  196.5011  184.8695
         rbind_modRsparse_blocks  181.3435  183.5542  184.4445  184.2028
             Rsparse_gurobi_only  170.1477  172.4180  173.4431  173.0580
        uq       max neval
 2563.6260 2657.1050    20
  873.0159  979.1753    20
  193.5336  304.1660    20
  149.4263  152.1581    20
  197.3839  199.1315    20
  188.7764  191.4906    20
  186.8797  300.5560    20
  185.4671  188.5675    20
  174.2836  176.3394    20

ryantibs commented 3 years ago

@lcbrooks This looks wonderful. Great job and thank you!

One thing: please add yourself as a package author!

Another thing: I forget what's the easiest way for me to see the results of the tests. I can see the code you've added but can you remind what's the easiest way for me to look at the test outputs (the checks that the old code and the new code match up)? Thanks.

brookslogan commented 3 years ago

Thanks!

I updated the DESCRIPTION above (is "aut" the right "role"?).

For running tests, use devtools::test(), Ctrl/Cmd-Shift-T in RStudio, or C-c C-w C-t in Emacs with ESS. Ideally, GitHub Actions would run these tests on everything automatically as well, but I'm haven't worked with these before... maybe usethis::use_github_action_check_standard() or usethis::use_github_action_check_full() would take care of the setup?

ryantibs commented 3 years ago

Yep "aut" is the right role.

I just ran devtools::test() and get:

> devtools::test()
Loading quantgen
Testing quantgen
✔ |  OK F W S | Context
✖ |   2 1     | characterization-quantile_ensemble [0.5 s]                      
────────────────────────────────────────────────────────────────────────────────
Error (test-characterization-quantile_ensemble.R:77:3): quantile_ensemble's A matrix (for Rgplk) encodes identical entries to previous version and that the output is equal
Error: unused argument (repr = "T")
Backtrace:
 1. withr::with_rng_version(...) test-characterization-quantile_ensemble.R:77:2
 4. quantgen::quantile_ensemble(...) test-characterization-quantile_ensemble.R:138:10
 5. quantgen::quantile_ensemble_flex(...) /Users/ryantibshirani/Dropbox/quantgen/R-package/quantgen/R/quantile_ensemble.R:134:4
────────────────────────────────────────────────────────────────────────────────

══ Results ═════════════════════════════════════════════════════════════════════
Duration: 0.6 s

[ FAIL 1 | WARN 0 | SKIP 0 | PASS 2 ]

I'm not sure if the only error is due to the last commit (use of repr = "T" ... by the way did you really mean to put quotations as in "T" here? Just checking because T, without quotations, is encoded as TRUE in R.). Please fix the errors? Thanks!

brookslogan commented 3 years ago

Thanks for catching this! It should be fixed now. See the commit message here / above for context. (I did mean "T" rather than T, but trying to use the argument repr at all broke things when using Matrix package versions prior to 1.3-0.)

ryantibs commented 3 years ago

Sorry for the delay here. I'm happy to merge this now (reran tests and all 87 passed!).

@elray1 You might consider running this for your use case now and see how fast it is? (No pressure either way, but just curious to see if it's competitive with your Python code now.)

ryantibs / quantgen

Speed up `quantile_ensemble_flex` by forming `model$A` with one `sparseMatrix` call and specially treating `tau_group` runs #8