Improve the performance of linear model inferences

danielkberry commented 1 month ago

This commit adds 3 performance improvements to linear model inferences that reduce the memory requirements and improve the speed:

1. For relative inferences, instead of applying the delta method to the original data (which would require allocating several new dataframes of similar size), we create a smaller grid that approximates the empirical distribution and apply the delta method to it.
2. Fit the linear model by solving the normal equations. This avoids allocating the several intermediate matrices, each the same size as the design matrix, that the built-in solvers require.
3. Switch from Patsy to Formulaic for building the design matrix.

With these changes, Jetstream will run quickly and without running out of memory for experiments of up to 1e7 users. Experiments larger than that (or with many branches) may still fail. As a result, I recommend increasing the memory available to each Jetstream process (by reducing the JETSTREAM_PROCESSES environment variable).

codecov-commenter commented 1 month ago

Codecov Report

Attention: Patch coverage is 86.79245% with 14 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@19d673f). Learn more about missing BASE report.

Files	Patch %	Lines
...lysis/frequentist_stats/linear_models/functions.py	79.71%	14 Missing :warning:

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #250   +/-   ##
=======================================
  Coverage        ?   83.04%
=======================================
  Files           ?       18
  Lines           ?     1268
  Branches        ?        0
=======================================
  Hits            ?     1053
  Misses          ?      215
  Partials        ?        0
```

| [Flag](https://app.codecov.io/gh/mozilla/mozanalysis/pull/250/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=mozilla) | Coverage Δ | |
|---|---|---|
| [project](https://app.codecov.io/gh/mozilla/mozanalysis/pull/250/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=mozilla) | `83.04% <86.79%> (?)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=mozilla#carryforward-flags-in-the-pull-request-comment) to find out more.


m-d-bowerman commented 1 week ago

New tests look good, too
