pymc-labs / pymc-marketing

Bayesian marketing toolbox in PyMC. Media Mix (MMM), customer lifetime value (CLV), buy-till-you-die (BTYD) models and more.
https://www.pymc-marketing.io/
Apache License 2.0
614 stars, 148 forks

TestBetaGeo `setup_class` is too slow #172

Open ricardoV94 opened 1 year ago

ricardoV94 commented 1 year ago

The `setup_class` of `TestBetaGeo` is too slow because it runs MCMC sampling on a synthetic dataset. We should turn the fitted model into a fixture that is requested only by the tests that actually need it (if there is more than one), and possibly mark those tests with `@pytest.mark.slow` so that they are not run locally by default.

https://github.com/pymc-labs/pymc-marketing/blob/7ebc19465ad6daf62fbcb56d1b91fc40d09da7a0/tests/clv/models/test_beta_geo.py#L50

For other tests I have faked posterior draws by sampling from narrow priors centered on the "expected" values and setting the result as the posterior dataset. This lets us check that the summary methods work as expected without doing slow MCMC sampling.
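As a rough illustration of the faked-posterior idea, the sketch below draws tightly concentrated samples around chosen values and arranges them as `(chain, draw)` arrays. It swaps in plain NumPy sampling for the PyMC prior sampling described above, to keep the example dependency-light. The function name `fake_posterior`, its arguments, and the numeric values are assumptions made for illustration only.

```python
import numpy as np


def fake_posterior(expected, n_chains=2, n_draws=50, rel_sd=1e-3, seed=0):
    """Return near-constant draws per parameter, shaped (chain, draw)."""
    rng = np.random.default_rng(seed)
    return {
        # The standard deviation is a small fraction of the expected value,
        # so every draw sits very close to that value.
        name: rng.normal(loc=value, scale=rel_sd * abs(value), size=(n_chains, n_draws))
        for name, value in expected.items()
    }


# Illustrative values only, not taken from the test suite.
expected = {"a": 0.8, "b": 2.5, "alpha": 4.4, "r": 0.24}
draws = fake_posterior(expected)
```

In a real test these arrays would still need to be wrapped in whatever posterior container the model's summary methods read from; that step is not shown here.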

CC @larryshamalama

ColtAllen commented 1 year ago

I'm working on the ParetoNBD PR right now and have added pytest fixtures for `CDNOW_sample.csv` and `CDNOW_master.csv`, which could potentially resolve this issue. These CSVs contain 2,357 and 23,570 rows respectively, and any tests using them should probably be marked with `@pytest.mark.slow`.

My opinion on using CDNOW for testing has flip-flopped in recent weeks. Even `CDNOW_master` is much smaller than many datasets encountered in practice, but both files are real-world benchmarks used in many research papers, and they will be useful for testing against `lifetimes` MLE convergence and for reproducing research results.