Open ricardoV94 opened 1 year ago
I'm working on the ParetoNBD PR right now, and have added pytest fixtures for CDNOW_sample.csv and CDNOW_master.csv which could potentially resolve this issue. These CSVs contain 2,357 and 23,570 rows respectively, and any tests using them should probably be marked as @pytest.mark.slow.
My opinion on using CDNOW for testing has flip-flopped in recent weeks because even CDNOW_master is much smaller than many datasets encountered in practice, but these are real-world benchmarks used in many research papers, and will be useful for testing against lifetimes MLE convergence and reproducing research results.
The setup_class of TestBetaGeo is too slow as it performs MCMC sampling on a synthetic dataset. We should make this a fixture of tests that actually need it (if there's more than one), and possibly mark them with @pytest.mark.slow so that they are not run locally by default.
https://github.com/pymc-labs/pymc-marketing/blob/7ebc19465ad6daf62fbcb56d1b91fc40d09da7a0/tests/clv/models/test_beta_geo.py#L50
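One way to get the "not run locally by default" behaviour is a small `conftest.py`, following the pattern from the pytest documentation. This is only a sketch: the flag name `--run-slow` is an assumption, not something the repository defines.

```python
# conftest.py
import pytest


def pytest_addoption(parser):
    # Hypothetical opt-in flag; slow tests are skipped unless it is passed.
    parser.addoption(
        "--run-slow",
        action="store_true",
        default=False,
        help="also run tests marked as slow",
    )


def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line("markers", "slow: test is slow, skipped by default")


def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-slow"):
        return  # the user opted in, leave every test untouched
    skip_slow = pytest.mark.skip(reason="needs --run-slow to run")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```

With this in place, a plain `pytest` run skips anything decorated with `@pytest.mark.slow`, while CI can call `pytest --run-slow` to include it.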
For other tests I have faked posterior draws by doing prior sampling with narrow priors around the "expected" values, and setting that as the posterior dataset. This allows us to check that the summary methods work as expected without doing slow MCMC sampling.
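The idea can be illustrated with plain NumPy standing in for PyMC's prior sampling. This is a sketch under assumptions, not the test code itself: the helper `fake_posterior`, the parameter names, and the "expected" values are all hypothetical, and a real test would wrap the draws in the posterior dataset the model expects.

```python
import numpy as np


def fake_posterior(expected, rel_sd=1e-3, chains=2, draws=50, seed=0):
    """Draw tightly concentrated samples around each expected value.

    Returns a dict mapping parameter name to an array of shape (chains, draws),
    mimicking the layout of MCMC output without running a sampler.
    """
    rng = np.random.default_rng(seed)
    return {
        name: rng.normal(loc=value, scale=rel_sd * abs(value), size=(chains, draws))
        for name, value in expected.items()
    }


# Hypothetical "expected" parameter values, chosen only for illustration.
expected = {"a": 0.8, "b": 2.5, "alpha": 4.4, "r": 0.24}
posterior = fake_posterior(expected)

# A summary computed from the fake draws should land close to the expected values.
posterior_means = {name: draws.mean() for name, draws in posterior.items()}
```

Because the draws are nearly constant, any summary method applied to them can be checked against the expected values with a loose tolerance, and the test runs in milliseconds.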
CC @larryshamalama