pymc-labs / pymc-marketing

Bayesian marketing toolbox in PyMC. Media Mix (MMM), customer lifetime value (CLV), buy-till-you-die (BTYD) models and more.
https://www.pymc-marketing.io/
Apache License 2.0
683 stars 190 forks source link

`.build_model()` Inconsistencies #318

Closed ColtAllen closed 7 months ago

ColtAllen commented 1 year ago

.build_model() is supposedly a required step between instantiating a clv model and fitting it. However, models can still be fit if this is skipped, and from what I can tell, the results are identical. The below example is for BetaGeoModel, but it's the same deal with ParetoNBDModel:

import pandas as pd
from pymc_marketing.clv.models import BetaGeoModel

# load data
url_cdnow_rfm = "https://raw.githubusercontent.com/pymc-labs/pymc-marketing/main/datasets/clv_quickstart.csv"
df = pd.read_csv(url_cdnow_rfm)
df['customer_id'] = df.index

# init and fit model
bgnbd = BetaGeoModel(df)
#bgnbd.build_model()
bgnbd.fit()

# this will raise an error if build_model() was not called
repr(bgnbd)

The only difference is the missing __repr__ attributes.

Moving the .build_model() call inside the fit() method seems like an easy fix for this. Even so, how's it possible to fit a model that was never defined to begin with?

michaelraczycki commented 1 year ago

fit() calls build model under the hood, and as far as I'm concerned there isn't any suggestion stating that it needs to be called before fitting the model to the data.

michaelraczycki commented 1 year ago

in pareto fit() you're calling super().fit() and in line 47 if basic.py there's:

self.build_model()

BBDS-Colt commented 1 year ago

fit() calls build model under the hood, and as far as I'm concerned there isn't any suggestion stating that it needs to be called before fitting the model to the data.

Why isn't the repr method working properly if build_model is already being called under the hood?

michaelraczycki commented 1 year ago

I'm not familiar with your specific implementation, it might have some differences. All other models implement a separate test for repr being used, so it might be a good idea to implement it in Pareto as well, and worst case scenario we'll get some insight on what's failing

BBDS-Colt commented 1 year ago

(@ColtAllen) here. There is already a passing repr test for ParetoNBDModel, but this is also happening for BetaGeoModel. Omitting the call to .build_model() is an untested edge case for all models.

ColtAllen commented 1 year ago

Calling self.build_model() within the subclass __init__ is an easy fix for this, and would make any calls within self.fit() redundant. I'm working on a PR now that does the latter for ParetoNBDModel.

ricardoV94 commented 1 year ago

This is an artifact of the ModelBuilder API, where you don't specify the data until you call .fit, and therefore can't safely create the model in advance