pymc-labs / pymc-marketing

Bayesian marketing toolbox in PyMC. Media Mix (MMM), customer lifetime value (CLV), buy-till-you-die (BTYD) models and more.
https://www.pymc-marketing.io/
Apache License 2.0

CLV Quickstart example fails #435

Closed animenon closed 10 months ago

animenon commented 11 months ago

CLV Quickstart example fails at the function call: beta_geo_model = clv.BetaGeoModel(data = data)

I'm not sure what I am missing here. I am on a Mac M1 and running the code from IPython in a conda environment.

On a side note, why doesn't the package have a pip-installable version? I am not a conda user, so I had to use conda just to check out the package.

animenon commented 11 months ago

Error I see:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/.pyenv/versions/anaconda3-2023.09-0/lib/python3.11/site-packages/pandas/core/indexes/base.py:3653, in Index.get_loc(self, key)
   3652 try:
-> 3653     return self._engine.get_loc(casted_key)
   3654 except KeyError as err:

File ~/.pyenv/versions/anaconda3-2023.09-0/lib/python3.11/site-packages/pandas/_libs/index.pyx:147, in pandas._libs.index.IndexEngine.get_loc()

File ~/.pyenv/versions/anaconda3-2023.09-0/lib/python3.11/site-packages/pandas/_libs/index.pyx:176, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'customer_id'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
File ~/.pyenv/versions/anaconda3-2023.09-0/envs/marketing_env/lib/python3.11/site-packages/pymc_marketing/clv/models/beta_geo.py:116, in BetaGeoModel.__init__(self, data, model_config, sampler_config)
    115 try:
--> 116     self.customer_id = data["customer_id"]
    117 except KeyError:

File ~/.pyenv/versions/anaconda3-2023.09-0/lib/python3.11/site-packages/pandas/core/frame.py:3761, in DataFrame.__getitem__(self, key)
   3760     return self._getitem_multilevel(key)
-> 3761 indexer = self.columns.get_loc(key)
   3762 if is_integer(indexer):

File ~/.pyenv/versions/anaconda3-2023.09-0/lib/python3.11/site-packages/pandas/core/indexes/base.py:3655, in Index.get_loc(self, key)
   3654 except KeyError as err:
-> 3655     raise KeyError(key) from err
   3656 except TypeError:
   3657     # If we have a listlike key, _check_indexing_error will raise
   3658     #  InvalidIndexError. Otherwise we fall through and re-raise
   3659     #  the TypeError.

KeyError: 'customer_id'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
Cell In[6], line 1
----> 1 beta_geo_model = clv.BetaGeoModel(data = data)

File ~/.pyenv/versions/anaconda3-2023.09-0/envs/marketing_env/lib/python3.11/site-packages/pymc_marketing/clv/models/beta_geo.py:118, in BetaGeoModel.__init__(self, data, model_config, sampler_config)
    116     self.customer_id = data["customer_id"]
    117 except KeyError:
--> 118     raise KeyError("customer_id column is missing from data")
    119 try:
    120     self.frequency = data["frequency"]

KeyError: 'customer_id column is missing from data'

animenon commented 11 months ago

Error in short: KeyError: 'customer_id column is missing from data'

xhulianoThe1 commented 11 months ago

It seems this dataset doesn't have the "customer_id" column, which BetaGeoModel requires.

The model only needs a unique identifier, so setting customer_id from the index should fix the issue:

data['customer_id'] = data.index
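
For anyone who wants to try the workaround outside the quickstart, here is a minimal sketch. It assumes the index is already unique per customer. The helper name `ensure_customer_id` is hypothetical (it is not part of pymc-marketing), and the toy DataFrame is a made-up stand-in for the quickstart dataset; only the "customer_id" and "frequency" column names are taken from the traceback above.

```python
import pandas as pd


def ensure_customer_id(data: pd.DataFrame) -> pd.DataFrame:
    """Return a frame that has a customer_id column.

    Hypothetical helper, not part of pymc-marketing. If the column is
    missing, it is filled from the index, which is assumed to be unique.
    """
    if "customer_id" in data.columns:
        return data
    if not data.index.is_unique:
        raise ValueError("index is not unique, cannot use it as customer_id")
    out = data.copy()  # leave the caller's frame untouched
    out["customer_id"] = out.index
    return out


# Toy stand-in for the quickstart data; the values are made up.
data = pd.DataFrame({"frequency": [2, 1, 0]})
data = ensure_customer_id(data)
print(data.columns.tolist())
```

After this, `data` can be passed to `clv.BetaGeoModel(data = data)` as in the original report, provided the other columns the model expects are present.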

juanitorduz commented 11 months ago

Do you want to open a pull request? :)

xhulianoThe1 commented 11 months ago

Will submit a PR.

ricardoV94 commented 10 months ago

Closed via #440