pymc-labs / pymc-marketing

Bayesian marketing toolbox in PyMC. Media Mix (MMM), customer lifetime value (CLV), buy-till-you-die (BTYD) models and more.
https://www.pymc-marketing.io/
Apache License 2.0
683 stars 190 forks

CLV quickstart example uses wrong frequency #256

Closed AdJasper closed 1 year ago

AdJasper commented 1 year ago

First of all thanks for this amazing library and it's great to see many people actively working on this!

I have two questions regarding the example notebook CLV Quickstart.

  1. When estimating the CLV, one of the arguments of the expected_customer_lifetime_value method is time. The comment on this line says you are predicting for 12 months, while the units of time in the rest of the notebook are weeks. In addition, if I check the API documentation of this method, the default value for freq is "D" (days). So if you are predicting for months, freq should equal "M", right?
  2. How do you define "Customer Lifetime Value" in your package? If we speak about the lifetime of a customer, that also includes the time before the prediction. I believe the method is calculating the value a customer creates after time T, where the period up to T is used to fit the model.

An example: Customer X makes their first purchase on 20-04-2022 (DD-MM-YYYY). We fit a model today (20-04-2023) and predict the number of purchases and their average monetary value for one year into the future (up until 20-04-2024). The CLV is then the value that the customer generated from 20-04-2022 until 20-04-2024, right? Is this also what we are computing when we use this package? Or do we estimate the value that the customer will generate in the period from today until 20-04-2024 (so for the future only)?

ricardoV94 commented 1 year ago

Thanks for the questions.

> In addition, if I check the API documentation of this method, the default value for freq is "D" (days). So if you are predicting for months, freq should equal "M", right?

I could be mistaken, but I think freq is just about the granularity (number of steps) with which you estimate the discounted lifetime value. The value is always in terms of months, but you can estimate it using hourly, daily, weekly or monthly steps. The smaller the steps, the more computations are needed, but the more accurate the approximation would be. I will double check to be sure. Edit: I was mistaken, see below.

> I believe the method is calculating the value a customer creates after time T, where the period up to T is used to fit the model.

Yes that's what it is doing. You already observed the value of the customer until today, so there's no uncertainty there.
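
To make the distinction concrete, here is a minimal sketch with made-up numbers. The purchase amounts and the `predicted_future_value` figure are hypothetical, not output from the package. It only illustrates that the forward-looking estimate and the already observed spend are separate quantities, which can be added together if a whole-lifetime figure is wanted.

```python
# Hypothetical observed purchases for one customer up to the fit date T.
observed_purchases = [40.0, 25.0, 35.0]

# Hypothetical model output: expected discounted value AFTER T.
# In practice this number would come from the fitted CLV model.
predicted_future_value = 80.0

# Value up to T is already observed, so it carries no uncertainty.
historical_value = sum(observed_purchases)

# A whole-lifetime figure, if wanted, is the sum of both parts.
whole_lifetime_value = historical_value + predicted_future_value

print(f"observed up to T:  {historical_value:.2f}")
print(f"predicted after T: {predicted_future_value:.2f}")
print(f"whole lifetime:    {whole_lifetime_value:.2f}")
```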

AdJasper commented 1 year ago

Thanks for clarifying so quickly Ricardo! And thanks for double checking :)

ricardoV94 commented 1 year ago

There's a problem in the example.

It is true that time=12 means we want to estimate for 12 months, and the discount rate is also per month. But we should have passed freq="W" because our dataset is in weeks, not in months, as you pointed out.

I think this was an error in the original lifetimes example which we just borrowed: https://lifetimes.readthedocs.io/en/latest/Quickstart.html#the-gamma-gamma-model-and-the-independence-assumption
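
The effect of the unit mismatch described above can be sketched in plain Python. This is a toy illustration and not the package's implementation: `expected_purchases` is a hypothetical stand-in for a fitted model, and the conversion factors in `UNITS_PER_MONTH` are an assumption based on my understanding of the lifetimes convention (roughly 4.345 weeks or 30 days per month).

```python
# Assumed conversion: how many units of the dataset's time scale fit in one month.
UNITS_PER_MONTH = {"H": 30 * 24, "D": 30, "W": 4.345, "M": 1.0}


def expected_purchases(t, rate):
    """Toy stand-in for a fitted model: cumulative expected purchases
    up to time t, with t measured in the dataset's own time unit."""
    return rate * t


def discounted_value(rate, avg_spend, months=12, discount_rate=0.01, freq="D"):
    """Sum discounted value over monthly steps, converting each month
    into the dataset's time unit via the freq factor."""
    factor = UNITS_PER_MONTH[freq]
    total, previous = 0.0, 0.0
    for month in range(1, months + 1):
        cumulative = expected_purchases(month * factor, rate)
        new_purchases = cumulative - previous  # purchases within this month only
        previous = cumulative
        total += avg_spend * new_purchases / (1 + discount_rate) ** month
    return total


# Hypothetical customer: 0.5 purchases per week, 20 per purchase.
weekly_rate = 0.5
correct = discounted_value(weekly_rate, 20.0, freq="W")
wrong = discounted_value(weekly_rate, 20.0, freq="D")  # treats weeks as days

print(f"freq='W': {correct:.2f}")
print(f"freq='D': {wrong:.2f}")
```

In this toy setup, passing "D" for weekly data makes each month span 30 weeks instead of about 4.345, so the estimate is inflated by a factor of roughly 30 / 4.345.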