pymc-labs / pymc-marketing

Bayesian marketing toolbox in PyMC. Media Mix (MMM), customer lifetime value (CLV), buy-till-you-die (BTYD) models and more.
https://www.pymc-marketing.io/
Apache License 2.0

Different default priors for HSGP TVPs #774

Open bwengals opened 3 months ago

bwengals commented 3 months ago

I think some of the time-varying parameter (TVP) fits in the TVP notebook could be improved by using different priors on the GP.

Basically, I'd propose to:

  1. Use the PC (penalized complexity) prior as the default prior for a GP TVP.
  2. Allow the user to override the particular settings of that PC prior.
  3. Allow the user to override the PC prior with a more informative InverseGamma prior, where they choose either mu and sigma, or the upper and lower tail probabilities and mass, à la `pm.find_constrained_prior`.
  4. Use this machinery to set the HSGP approximation parameters `m` and `c` to match the prior given to the lengthscale (see the sketch after this list). This requires assuming how far into the future the user wants to make predictions; if the user wants reliable predictions further into the future, they will need to specify the number of days they need.
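
A rough sketch of what step 4 could look like. This is not an existing API: the helper name is hypothetical, and the constants come from the ExpQuad heuristics in Riutort-Mayol et al. (2023) ($c \geq 3.2\,\ell/S$ and $m \geq 1.75\,c/(\ell/S)$, with $S$ half the width of the centered input range):

```python
import numpy as np

def suggest_m_c(S, ls_lower, c_min=1.2):
    """Hypothetical helper: choose HSGP `m` and `c` from the smallest
    lengthscale the prior supports. Forecasting further into the future
    widens the input range, so S grows and more basis functions are needed.
    """
    ls_frac = ls_lower / S  # smallest supported lengthscale, relative to S
    c = max(3.2 * ls_frac, c_min)
    m = int(np.ceil(1.75 * c / ls_frac))
    return m, c

# e.g. two years of daily data plus a 90-day forecast horizon, with the
# lengthscale prior bounded below around one week:
m, c = suggest_m_c(S=(730 + 90) / 2, ls_lower=7.0)  # -> m around 123, c = 1.2
```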

The main idea is to use the PC prior on the GP scale (`eta`) and lengthscale (`ls`) as the default. I think this could potentially be done using the data alone, without any user input. The PC prior is derived as a joint prior; it works out to be an Exponential on `eta` and a Fréchet (equivalently, inverse Weibull) on the lengthscale.
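
For reference, here is my transcription of the joint PC prior from Fuglstad et al. (2019), written in terms of the lengthscale $\ell$ and scale $\eta$, with $d$ the input dimension ($d = 1$ for a TVP over time):

$$\pi(\ell, \eta) = \frac{d}{2}\,\lambda_1\,\ell^{-d/2-1}\,e^{-\lambda_1 \ell^{-d/2}} \times \lambda_2\, e^{-\lambda_2 \eta},$$

where the first factor is a Fréchet density on $\ell$, the second is an Exponential density on $\eta$, and $\lambda_1$, $\lambda_2$ are fixed by the two tail conditions below.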

PC priors are not uninformative, so the user potentially needs to set two tail probabilities:

  1. For the scale `eta`: $p(\eta > U) = \alpha$. The user must choose $U$ and $\alpha$. Since `eta` acts like a standard deviation, some fraction of the overall standard deviation of the data could be used for $U$.
  2. For the lengthscale `ls`: $p(\ell < L) = \alpha$. The user must choose $L$ and $\alpha$. If the data is daily, is it reasonable to not measure variation on scales shorter than one week? If so, we could automatically choose $L = 7$ and $\alpha = 0.01$. For daily data, though, there must be a hard cutoff at $L = 1$, the resolution of the data. (A PyMC sketch of both tail conditions follows this list.)
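
To make the two tail conditions concrete, here is a minimal PyMC sketch of my reading of the $d = 1$ case: $p(\eta > U) = \alpha$ pins the Exponential rate to $-\log(\alpha)/U$, and $p(\ell < L) = \alpha$ pins the Fréchet scale. PyMC has no Fréchet distribution, so the lengthscale is written as the reciprocal of a Weibull; the specific numbers are placeholders:

```python
import numpy as np
import pymc as pm

U, alpha_eta = 1.0, 0.05  # p(eta > U) = alpha_eta; U could be a fraction of the data's sd
L, alpha_ls = 7.0, 0.01   # p(ls < L) = alpha_ls; e.g. one week for daily data

with pm.Model():
    # Scale: Exponential with rate lam, so that p(eta > U) = exp(-lam * U) = alpha_eta.
    eta = pm.Exponential("eta", lam=-np.log(alpha_eta) / U)

    # Lengthscale: Frechet with shape d/2 = 1/2. If ls ~ Frechet(1/2, .),
    # then 1/ls ~ Weibull(1/2, beta); solving p(ls < L) = alpha_ls for the
    # Weibull scale gives beta below.
    beta = 1 / (L * np.log(1 / alpha_ls) ** 2)
    ls_inv = pm.Weibull("ls_inv", alpha=0.5, beta=beta)
    ls = pm.Deterministic("ls", 1 / ls_inv)
```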

Then, if the user has more prior knowledge about the lengthscale, the InverseGamma prior becomes a good choice.
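
For illustration, with the existing `pm.find_constrained_prior` helper that could look like the following (the bounds, mass, and initial guess are placeholders):

```python
import pymc as pm

# Find InverseGamma parameters putting ~95% of the lengthscale's prior
# mass between 7 and 90 days (placeholder bounds for daily data).
opt_params = pm.find_constrained_prior(
    pm.InverseGamma,
    lower=7,
    upper=90,
    init_guess={"alpha": 3, "beta": 50},
    mass=0.95,
)
# Returns a dict of optimized parameters that can be spliced into a model,
# e.g. ls = pm.InverseGamma("ls", **opt_params).
```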

What do people think about setting sensible defaults for the scale and the lengthscale using just the data, or should the user be forced to make a choice? Any guidance on the implementation would be greatly appreciated.

juanitorduz commented 3 months ago

This is awesome! Our users are definitely not experts in GPs; having suitable default priors is key 🙏! I'll take a look into the PR this week!

Thanks for bringing your expertise to the package 🙌

bwengals commented 3 months ago

Sure, thanks @juanitorduz. The PR is quite rough, so forgive me on that. In my mind, the first step is: does this proposal sound good? Step 2 is then how to implement it. The PR, mostly the notebook, is there mainly to demonstrate that this works reasonably well.

Also tagging @ulfaslak since this got kickstarted by a discussion we had.

juanitorduz commented 3 months ago

@bwengals I really like the proposal, and from my side, it is a key addition to the model! I especially like how the out-of-sample predictions look in the notebook. 💪

I suggest we work on step (2). One thing we need to keep in mind is what the default parameters should be when the optimization step in `pm.find_constrained_prior` fails. We could fall back to a simple heuristic; one possible shape for that is sketched below.
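
Purely as an illustration of that fallback idea (the default values are placeholders):

```python
import pymc as pm

try:
    params = pm.find_constrained_prior(
        pm.InverseGamma, lower=7, upper=90, init_guess={"alpha": 3, "beta": 50}
    )
except Exception:
    # Optimization did not converge: fall back to fixed heuristic defaults.
    params = {"alpha": 3, "beta": 50}
```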

Please let me know how we can support you (besides the reviews) to work on the PR.

@wd60622 @ulfaslak any other thoughts from your side?