pymc-labs / pymc-marketing

Bayesian marketing toolbox in PyMC. Media Mix (MMM), customer lifetime value (CLV), buy-till-you-die (BTYD) models and more.
https://www.pymc-marketing.io/
Apache License 2.0

Add Pareto/NBD Model #127

Closed · ColtAllen closed this 1 year ago

ColtAllen commented 1 year ago

Now that the Gaussian Hypergeometric Function is available in PyTensor (thanks @ricardoV94!) the way is paved for adding the Pareto/NBD model. This model was the first CLV model introduced and is still one of the most performant.
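For context, a minimal sketch of what this unlocks (assuming the Op is exposed as `pt.hyp2f1`; the scalar arguments here are arbitrary):

```python
import pytensor
import pytensor.tensor as pt

# Symbolic 2F1(a, b; c; z) with `a` left free so we can differentiate
a = pt.dscalar("a")
out = pt.hyp2f1(a, 1.5, 2.5, 0.3)

# A gradient through 2F1 is exactly what a NUTS-sampled Pareto/NBD
# likelihood needs
g = pytensor.grad(out, a)
print(g.eval({a: 1.0}))
```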

The Lifetimes implementation contains a numerical error in the likelihood function (which has given me grief when using this model in practice). The fix is described in the very last line of this paper.

http://brucehardie.com/notes/027/bgnbd_num_error.pdf
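The gist of the fix is to keep every term of the likelihood in log space and combine sums of terms with log-sum-exp, so nothing overflows. A toy illustration (not the Pareto/NBD expression itself):

```python
import numpy as np

# exp() of either of these overflows float64, so the naive sum blows up
log_a, log_b = 800.0, 795.0

naive = np.log(np.exp(log_a) + np.exp(log_b))  # inf, with an overflow warning
stable = np.logaddexp(log_a, log_b)            # ~800.0067, no overflow
```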

This model also has an additional predictive method for predicting the probability of a customer being alive X days into the future. To my knowledge, no other CLV model provides this; the others can only predict the probability of being alive at the current moment. This method is not available in Lifetimes, but I did add it to my BTYD fork.
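A back-of-the-envelope sketch of the idea (a sketch of the derivation, not the fork's exact code; the function name and arguments are illustrative): conditional on being alive at `T`, the Pareto/NBD dropout rate `mu` has a Gamma(s, beta + T) posterior, so surviving an extra `t` days contributes the factor `E[exp(-mu * t)]`.

```python
def prob_alive_at_future_t(p_alive_now, s, beta, T, t):
    """P(alive at T + t) for a Pareto/NBD customer.

    Given the customer is alive at T, the dropout rate mu ~ Gamma(s, beta + T),
    so E[exp(-mu * t)] = ((beta + T) / (beta + T + t)) ** s.
    p_alive_now is the standard Pareto/NBD P(alive at T).
    """
    return p_alive_now * ((beta + T) / (beta + T + t)) ** s
```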

zwelitunyiswa commented 1 year ago

I am excited about the function to predict the probability of a customer being alive X days into the future. CLVTools in R has the Pareto/NBD but not this function, which I always found a little odd.

ColtAllen commented 1 year ago

@ricardoV94 brought this up in https://github.com/pymc-labs/pymc-marketing/pull/131

> The bad news is that NUTS seems completely useless for sampling the cdnow dataset (even when taking draws from the RNG directly and with more data). It could be that this set of parameters is specifically bad. I had to disable jittering in the init method, otherwise the sampler would start at some point where the gradients would fail to converge immediately. Even without jittering, it was taking way too long to sample, suggesting very high tree depth or still too many iterations in the gradient methods :/
>
> I had to sample with Slice. find_MAP is also quite slow (maybe 30s-1min to fit).
>
> Perhaps more informative priors would have helped, but I didn't bother to try.
>
> If the bottleneck is really the speed of the hyp2F1 derivatives (which I suspect it is), we can try to: 1) compute the multiple derivatives together, and 2) JIT it with Numba.

CDNOW is a small dataset, so more informative priors would have helped for sure. However, are there any other suggestions or things for me to keep in mind as I proceed with adding this model? The hyp2F1 derivatives seem like more of a pytensor issue.

ricardoV94 commented 1 year ago

I did try with a sample of 10k using the CDNOW fit as the true parameters and NUTS still couldn't sample. Is that still very small data? I don't think more data will help here as the problem was not due to flat posteriors.

Perhaps the problem is with the parameter ranges tested?

BTW the notebook is here if someone wants to play with other datasets: https://github.com/pymc-labs/pymc-marketing/blob/main/docs/source/notebooks/clv/dev/pareto-nbd.ipynb

ColtAllen commented 1 year ago

> I did try with a sample of 10k using the CDNOW fit as the true parameters and NUTS still couldn't sample. Is that still very small data? I don't think more data will help here as the problem was not due to flat posteriors.

I've found find_MAP and MLE don't give equivalent results unless the dataset is >30k or so.

I do see the synthetic data is using the same scalar T value for all samples - logp will return -np.inf if t_x > T, but would that impact NUTS? Either way, it probably won't recover the true params with that T value.
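For reference, the usual way such a support constraint is encoded (a generic PyTensor sketch, not the actual model code):

```python
import numpy as np
import pytensor.tensor as pt

t_x = pt.dvector("t_x")      # recency per customer
T = pt.dvector("T")          # observation period per customer
logp_terms = -(T - t_x)      # stand-in for the real per-customer logp

# Outside the support (t_x > T) return -inf so the proposal is simply
# rejected rather than raising an error
safe_logp = pt.switch(pt.le(t_x, T), logp_terms, -np.inf)
```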

ColtAllen commented 1 year ago

> Perhaps the problem is with the parameter ranges tested?

All parameters have a >0 requirement, so this could be it.

ricardoV94 commented 1 year ago

> I've found find_MAP and MLE don't give equivalent results unless the dataset is >30k or so.

I am not worried about the differences as much as the slowness (even of find_MAP) compared to lifetimes.

> I do see the synthetic data is using the same scalar T value for all samples - logp will return -np.inf if t_x > T

That's fine because they were all generated with that scalar T, so the draws can't invalidate the constraint

> All parameters have a >0 requirement, so this could be it.

That's okay. The priors used were all >0 as well. That's also true of the other models, and they aren't as slow.

ColtAllen commented 1 year ago

> > All parameters have a >0 requirement, so this could be it.
>
> That's okay. The priors used were all >0 as well. That's also true of the other models, and they aren't as slow.

param draws extremely close to 0 seem typical of these HalfNormal priors, which could be causing the gradients to blow up. I know the starting param values for lifetimes MLE are all equal to 1.

I can try out different default priors when I work on the PR for this, but in the meantime, would it be worthwhile to start on the gradient speed enhancements in pytensor? scipy.special.hyp2f1 is written in Cython, but I don't know how transferable this is to a pytensor C implementation.

ricardoV94 commented 1 year ago

We don't need to do anything to the Hyp2F1 itself, which just calls the scipy function. The problem is the gradients, which use a vanilla numpy loop. We could try compiling those with Numba to see if it brings the runtime down enough for NUTS to be usable.

It's also worth investigating the overflow that's happening, to see if we can abort the loop earlier than max_iters.
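To make that concrete, here's a rough sketch (mine, not PyTensor's actual implementation) of a Numba-compiled series loop that accumulates 2F1 and its three parameter derivatives in one pass and aborts once the terms stop contributing, which also sidesteps running all the way to max_iters:

```python
import numba

@numba.njit(cache=True)
def hyp2f1_and_grads(a, b, c, z, max_iter=10_000, rtol=1e-15):
    """Series for 2F1(a, b; c; z) plus d/da, d/db, d/dc in one loop.

    Uses d/da (a)_k = (a)_k * sum_{j<k} 1/(a+j); valid for |z| < 1.
    A sketch only - no special-case or convergence-region handling.
    """
    f = fa = fb = fc = 0.0
    term = 1.0              # k-th series term
    sa = sb = sc = 0.0      # running d log(term) / d{a, b, c}
    for k in range(max_iter):
        f += term
        fa += term * sa
        fb += term * sb
        fc -= term * sc
        # abort early once terms are negligible
        if abs(term) < rtol * abs(f):
            break
        sa += 1.0 / (a + k)
        sb += 1.0 / (b + k)
        sc += 1.0 / (c + k)
        term *= (a + k) * (b + k) / ((c + k) * (k + 1.0)) * z
    return f, fa, fb, fc
```

Checking the value against scipy.special.hyp2f1 and the gradients against finite differences would be the first sanity test.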

I can investigate this issue while you go ahead with the model PR.

It would be great if you could investigate whether things improve with better priors.

The problem with giving more informative priors by default is that they may not be generally applicable. Still, if we see they make a big difference, it's definitely worth highlighting in the docstrings of the model and also in the examples we include.

ColtAllen commented 1 year ago

> It would be great if you could investigate whether things improve with better priors.

I've found a major speed improvement with Weibull priors, but NUTS is still basically unusable.
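For the record, the shape of what I tried (the hyperparameters here are illustrative, not proposed defaults): unlike a HalfNormal, a Weibull with shape alpha > 1 has zero density at the origin, which keeps draws away from the region where the hyp2f1 gradients blow up.

```python
import pymc as pm

with pm.Model():
    # Pareto/NBD population parameters; all must be positive.
    # Weibull with alpha > 1 puts no mass at 0, unlike HalfNormal.
    r = pm.Weibull("r", alpha=2.0, beta=1.0)
    alpha = pm.Weibull("alpha", alpha=2.0, beta=10.0)
    s = pm.Weibull("s", alpha=2.0, beta=1.0)
    beta = pm.Weibull("beta", alpha=2.0, beta=10.0)
```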

I want to read up more on the Abe variant of the Pareto/NBD model before proceeding with this:

- https://core.ac.uk/download/pdf/18510752.pdf
- https://link.springer.com/article/10.1007/s11573-021-01057-6
- http://www.cirje.e.u-tokyo.ac.jp/research/dp/2008/2008cf537.pdf
- http://www.cirje.e.u-tokyo.ac.jp/research/dp/2009/2009cf616.pdf

ColtAllen commented 1 year ago

Closing now that https://github.com/pymc-labs/pymc-marketing/pull/312 is merged.