Closed: ColtAllen closed this issue 1 year ago.
I am excited about the function to predict the probability of a customer being alive X days into the future. CLVTools in R has the Pareto/NBD model but not this function, which I always found a little odd.
@ricardoV94 brought this up in https://github.com/pymc-labs/pymc-marketing/pull/131
The bad news is that NUTS seems completely unable to sample the CDNOW dataset (even when taking draws from the RNG directly and with more data). It could be that this set of parameters is specifically bad. I had to disable jittering in the init method; otherwise the sampler would start at a point where the gradients would fail to converge immediately. Even without jittering, it was taking way too long to sample, suggesting very high tree depth or still too many iterations in the gradient methods :/
I had to sample with `Slice`. `find_MAP` is also quite slow (maybe 30s-1min to fit). Perhaps more informative priors would have helped, but I didn't bother to try.
If the bottleneck is really the speed of the hyp2F1 derivatives (which I suspect it is) we can try to: 1) compute the multiple derivatives together, and 2) jit it with Numba
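A rough sketch of idea (1), computing the function and a derivative together: the series for 2F1 and its derivative with respect to `a` can share one loop, since `d/da (a)_n = (a)_n * sum_{k<n} 1/(a+k)`. This is not the actual `pytensor` implementation, just an illustration of the loop structure; a plain loop like this is also exactly the kind of code `numba.njit` can compile:

```python
def hyp2f1_and_da(a, b, c, z, max_iters=10_000, tol=1e-15):
    """Accumulate 2F1(a, b; c; z) and its derivative w.r.t. `a`
    in a single pass over the hypergeometric series.

    The derivative costs one extra multiply-add per term.
    """
    term = 1.0     # current term (a)_n (b)_n z^n / ((c)_n n!)
    f = 1.0        # running sum of the series
    df_da = 0.0    # running sum of the derivative series
    psi_sum = 0.0  # sum_{k=0}^{n-1} 1/(a+k)
    for n in range(max_iters):
        psi_sum += 1.0 / (a + n)
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        f += term
        df_da += term * psi_sum
        if abs(term) < tol * abs(f):  # converged: abort before max_iters
            break
    return f, df_da
```

A quick sanity check uses the identity `2F1(a, b; b; z) = (1-z)^(-a)`, whose derivative w.r.t. `a` is `-log(1-z) * (1-z)^(-a)`.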
CDNOW is a small dataset, so more informative priors would have helped for sure. However, are there any other suggestions or things to keep in mind as I proceed with adding this model? The `hyp2F1` derivatives seem like more of a `pytensor` issue.
I did try with a sample of 10k using the CDNOW fit as the true parameters and NUTS still couldn't sample. Is that still very small data? I don't think more data will help here as the problem was not due to flat posteriors.
Perhaps the problem is with the parameter ranges tested?
BTW the notebook is here if someone wants to play with other datasets: https://github.com/pymc-labs/pymc-marketing/blob/main/docs/source/notebooks/clv/dev/pareto-nbd.ipynb
> I did try with a sample of 10k using the CDNOW fit as the true parameters and NUTS still couldn't sample. Is that still very small data? I don't think more data will help here as the problem was not due to flat posteriors.
I've found `find_MAP` and MLE don't give equivalent results unless the dataset is >30k or so.
I do see the synthetic data is using the same scalar T value for all samples; `logp` will return `-np.inf` if `t_x > T`, but would that impact NUTS? Either way, it probably won't recover the true params with that T value.
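As a quick NumPy illustration of that constraint (the actual model expresses it inside the PyTensor graph, e.g. with `pt.switch` or `check_parameters`):

```python
import numpy as np

def masked_logp(logp, t_x, T):
    # Invalidate impossible observations: recency t_x cannot exceed
    # the observation window T, so those entries get logp = -inf.
    return np.where(t_x > T, -np.inf, logp)

logp = np.array([-1.2, -0.7, -2.0])
t_x = np.array([3.0, 5.0, 12.0])
T = 10.0  # same scalar T for all customers, as in the synthetic data
out = masked_logp(logp, t_x, T)  # third entry becomes -inf
```

A single `-np.inf` anywhere in the summed likelihood makes the whole model logp `-inf`, which is why a stray `t_x > T` row would stall any sampler, not just NUTS.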
> Perhaps the problem is with the parameter ranges tested?
All parameters have a >0 requirement, so this could be it.
> I've found `find_MAP` and MLE don't give equivalent results unless the dataset is >30k or so.
I am not worried about the differences as much as the slowness (even of `find_MAP`) compared to `lifetimes`.
> I do see the synthetic data is using the same scalar T value for all samples; `logp` will return `-np.inf` if `t_x > T`
That's fine because they were all generated with that scalar T, so the draws can't invalidate the constraint.
> All parameters have a >0 requirement, so this could be it.
That's okay. The priors used were all >0 as well. That's also true of the other models, and they aren't as slow.
> That's okay. The priors used were all >0 as well. That's also true of the other models, and they aren't as slow.
Param draws extremely close to 0 seem typical of these `HalfNormal` priors, which could be causing the gradients to blow up. I know the starting param values for `lifetimes` MLE are all equal to 1.
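The boundary behavior can be seen by comparing prior densities near zero. A sketch using scipy, with arbitrary shape/scale values (not the ones used in the model): a HalfNormal has its mode at 0, so draws near the boundary are common, while a Weibull with shape > 1 has vanishing density there:

```python
from scipy import stats

# Density very close to the x = 0 boundary
half_normal_near_zero = stats.halfnorm(scale=1.0).pdf(1e-3)    # near the mode
weibull_near_zero = stats.weibull_min(c=2.0).pdf(1e-3)         # vanishing

# HalfNormal keeps substantial mass at the boundary; Weibull does not
```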
I can try out different default priors when I work on the PR for this, but in the meantime would it be worthwhile to start on the gradient speed enhancements in `pytensor`? `scipy.special.hyp2f1` is written in Cython, but I don't know how transferable this is to a `pytensor` C implementation.
We don't need to do anything to the Hyp2F1 itself, which just calls the scipy function. The problem is the gradients, which use a vanilla NumPy loop. We could try compiling those with Numba to see if it brings the runtime down enough for NUTS to be usable.
Worth also investigating the overflow that's happening, to see if we can abort the loop earlier than `max_iters`.
I can investigate this issue while you go ahead with the model PR
It would be great if you could investigate if things improve with better priors.
The problem with giving more informative priors by default is that they may not be generally applicable. Still, if we see they make a big difference, it's definitely worth highlighting in the docstrings of the model and also in the examples we include.
> It would be great if you could investigate if things improve with better priors.
I've found a major speed improvement with Weibull priors, but NUTS is still basically unusable.
I want to read up more on the Abe variant of the ParetoNBD model before proceeding with this:
- https://core.ac.uk/download/pdf/18510752.pdf
- https://link.springer.com/article/10.1007/s11573-021-01057-6
- http://www.cirje.e.u-tokyo.ac.jp/research/dp/2008/2008cf537.pdf
- http://www.cirje.e.u-tokyo.ac.jp/research/dp/2009/2009cf616.pdf
Closing now that https://github.com/pymc-labs/pymc-marketing/pull/312 is merged.
Now that the Gaussian Hypergeometric Function is available in PyTensor (thanks @ricardoV94!), the way is paved for adding the Pareto/NBD model. This model was the first CLV model introduced and is still one of the most performant. The Lifetimes implementation contains a numerical error in the likelihood function (which has given me grief when using this model in practice). The fix is described in the very last line of this paper:
http://brucehardie.com/notes/027/bgnbd_num_error.pdf
This model also has an additional predictive method for predicting the probability of a customer being alive X days into the future. To my knowledge, no other CLV model provides this; the rest can only predict the probability of being alive at the current moment. This method is not available in Lifetimes, but I did add it to my BTYD fork.