ricardoV94 opened 1 year ago
Do you mean the results from the paper http://brucehardie.com/notes/017/sBG_estimation.pdf?
I was not thinking about that specifically, just the original paper (one cohort over time).
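(For reference, the sBG model in those papers assumes each customer has a constant per-period churn probability $\theta$, with $\theta \sim \mathrm{Beta}(\alpha, \beta)$ across customers, which gives

$$
P(T = t \mid \alpha, \beta) = \frac{B(\alpha + 1,\ \beta + t - 1)}{B(\alpha, \beta)}, \qquad
S(t \mid \alpha, \beta) = \frac{B(\alpha,\ \beta + t)}{B(\alpha, \beta)}.
$$

)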
In the multiple-cohort paper you mentioned, they are just doing complete pooling across cohorts, right? It would be nice if we didn't need to do anything extra to allow that. Something like:
```python
# Proposed API sketch: two cohort models sharing the same alpha and beta priors
import pymc as pm
from pymc_marketing import clv

alpha = pm.HalfNormal.dist()
beta = pm.HalfNormal.dist()

cohort1 = clv.ShiftedBetaGeoCohortModel(..., alpha_prior=alpha, beta_prior=beta)
cohort2 = clv.ShiftedBetaGeoCohortModel(..., alpha_prior=alpha, beta_prior=beta)

cohorts = clv.concatenate_models(cohort1, cohort2)
```
It would create a PyMC model that includes `cohort1` and `cohort2` as submodels, sharing `alpha` and `beta`.
Calling `cohorts.fit()` would then provide the relevant part of the InferenceData to `cohort1` and `cohort2`, so that you could use their special summary/plotting methods.
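To make the idea concrete, here is a hand-rolled sketch of the kind of model `concatenate_models` could build under the hood: two aggregated sBG cohort likelihoods (via `pm.Potential`) sharing a single `alpha` and `beta`. The helper functions are illustrative, not an existing API, and the counts are loosely based on the example data in the Fader & Hardie note:

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt


def logbeta(a, b):
    # log B(a, b) via log-gamma
    return pt.gammaln(a) + pt.gammaln(b) - pt.gammaln(a + b)


def sbg_loglike(alpha, beta, churned, survived):
    # Aggregated sBG log-likelihood (Fader & Hardie 2007):
    # churned[t-1] customers churn in period t; `survived` are still
    # active after the last observed period.
    periods = churned.shape[0]
    t = pt.arange(1, periods + 1)
    log_p = logbeta(alpha + 1, beta + t - 1) - logbeta(alpha, beta)  # log P(T = t)
    log_s = logbeta(alpha, beta + periods) - logbeta(alpha, beta)    # log S(t_max)
    return pt.sum(churned * log_p) + survived * log_s


# Illustrative per-period churn counts and end-of-window survivors
churn_1, alive_1 = np.array([131, 126, 90, 60]), 593
churn_2, alive_2 = np.array([369, 163, 86, 56]), 326

with pm.Model() as cohorts:
    alpha = pm.HalfNormal("alpha", 10)  # shared across both cohorts
    beta = pm.HalfNormal("beta", 10)
    pm.Potential("cohort1_loglike", sbg_loglike(alpha, beta, churn_1, alive_1))
    pm.Potential("cohort2_loglike", sbg_loglike(alpha, beta, churn_2, alive_2))
    idata = pm.sample()
```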
This would be neat because we could ultimately use more structured hyperpriors across models, even models of a different nature (e.g., a correlation between lifetime and monetary value).
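For example (purely illustrative, none of this exists in the library), a structured hyperprior could place a bivariate normal on the logs of a lifetime parameter and a value parameter, inducing correlation between the two submodels:

```python
import numpy as np
import pymc as pm

with pm.Model() as linked:
    # Cholesky-parametrized covariance with an LKJ prior on the correlation
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=2, eta=2.0, sd_dist=pm.Exponential.dist(1.0)
    )
    # Joint prior over (log lifetime-param, log value-param)
    log_params = pm.MvNormal("log_params", mu=np.zeros(2), chol=chol)
    lifetime_param = pm.Deterministic("lifetime_param", pm.math.exp(log_params[0]))
    value_param = pm.Deterministic("value_param", pm.math.exp(log_params[1]))
    # each Deterministic would then feed its respective submodel's likelihood
```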
Anyway... I was just talking about the single-cohort model here.
Interesting, makes sense! In addition, could we provide a hierarchical model?
What do you mean by a hierarchical model? Pooling across hyperparameters? If so: yes, that was the idea as well.
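Something like this minimal partial-pooling sketch (again hypothetical, with made-up hyperprior choices), where each cohort gets its own `alpha`/`beta` drawn from shared population-level parameters:

```python
import pymc as pm

n_cohorts = 5
with pm.Model() as hierarchical:
    # population-level hyperpriors shared by all cohorts
    mu_alpha = pm.HalfNormal("mu_alpha", 10)
    mu_beta = pm.HalfNormal("mu_beta", 10)
    # per-cohort parameters, partially pooled toward the population means
    alpha = pm.Gamma("alpha", mu=mu_alpha, sigma=1.0, shape=n_cohorts)
    beta = pm.Gamma("beta", mu=mu_beta, sigma=1.0, shape=n_cohorts)
    # cohort c's sBG likelihood would then use alpha[c], beta[c]
```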
#133 implements the same model at the user-level granularity.
It may make sense to implement a cohort-level model. The posterior individual-level variables for customers who enrolled at the same time are identical anyway, so it doesn't make sense to sample each one separately when the data comes in large cohorts.
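A toy illustration of the saving (made-up numbers): since customers with the same enrollment time share the same likelihood term, a weighted sum over unique cohorts is identical to summing over every individual:

```python
import numpy as np

# hypothetical per-cohort log-likelihood terms and cohort sizes
loglike_per_cohort = np.array([-1.2, -0.8, -2.1])
cohort_sizes = np.array([500, 300, 200])

# one weighted dot product replaces 1,000 identical individual-level terms
total_loglike = np.dot(cohort_sizes, loglike_per_cohort)
```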
Some of the summary statistics in #167 may also only make sense for the cohort-level model.