Closed rtol5 closed 10 months ago
Yes, we need to add a "thin" option to these methods
Thanks @ricardoV94. I'd love to help with a pull request if I knew how to, but unfortunately I don't.
Is there anything else I can do to help with this? This feature would be really helpful for us.
Alternatively, if there's another way to generate CLVs with `pymc_marketing.clv`, I'd love to try that. For the "main" method outlined in one of the tutorial notebooks, I'm running into this issue.
Actually, reading through the tutorial notebooks again, I see that this section https://www.pymc-marketing.io/en/stable/notebooks/clv/clv_quickstart.html#ranking-customers-from-best-to-worst generates a "thin" output with `num_purchases.mean(("chain", "draw")).values`.
Just confirming: I should be able to do the same with `clv.utils.customer_lifetime_value` to get my intended output, right?
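For reference, the reduction used in the quickstart can be sketched on a toy array. This is a minimal illustration, assuming the CLV output is an xarray `DataArray` with `chain`, `draw`, and `customer_id` dimensions; the sizes, customer IDs, and random values below are made up and do not come from the library.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Toy stand-in for posterior CLV estimates: 4 chains x 1000 draws x 5 customers.
# The values are random placeholders, not output from pymc_marketing.
clv_draws = xr.DataArray(
    rng.gamma(shape=2.0, scale=50.0, size=(4, 1000, 5)),
    dims=("chain", "draw", "customer_id"),
    coords={"customer_id": [101, 102, 103, 104, 105]},
)

# Averaging over the sampling dimensions leaves one value per customer.
clv_mean = clv_draws.mean(("chain", "draw"))

# Flatten to a table with one row per customer.
clv_table = clv_mean.to_dataframe(name="clv_mean").reset_index()
print(clv_table.shape)  # (5, 2)
```

The mean discards the posterior uncertainty, so it suits ranking or reporting a point estimate rather than anything that needs the full distribution.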
Instead of adding a `thin` option to every method, I decided to add it to the model itself. Then a user gets back a model with the thinned dataset, and can call whatever methods they want with it (and doesn't need to destroy the full dataset of the original model).
Hey ricardo, please could you demonstrate in a few lines how this functionality would be used?
@tomthepeach it will look something like:
    fitted_gg_thinned = fitted_gg.thin_fit_result(keep_every=10)
    fitted_bg_thinned = fitted_bg.thin_fit_result(keep_every=10)
    ggf_clv_thinned = fitted_gg_thinned.expected_customer_lifetime_value(
        transaction_model=fitted_bg_thinned,
        customer_id=t.index,
        frequency=t["frequency"],
        recency=t["recency"],
        T=t["T"],
        mean_transaction_value=t["monetary_value"],
    )
You could sample fewer draws from the get-go when calling `model.fit`, but usually you want enough to at least check convergence.
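What keeping every 10th draw amounts to can be sketched with plain xarray indexing. This is not the implementation of `thin_fit_result`, only an illustration of the idea on a made-up array; the parameter values and the 4 x 1000 sampler shape are assumptions.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(1)

# Toy posterior: 4 chains x 1000 draws for a single hypothetical parameter.
posterior = xr.DataArray(
    rng.normal(size=(4, 1000)),
    dims=("chain", "draw"),
    coords={"chain": np.arange(4), "draw": np.arange(1000)},
)

# Keep every 10th draw in each chain; the chains themselves are untouched.
keep_every = 10
thinned = posterior.isel(draw=slice(None, None, keep_every))

print(posterior.sizes["draw"], thinned.sizes["draw"])  # 1000 100
```

Convergence checks can still be run on the full set of draws, with the thinned copy used only for the memory-heavy downstream computations.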
Looks good! It would be awesome to get this merged into main. Perhaps this should be the default/recommended approach? I'm not sure what the utility of the current implementation is.
Thinning loses information, so we shouldn't do it by default. It's up to the user to decide if they are no longer going to need all the draws, and that depends very much on their workflow. Hopefully this makes it easier to make that decision.
Thanks for flagging that this feature is relevant for you. We'll try to get it merged soon; I think there were still some tests failing. Follow the PR to stay up to date!
As the title says, `clv.utils.customer_lifetime_value` returns an unmanageably large dataset for me. For a sample dataset with 25k users, our final dataset results in 100 million rows.

Below are the key steps I ran, where `clv_df_all` is my full dataset and `clv_df_freq1` filters that dataset down to users with frequency > 0:

Is this expected behavior (i.e. am I missing a post-processing step)? Or is this unintended behavior?