google / lifetime_value


Why does the LTV prediction part use the probability prediction multiplied by the expectation as the final LTV prediction? #11

Open RuiSUN1124 opened 1 year ago

RuiSUN1124 commented 1 year ago

Ref: https://github.com/google/lifetime_value/blob/dd418967b3ca456b375b0ae8adac38732f7831db/lifetime_value/zero_inflated_lognormal.py#L35

Strategy24 commented 7 months ago

I also found it strange.

As far as I understand, the regression part of the model is trained only on the subset of customers with an observed nonzero LTV:

# Mask selecting customers with nonzero LTV.
positive = tf.cast(labels > 0, tf.float32)

# Replace zero labels with 1 so that log_prob below stays finite;
# the mask removes these rows from the loss anyway.
safe_labels = positive * labels + (
      1 - positive) * tf.keras.backend.ones_like(labels)

# Lognormal negative log-likelihood, averaged over positive customers only.
regression_loss = -tf.keras.backend.mean(
      positive * tfd.LogNormal(loc=loc, scale=scale).log_prob(safe_labels),
      axis=-1)
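
To make the masking concrete, here is a minimal self-contained sketch with made-up tensors and parameter values (not from the repo):

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

labels = tf.constant([[0.0], [120.0], [35.0]])  # first customer has zero LTV
loc = tf.constant([[3.0], [4.5], [3.2]])
scale = tf.constant([[0.5], [0.6], [0.4]])

positive = tf.cast(labels > 0, tf.float32)
safe_labels = positive * labels + (1 - positive) * tf.ones_like(labels)

# Without safe_labels, log_prob(0.) would not be finite, and multiplying
# by the zero mask would still leave NaNs in the mean. With it, the
# zero-LTV row contributes exactly 0 to the loss.
regression_loss = -tf.reduce_mean(
    positive * tfd.LogNormal(loc=loc, scale=scale).log_prob(safe_labels),
    axis=-1)
print(regression_loss.numpy())  # finite values; the zero-LTV row is 0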

If loc and scale are fit to give the most accurate predictions on this subset of customers, then

preds = (positive_probs *
      tf.keras.backend.exp(loc + 0.5 * tf.keras.backend.square(scale)))

gives a biased estimate in the general case, since positive_probs is not exactly 0 or 1 but somewhere in between.

I think the probability estimated by the classification part of the model should somehow be taken into account by the regression part of the model.
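
As an aside, the exp(loc + 0.5 * scale**2) factor in preds above is simply the mean of a LogNormal(loc, scale) distribution, which a quick Monte Carlo check confirms (a sketch with arbitrary parameter values, not tied to any trained model):

import numpy as np

rng = np.random.default_rng(0)
loc, scale = 3.0, 0.8  # arbitrary values for illustration

samples = rng.lognormal(mean=loc, sigma=scale, size=1_000_000)
print(samples.mean())                  # ~27.7 empirically
print(np.exp(loc + 0.5 * scale ** 2))  # 27.66..., the closed form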

Ty4Code commented 5 months ago

It actually makes perfect sense if you think about what a zero-inflated lognormal method is intended to do.

Imagine a simple case where a customer has an LTV of $0 with 99% probability, or an LTV of exactly $100 otherwise.

When we use a zero-inflated method for LTV, we are estimating the probability mass at zero LTV (the classification part) and the expected LTV conditional on it being nonzero (the regression part).

So in the case above, our perfect model would estimate that the customer has a 1% chance of having a nonzero LTV, and that if their LTV is nonzero, its expected value is $100.

But if we just took the regression output, we would say the expected LTV of our customers is $100, which is clearly not true. We have to multiply the probability of the customer being nonzero by their expected LTV conditioned on being nonzero.
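
In code, the arithmetic for that toy customer is just this (plain numbers, nothing model-specific):

p_nonzero = 0.01          # classification output: P(y > 0)
ev_given_nonzero = 100.0  # regression output: E(y | y > 0)

expected_ltv = p_nonzero * ev_given_nonzero
print(expected_ltv)  # 1.0 -- the true E(y), not the $100 regression output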

If we assume that y is non-negative, then we can see that:

E(y) = P(y > 0) E(y | y > 0) + P(y = 0) E(y | y = 0)
E(y) = P(y > 0) E(y | y > 0) + P(y = 0) (0)
E(y) = P(y > 0) E(y | y > 0)

Our model essentially estimates P(y > 0) with the classification output and E(y | y > 0) with the regression output.

So that is why we multiply the probability of nonzero LTV by the conditional expected customer LTV: it gives the true expected customer LTV that we care about, which is E(y).
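
A quick simulation of a zero-inflated lognormal (arbitrary p, loc, and scale, not from the repo) confirms the decomposition numerically:

import numpy as np

rng = np.random.default_rng(0)
p, loc, scale = 0.2, 3.0, 0.8  # arbitrary parameters
n = 1_000_000

# Each customer is nonzero with probability p; nonzero values are lognormal.
is_positive = rng.random(n) < p
y = np.where(is_positive, rng.lognormal(loc, scale, size=n), 0.0)

print(y.mean())                          # empirical E(y)
print(p * np.exp(loc + 0.5 * scale**2))  # P(y > 0) * E(y | y > 0), ~5.53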