volkale / volkale.github.io


Conversion Rate A B Testing #25

Open utterances-bot opened 2 years ago

utterances-bot commented 2 years ago

Conversion Rate A B Testing

https://blog.alexandervolkmann.com/2022/01/05/conversion-rate-A-B-testing.html

jthaman commented 2 years ago

Here is an intuitive argument that p is not estimable in this model. What do you think?

1-p is like the proportion of no-conversions in the data, or the proportion of lines with arrow heads in your figure. But almost any value of p can be consistent with the data you collect up to 7 days, because, for all we know, every censored observation could be a no-conversion. Equally consistent with the data, every censored observation could be a conversion. The data set just doesn't have enough information to tell you about p. In terms of your figure, you only see data in the blue box, so you learn nothing about how many arrow-head lines you got.

I guess I need to get some data and fit the model, because it just seems impossible to me, unless the prior is jerking p around too much.
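A minimal sketch of the censoring argument, under simplifying assumptions that are not from the post: the lag is plain geometric on {0, 1, 2, ...} (no zero inflation), every user is observed for the same window, and `log_lik`, `simulate` and `profile_drop` are hypothetical names. It profiles out the lag parameter `q` and measures how much the best achievable log-likelihood changes as `p` varies; a small range means the data barely distinguish values of `p`.

```python
import numpy as np

def log_lik(p, q, lags, censor_times):
    # Assumed (simplified) lag model: P(D = d) = q * (1 - q)**d for d = 0, 1, 2, ...
    # Each observed conversion contributes p * P(D = d)
    ll_conv = np.sum(np.log(p) + np.log(q) + lags * np.log1p(-q))
    # Users without a conversion either never convert (1 - p) or convert later
    # than their observation time c, with P(D > c) = (1 - q)**(c + 1)
    surv = (1 - q) ** (censor_times + 1)
    ll_cens = np.sum(np.log(1 - p + p * surv))
    return ll_conv + ll_cens

def simulate(n, p, q, window, rng):
    converts = rng.random(n) < p
    lag = rng.geometric(q, size=n) - 1  # NumPy's geometric starts at 1; shift to 0
    # A conversion is seen only if its lag is at most `window`
    seen = converts & (lag <= window)
    # Lags of observed conversions, and the observation time for everyone else
    return lag[seen], np.full(n - seen.sum(), window)

def profile_drop(lags, censor_times, p_grid, q_grid):
    # For each p, take the best log-likelihood over q (profile likelihood),
    # then report how far the profile falls across the p grid
    prof = np.array([max(log_lik(p, q, lags, censor_times) for q in q_grid)
                     for p in p_grid])
    return prof.max() - prof.min()

rng = np.random.default_rng(1)
p_grid = np.linspace(0.2, 0.95, 16)
q_grid = np.geomspace(1e-4, 0.5, 400)
drops = {}
for window in (7, 200):
    lags, cens = simulate(2000, p=0.3, q=0.02, window=window, rng=rng)
    drops[window] = profile_drop(lags, cens, p_grid, q_grid)
    print(f"window={window:3d} days: profile log-likelihood range over p = {drops[window]:.1f}")
```

With a slow lag (q = 0.02) and a 7-day window, the profile should vary far less across `p` than with a 200-day window. That matches the intuition here as well as the caveat in the author's reply: once the window is long relative to the typical lag, `p` becomes estimable. Exact values depend on the seed.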

volkale commented 2 years ago

Thanks for your comment @jthaman. Note that everything is based on the model assumptions, i.e. we have a conversion probability p and the time lag (conditional on a conversion) is distributed according to a zero-inflated geometric distribution. Now, you are right that if we don't have a large enough time frame of observations relative to the "typical" lag (or, in the extreme case, we didn't observe any conversions), the data is compatible with many different parameters of this model class, and it won't be very informative for your posterior inference. In the simulated data that I used, however, there were enough positive (i.e. conversion) examples to infer the value of the parameter p to the precision depicted in the posterior distribution plot. Hope that answers the question.
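A sketch of the kind of inference described in this reply, assuming one particular parameterization of the zero-inflated geometric lag (the post's exact parameterization and priors may differ) and using a brute-force grid posterior with a flat prior in place of whatever sampler the post's script uses. `lag_log_pmf`, `lag_survival`, `simulate_zig` and `grid_posterior_p` are hypothetical names. Users are assumed to enter on different days, so their observation times differ.

```python
import numpy as np

# Assumed zero-inflated geometric lag D on {0, 1, 2, ...}:
#   P(D = 0) = pi0 + (1 - pi0) * q,  P(D = d) = (1 - pi0) * q * (1 - q)**d for d >= 1
def lag_log_pmf(d, pi0, q):
    geom = (1 - pi0) * q * (1 - q) ** d
    return np.log(np.where(d == 0, pi0 + geom, geom))

def lag_survival(c, pi0, q):
    # P(D > c) = (1 - pi0) * (1 - q)**(c + 1)
    return (1 - pi0) * (1 - q) ** (c + 1)

def simulate_zig(n, p, pi0, q, max_days, rng):
    converts = rng.random(n) < p
    # With probability pi0 the lag is 0, otherwise geometric shifted to start at 0
    lag = np.where(rng.random(n) < pi0, 0, rng.geometric(q, size=n) - 1)
    # Staggered entry: each user has been observed for 0..max_days-1 days
    obs_days = rng.integers(0, max_days, size=n)
    seen = converts & (lag <= obs_days)
    # Sufficient statistics: counts of observed lags and of censoring times
    conv_counts = np.bincount(lag[seen], minlength=max_days)
    cens_counts = np.bincount(obs_days[~seen], minlength=max_days)
    return conv_counts, cens_counts

def grid_posterior_p(conv_counts, cens_counts, p_grid, pi0_grid, q_grid):
    days = np.arange(len(conv_counts))
    # Broadcast everything to shape (len(p), len(pi0), len(q), days)
    p = p_grid[:, None, None, None]
    pi0 = pi0_grid[None, :, None, None]
    q = q_grid[None, None, :, None]
    ll_conv = conv_counts * (np.log(p) + lag_log_pmf(days, pi0, q))
    ll_cens = cens_counts * np.log(1 - p + p * lag_survival(days, pi0, q))
    log_post = (ll_conv + ll_cens).sum(axis=-1)  # flat prior over the grid
    post = np.exp(log_post - log_post.max())
    post_p = post.sum(axis=(1, 2))  # marginalize out pi0 and q
    return post_p / post_p.sum()

rng = np.random.default_rng(42)
conv, cens = simulate_zig(20_000, p=0.1, pi0=0.3, q=0.3, max_days=14, rng=rng)
p_grid = np.linspace(0.005, 0.5, 100)
post = grid_posterior_p(conv, cens, p_grid,
                        np.linspace(0.0, 0.95, 20), np.linspace(0.02, 0.98, 30))
posterior_mean = float(np.sum(p_grid * post))
print(f"posterior mean of p: {posterior_mean:.3f}")
```

Because the simulated lag is short relative to the 14-day window, most conversions are observed and the marginal posterior for `p` concentrates near the simulated value. Shrinking `max_days` or slowing the lag (smaller `q`) widens it, which is the caveat in the reply.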

volkale commented 2 years ago

I added a script that lets you reproduce the results and plots of this blog post here. Unfortunately, I made the rookie mistake of not setting a random seed when I produced the simulations for the article >_<, but you should get very similar numbers and plots.
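For anyone adapting such a simulation: one generic way to make NumPy-based simulations repeatable is to create a single seeded `Generator` and draw everything from it. This is an illustration only, not taken from the post's script, and `simulate_conversions` is a hypothetical name.

```python
import numpy as np

def simulate_conversions(n, p, seed):
    # A Generator seeded once makes every run of the simulation repeatable
    rng = np.random.default_rng(seed)
    return rng.random(n) < p

first = simulate_conversions(1000, 0.1, seed=2022)
second = simulate_conversions(1000, 0.1, seed=2022)
print(np.array_equal(first, second))  # True: same seed, same draws
```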

jthaman commented 2 years ago

Thanks for the script, and the blog post. I'll try to work through some examples to better demonstrate the estimability issue. I don’t speak Python, so it will take some time…

jthaman commented 2 years ago

I was able to reproduce some interesting results on my own in R, so I retract my earlier statement that the model is not estimable.

You might be interested in knowing that this work falls into the field of cure models https://www.annualreviews.org/doi/abs/10.1146/annurev-statistics-031017-100101
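In cure-model terms, users who never convert play the role of the "cured" fraction. A sketch in the standard mixture-cure form (the post's own notation may differ): let $p$ be the conversion probability, $T$ the time until conversion ($T = \infty$ for users who never convert), $D$ the lag given conversion with pmf $f_\theta$ and survival function $S_\theta(c) = P(D > c)$, $d_i$ the observed lag of a converted user and $c_i$ the observation time of a user who has not converted. Then

$$
P(T > t) = (1 - p) + p\,S_\theta(t)
$$

$$
L(p, \theta) = \prod_{i \,\in\, \text{converted}} p\, f_\theta(d_i) \prod_{i \,\in\, \text{not yet converted}} \bigl[(1 - p) + p\, S_\theta(c_i)\bigr]
$$

When every $c_i$ is short relative to typical lags, $S_\theta(c_i) \approx 1$ and each censored factor is close to 1 whatever the value of $p$, which is the intuition in the first comment; with longer observation times those factors become informative.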

volkale commented 2 years ago

Great :) Thanks for the pointer and the paper, very interesting!

ploshay commented 2 years ago

I think it is possible to define a conversion window (a week, for example) before running the test and ignore everything that happens after that week. This approach works if 90-95% of conversions happen within the 7-day period.
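A minimal sketch of this fixed-window approach, assuming per-user records with a variant label, an exposure day and an optional conversion day (`User`, `windowed_conversions` and `two_proportion_z` are hypothetical names). Users whose window has not yet closed are left out, since counting them as non-converters would bias the rate down. The windowed rates can then be compared with any standard test for two proportions; a pooled two-sided z-test is shown as one option.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class User:
    variant: str                  # "A" or "B"
    exposed_day: int              # day the user entered the test
    converted_day: Optional[int]  # day of conversion, or None if not (yet) converted

def windowed_conversions(users: List[User], variant: str, window: int,
                         end_day: int) -> Tuple[int, int]:
    # Data cover days < end_day; only users whose full window fits are eligible
    eligible = [u for u in users
                if u.variant == variant and u.exposed_day + window <= end_day]
    # Count a conversion only if it happened within `window` days of exposure
    converted = sum(1 for u in eligible
                    if u.converted_day is not None
                    and u.converted_day - u.exposed_day < window)
    return converted, len(eligible)

def two_proportion_z(c_a: int, n_a: int, c_b: int, n_b: int) -> Tuple[float, float]:
    # Pooled two-sided z-test for a difference in proportions (normal approximation)
    pooled = (c_a + c_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (c_b / n_b - c_a / n_a) / se
    return z, math.erfc(abs(z) / math.sqrt(2))

users = [
    User("A", 0, 3),     # converted 3 days after exposure: counted
    User("A", 0, 9),     # converted 9 days after exposure: outside the 7-day window
    User("A", 1, None),  # no conversion
    User("A", 5, 6),     # window not finished by day 10: excluded
    User("B", 0, 0),
    User("B", 2, 8),
    User("B", 3, None),
]
a = windowed_conversions(users, "A", window=7, end_day=10)
b = windowed_conversions(users, "B", window=7, end_day=10)
z, p_value = two_proportion_z(*a, *b)
print(a, b)  # (1, 3) (2, 3)
```

The toy numbers only show the bookkeeping; the normal approximation needs far larger samples. Compared with the post's model, this discards conversions that arrive after the window and the most recently exposed users, in exchange for not having to model the lag at all.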