Closed jpzhangvincent closed 1 year ago
See the example starting in cell 12 of the frequentist notebook. The df has a `nr_of_items` column passed to `numerator_column` and a `nr_of_items_sumsq` column passed to the `numerator_sum_squares_column` of the `ZTest`. These two columns are used together with the `denominator_column` to compute the variance of the continuous metric.
If you prefer, you can use the `StudentsTTest` class instead of the `ZTest` class.
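For anyone starting from one row per user, here is a minimal sketch of how such a summary table could be built by hand. It uses plain Python rather than the library itself; the raw data, the `variation_name` key and the `summarize` helper are assumptions made up for illustration, and the library expects a data frame rather than a list of dicts.

```python
from collections import defaultdict

# Hypothetical raw data: (variant group, number of items for one user).
observations = [
    ("control", 3), ("control", 2), ("control", 4),
    ("treatment", 5), ("treatment", 1),
]


def summarize(rows):
    """Collapse per-user observations into one summary row per group."""
    acc = defaultdict(
        lambda: {"nr_of_items": 0, "nr_of_items_sumsq": 0, "users": 0}
    )
    for group, x in rows:
        acc[group]["nr_of_items"] += x            # running sum of x
        acc[group]["nr_of_items_sumsq"] += x * x  # running sum of x^2
        acc[group]["users"] += 1                  # number of observations
    # "variation_name" is an assumed name for the group column.
    return [{"variation_name": g, **stats} for g, stats in sorted(acc.items())]


summary_rows = summarize(observations)
for row in summary_rows:
    print(row)
```

Each resulting row carries the three values that would go to `numerator_column`, `numerator_sum_squares_column` and `denominator_column` respectively.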
I'm still a bit confused about the setup of the data frame. Just to confirm: it doesn't seem like each row represents a single sample. Does `nr_of_items` mean the average of the continuous variable of interest in a variant group, does `nr_of_items_sumsq` represent the sum of (x_i - x_mean)^2, and does `user` mean the sample size of a variant group? So the API expects the user to pre-calculate those statistics and construct the data frame that way? I'm wondering whether it would be better to have a simpler API, like scipy.stats.ttest_*, where you simply pass in two lists of observations.
At Spotify the sample size is often in the hundreds of millions, so it's not very convenient to pass in every single observation; we prefer using summary statistics instead.
To make it more concrete, let's imagine that `nr_of_items` is the number of playlists a Spotify user has created. Let's say we have five users in the control group who created 3, 2, 4, 0 and 1 playlists respectively. Then `nr_of_items` would be the sum, 3+2+4+0+1=10, `nr_of_items_sumsq` would be the sum of squares, 3^2+2^2+4^2+0^2+1^2=30, and `users` would be 5. Similarly for the treatment group. Internally we can use these summary statistics to compute the mean as `nr_of_items`/`users` and the variance as `nr_of_items_sumsq`/`users` - (`nr_of_items`/`users`)^2, and then use those to compute test statistics and confidence intervals.
Does that make sense?
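The arithmetic above can be checked with a short sketch in plain Python. It follows the formulas exactly as written in this comment and is not the library's internal code, which may differ (for example by using an n-1 denominator). The treatment values and the helper `mean_and_variance` are hypothetical, added only so that a test statistic can be shown.

```python
import math
import statistics


def mean_and_variance(total, sumsq, n):
    """Mean and variance from sum, sum of squares and count."""
    mean = total / n
    variance = sumsq / n - mean ** 2  # E[x^2] - (E[x])^2
    return mean, variance


# Control group from the example: playlists created by five users.
control = [3, 2, 4, 0, 1]
nr_of_items = sum(control)                       # 10
nr_of_items_sumsq = sum(x * x for x in control)  # 30
users = len(control)                             # 5

c_mean, c_var = mean_and_variance(nr_of_items, nr_of_items_sumsq, users)

# The summary-statistic variance matches the population variance
# computed from the raw observations.
assert math.isclose(c_var, statistics.pvariance(control))

# Hypothetical treatment group (not from the thread): 4, 3, 5, 1, 2.
t_mean, t_var = mean_and_variance(total=15, sumsq=55, n=5)

# Two-sample z statistic for the difference in means.
std_err = math.sqrt(c_var / users + t_var / 5)
z = (t_mean - c_mean) / std_err
print(c_mean, c_var, t_mean, t_var, z)
```

The point of the design is that only three numbers per group are needed, no matter how many users are in the group.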
Ah, that makes sense for ease of computation and scalability. It took me a while to wrap my head around it, but I'm glad I now understand the motivation better. It would be great to have some documentation on this in the notebook example and the API. Thanks!
It seems the example (i.e. the Z-test) in the frequentist notebook only covers binary metrics (like conversion rate). Does this package also support a t-test for a continuous variable? I saw that `StudentsTTest` requires both the `numerator_column` and `denominator_column` columns as input (from a contingency table format?), so I'm not sure whether it's possible to run a two-sample t-test on just one continuous variable column with the API. Any example and documentation would be appreciated!