promotably / api

API service
Eclipse Public License 1.0
0 stars 1 forks source link

Know vs Think #247

Closed bpromo closed 9 years ago

bpromo commented 9 years ago

Trying to drill into some of the things we think vs know:

It appear they are: 51.6% / 48.3% control / test

Somewhat frustratingly it seems events like cartupdate were already a bit higher for test over control (54% of cartupdate happened in the test group) before an offer was ever in the system. After the offer started showing up metrics got even better: 59% of cartupdate were from the test group.

Looks like we are!

bpromo commented 9 years ago

Ok connected to my prod db snapshot and ran some queries:

Analysis

Looks like the distribution is pretty close. Its definitely not 60/40.

The control_group == false group is a lot more active. The next thing I want to check is what the numbers look like before we started showing an offer. Ideally the numbers are pretty even between groups and then take off when our offer starts showing up.

The productview query is interesting. I'm not sure what reason I could give for the big difference in control_group views. Maybe before we stopped recording crawlers we handed one out a control_group ID to one or two? Anyways when I filter the time to recently it distributed more as we would hope.

Bucket Assigned

select COUNT('bucket-assigned') from events where type = 'bucket-assigned'; 9,775

select COUNT('bucket-assigned') from events where type = 'bucket-assigned' and control_group = true; 5,049 51.6%

select COUNT('bucket-assigned') from events where type = 'bucket-assigned' and control_group = false; 4,726 48.3%

Cart Update

select COUNT(*) from events where type = 'cartupdate' and control_group = true; 155

select COUNT(*) from events where type = 'cartupdate' and control_group = false; 214

...now a more recent query....

select COUNT(*) from events where type = 'cartupdate' and control_group = true and created_at > '2015-05-10 01:14:10.462889+00'; 56

select COUNT(*) from events where type = 'cartupdate' and control_group = false and created_at > '2015-05-10 01:14:10.462889+00'; 82

Product View

select COUNT(*) from events where type = 'productview' and control_group = true; 7,768

select COUNT(*) from events where type = 'productview' and control_group = false; 4,734

...now a more recent query....

select COUNT(*) from events where type = 'productview' and control_group = true and created_at > '2015-05-10 01:14:10.462889+00'; 1,717

select COUNT(*) from events where type = 'productview' and control_group = false and created_at > '2015-05-10 01:14:10.462889+00';

1,778

cc @cvillecsteele @smnirven @tomsouthall @wgb @vrivellino

bpromo commented 9 years ago

So if we launched our first Pacwave offer on the 13th (big if right now):

select COUNT(*) from events where type = 'cartupdate' and site_id = '995ead32-3c3b-4e4d-aeb3-5bb250fa410c' and control_group = true and created_at < '2015-05-13 00:01:00';

114 or 44% of total cartupdates for that time period

select COUNT(*) from events where type = 'cartupdate' and site_id = '995ead32-3c3b-4e4d-aeb3-5bb250fa410c' and control_group = false and created_at < '2015-05-13 00:01:00';

137 or 54% of total cartupdates for that time period

If we look at behavior after we launched the offer:

select COUNT(*) from events where type = 'cartupdate' and site_id = '995ead32-3c3b-4e4d-aeb3-5bb250fa410c' and control_group = true and created_at > '2015-05-13 00:01:00';

41 or of 40% total cartupdates for that time period

select COUNT(*) from events where type = 'cartupdate' and site_id = '995ead32-3c3b-4e4d-aeb3-5bb250fa410c' and control_group = false and created_at > '2015-05-13 00:01:00';

60 or 59% of total cartupdates for that time period

So I'm still not sure about the 13th but I do think an analysis like this is important. If our control_group cluster was already performing a lot worse at our most important metrics we can't take total credit for their continued poor behavior when offers are being made to the other group.

Now why our test group performs so much better on their own is a mystery to me. Distribution looks good when I look at site_id and created_at. Pacwave isn't experiencing any kind of anomalies. Still slightly more control_group = true than false, but that just underscored that whoever is randomly falling in our test group seems to be a lot more active.

Well, it seems that way given the dates. So its worth trying to get these dates right! @cvillecsteele @tomsouthall if you guys can provide a day and time I can redo this analysis...

bpromo commented 9 years ago

@cvillecsteele just occurred to me that maybe I have metrics wrong in my head. Are the 'promotably' metrics based on the test group or is it the test group individuals who were shown an offer?

bpromo commented 9 years ago

Can't sleep so more digging in the data. For all I know I'm verifying what metrics is showing right now ...

Question I am asking is, do offers cause people to people to buy more ... and it looks like the data is saying yes:

First a set of all IDs (offershowns) who have been shown an offer since the 13th

SELECT site_shopper_id 
INTO offershowns FROM events 
WHERE site_id = '995ead32-3c3b-4e4d-aeb3-5bb250fa410c' AND type = 'offershown' AND control_group = false AND created_at > '2015-05-13 00:00:01';

Find out how many of these IDs had a thankyou event

SELECT COUNT(*) FROM events 
INNER JOIN offershowns ON events.site_shopper_id = offershowns.site_shopper_id 
WHERE site_id = '995ead32-3c3b-4e4d-aeb3-5bb250fa410c' AND type = 'thankyou' AND control_group = false AND created_at > '2015-05-13 00:00:01';

17 or 60% of the thankyous were issued to IDs that had seen an offer

SELECT COUNT(*) FROM events 
LEFT OUTER JOIN offershowns ON events.site_shopper_id = offershowns.site_shopper_id 
WHERE offershowns.site_shopper_id IS null AND site_id = '995ead32-3c3b-4e4d-aeb3-5bb250fa410c' AND type = 'thankyou' AND control_group = false AND created_at > '2015-05-13 00:00:01';

11 or 39% of the thankyous were issued to IDs that hadn't seen an offer.

bpromo commented 9 years ago

Second question was what about cartupdate. Again it seems those who were made offers are much more active with their carts accounting for 65% of cartupdates

SELECT COUNT(*) FROM events 
INNER JOIN offershowns ON events.site_shopper_id = offershowns.site_shopper_id 
WHERE site_id = '995ead32-3c3b-4e4d-aeb3-5bb250fa410c' AND type = 'cartupdate' AND control_group = false AND created_at > '2015-05-13 00:00:01';

47 or 65%

SELECT COUNT(*) FROM events 
LEFT OUTER JOIN offershowns ON events.site_shopper_id = offershowns.site_shopper_id 
WHERE offershowns.site_shopper_id IS null AND site_id = '995ead32-3c3b-4e4d-aeb3-5bb250fa410c' AND type = 'cartupdate' AND control_group = false AND created_at > '2015-05-13 00:00:01';

25 or 35%

bpromo commented 9 years ago

productview is where it just gets silly guys:

SELECT COUNT(*) FROM events 
INNER JOIN offershowns ON events.site_shopper_id = offershowns.site_shopper_id 
WHERE site_id = '995ead32-3c3b-4e4d-aeb3-5bb250fa410c' AND type = 'productview' AND control_group = false AND created_at > '2015-05-13 00:00:01';

1,048 or 70%

SELECT COUNT(*) FROM events 
LEFT OUTER JOIN offershowns ON events.site_shopper_id = offershowns.site_shopper_id 
WHERE offershowns.site_shopper_id IS null AND site_id = '995ead32-3c3b-4e4d-aeb3-5bb250fa410c' AND type = 'productview' AND control_group = false AND created_at > '2015-05-13 00:00:01';

464 or 30%

bpromo commented 9 years ago

And getting back to my weird finding about cartupdate before offers started being made. Though our test group was already performing 10% above the control group wrt to cartupdate they were pretty much in a dead heat with thankyou: 42 test to 37 control. Our test group still seemed primed to perform coming into offer season, but not nearly as skewed as above.

But I can't stop there. This is a bit concerning. productview is pretty skewed:

select COUNT(*) from events where type = 'productview' and site_id = '995ead32-3c3b-4e4d-aeb3-5bb250fa410c' and control_group = true and created_at < '2015-05-13 00:00:01';

6,472 or 66%

select COUNT(*) from events where type = 'productview' and site_id = '995ead32-3c3b-4e4d-aeb3-5bb250fa410c' and control_group = false and created_at < '2015-05-13 00:00:01';

3,342 or 34%

Before offers were made out control group had almost twice as many product views?

This is where it gets weird. After the offers went out > '2015-05-13 00:00:01': 1295 control to 1378 test. Wtf.

@smnirven is that when we stopped the bots?

cvillecsteele commented 9 years ago

@bpromo Can you catch me up on this analysis verbally tomorrow? It sounds like we're in the clear? But I think I need a walkthrough. Could skype / phone tomorrow anytime after 9am.

bpromo commented 9 years ago

@cvillecsteele sure thing!