Open yalisassoon opened 8 years ago
I ran a first test:
WITH duid_blacklist AS ( -- blacklist domain_userid with duplicates
SELECT DISTINCT domain_userid FROM atomic.events
WHERE event_id IN (SELECT event_id FROM (SELECT event_id, COUNT(*) FROM atomic.events WHERE v_tracker LIKE 'js-2.6.0%' GROUP BY 1) WHERE count > 1)
)
SELECT comparison, COUNT(*) FROM (
SELECT
domain_userid,
COUNT(DISTINCT domain_sessionidx) AS idx,
COUNT(DISTINCT domain_sessionid) AS id,
COUNT(DISTINCT domain_sessionidx) = COUNT(DISTINCT domain_sessionid) AS comparison
FROM atomic.events
WHERE v_tracker LIKE 'js-2.6.0%' -- restrict to a single tracker
AND domain_userid NOT IN (SELECT domain_userid FROM duid_blacklist)
GROUP BY 1
) GROUP BY 1;
First results show a difference between domain_sessionidx
and domain_sessionid
in about 0.1-0.3% of sessions.
I updated the SQL:
WITH duid_blacklist AS ( -- blacklist domain_userid with duplicates
SELECT DISTINCT domain_userid FROM atomic.events
WHERE event_id IN (SELECT event_id FROM (SELECT event_id, COUNT(*) FROM atomic.events WHERE v_tracker LIKE 'js-2.6.0%' GROUP BY 1) WHERE count > 1)
)
SELECT c1, c2, COUNT(*) FROM (
SELECT
domain_userid,
COUNT(DISTINCT domain_sessionidx) AS idx,
COUNT(DISTINCT domain_sessionid) AS id,
COUNT(DISTINCT domain_sessionidx) > COUNT(DISTINCT domain_sessionid) AS c1,
COUNT(DISTINCT domain_sessionidx) < COUNT(DISTINCT domain_sessionid) AS c2
FROM atomic.events
WHERE v_tracker LIKE 'js-2.6.0%' -- restrict to a single tracker
AND domain_userid NOT IN (SELECT domain_userid FROM duid_blacklist)
GROUP BY 1
) GROUP BY 1,2;
The most common case appears to be domain_sessionid
changes even though domain_sessionidx
doesn't increase. The reverse also happens but not as often (2-3 orders of magnitude less frequent).
A last note: it's sometimes more frequent than the 0.1-0.3% mentioned before (one dataset had 1-3% of sessions affected).
One possible way this can happen even when cookies appear to be enabled:
A user navigates to the page They open another tab in the same domain They disable cookies (this isn't reported by the tracker due to #468) They leave for a while They come back to the first tab, starting a new session and incrementing that tab's session count They go to the second tab, starting a new session in that tab too. The tabs will have the same session count but different session IDs (since they cannot communicate by cookies).
This does require a user to turn off 1st-party cookies whilst browsing the site, which does seem too unlikely to account for 1-3% of sessions.
Pushing back...
Some interesting additional info from one of our users here: https://snowplow.zendesk.com/agent/tickets/2436
They found cases where the domain_sessionidx
would suddenly go down e.g. from 30 to 10. When this happens the domain_sessionid
changes (so it registers a new sesssion) but it you don't get a new unique domain_userid || domain_sessionidx
combination.
@bogaert was going to see if we see a similar pattern in other Snowplow data sets.
We have seen this happen before.
Pushing back.
We've seen this happen on a number of different data (each time occasionally). It would be good to understand why.