whotracksme / whotracks.me

Data from the largest and longest measurement of online tracking.
https://www.ghostery.com/whotracksme
MIT License

Investigate: should analytics.tiktok.com be detected as a tracker? #262

Closed philipp-classen closed 2 years ago

philipp-classen commented 2 years ago

Currently, analytics.tiktok.com is not detected as a tracker by WhoTracks.me, but perhaps it should be. In the raw data (tp_events), third-party requests to it do exist. The question is whether it should have been detected as a third-party tracker, and if so, why the algorithms are missing it.

(Context: originally reported here https://github.com/whotracksme/whotracks.me/issues/261)

philipp-classen commented 2 years ago

In the raw data, the most popular pages in my sample were https://www.nfl.com/ and https://www.lowes.com/. Both sent third-party requests to https://analytics.tiktok.com/i18n/pixel/identify.js with the same cookie:

cookie: ttwid=1%7CbA3a_4BMyb4ekrjPp4aDV20oq9YBjHAHqUZGZ1b-0aA%7C1641585649%7C11cda80986961557d1772f348d7ae2fce7887e551b671e1faf2343b0058cbd40

On a different profile, I got another unique identifier. It is not clear why we don't detect it, so I'll mark it as a bug. This is clearly cross-site tracking, and the amount of traffic is not so small that we would miss it for that reason. (The Ghostery extension also reports it as a tracker, by the way.)
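
The observation above can be reproduced on paper with a minimal sketch, which is not the WhoTracks.me pipeline. It assumes each raw record is a (first-party URL, third-party URL, cookie value) tuple, and it compares plain hostnames instead of registrable domains, which is a simplification. The names `split_ttwid` and `cross_site_ids` are hypothetical. The sketch also splits the `ttwid` value on `|` (the decoded `%7C`); reading the third field as a Unix timestamp is a guess based on its shape, not documented behaviour.

```python
from collections import defaultdict
from datetime import datetime, timezone
from urllib.parse import unquote, urlsplit

# Cookie value quoted in the comment above.
TTWID = ("1%7CbA3a_4BMyb4ekrjPp4aDV20oq9YBjHAHqUZGZ1b-0aA%7C1641585649"
         "%7C11cda80986961557d1772f348d7ae2fce7887e551b671e1faf2343b0058cbd40")

PIXEL = "https://analytics.tiktok.com/i18n/pixel/identify.js"

# Hypothetical record layout: (first-party URL, third-party URL, cookie value).
REQUESTS = [
    ("https://www.nfl.com/", PIXEL, TTWID),
    ("https://www.lowes.com/", PIXEL, TTWID),
]


def split_ttwid(value):
    """Percent-decode the cookie value and split it on '|'."""
    return unquote(value).split("|")


def cross_site_ids(requests):
    """Return {(third-party host, cookie value): first-party sites} for
    identifiers that were sent from two or more distinct first-party sites."""
    seen = defaultdict(set)
    for first_party_url, third_party_url, cookie_value in requests:
        site = urlsplit(first_party_url).hostname
        host = urlsplit(third_party_url).hostname
        # Simplification: a real pipeline would compare registrable domains.
        if site != host:
            seen[(host, cookie_value)].add(site)
    return {key: sites for key, sites in seen.items() if len(sites) >= 2}


fields = split_ttwid(TTWID)
# Assumption: the third field is a Unix timestamp (seconds since the epoch).
issued = datetime.fromtimestamp(int(fields[2]), tz=timezone.utc)
flagged = cross_site_ids(REQUESTS)

if __name__ == "__main__":
    print(fields)
    print(issued.isoformat())
    for (host, _), sites in flagged.items():
        print(host, sorted(sites))
```

With the two sample pages, `flagged` contains a single entry for analytics.tiktok.com covering both www.nfl.com and www.lowes.com. An identifier that only ever appears on one site would not be flagged by this check.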

philipp-classen commented 2 years ago

We are confident that we have tracked it down to our internal processing pipeline (a missing mapping step). Existing trackers are not affected, but new ones will not be detected. Everything is there in the raw data, so once we have fixed it, they should show up when the data is recomputed.

From what I see in the sample data, analytics.tiktok.com is one of the largest trackers that we miss. static.cloudflareinsights.com is another candidate that we should investigate (it also had the most traffic).

philipp-classen commented 2 years ago

The data from January has now been processed. analytics.tiktok.com is now detected as a tracker:

https://whotracks.me/trackers/tiktok_analytics.html

The stats may change next month, as we made the internal changes in the middle of January. That could affect the estimated popularity (increasing its relative ranking). Otherwise, the data looks OK as far as I can tell (e.g. the sample sites nfl.com and lowes.com are among the most prominent pages).

philipp-classen commented 2 years ago

Leaving it open until March to check whether the data changes with a full month.

For reference, this is how it looks now (screenshot: 2022-02-02-tiktok-analytics).

philipp-classen commented 2 years ago

Closing it now. With the March release, the reach increased (from 0.7% to 1.2%), but the data looks stable.