Closed by philipp-classen 11 months ago.
This requires some work. Multiple tests break with the converted trackerdb.sql file, and the website cannot be generated.
To try it out, the data can be found on the migrated_trackerdb branch.
Inconsistencies: WhoTracks.me expects every tracker to have a domain, but some entries have none, for instance:
name: Tealium Ads
category: advertising
website_url: https://www.tealium.com/
organization: tealium
--- filters
##.tealium-ad
##.tealiumAdSlot
--- filters
ghostery_id: 4055
Compare it with a working one:
name: TikTok Analytics
category: site_analytics
website_url: https://analytics.tiktok.com
organization: bytedance_inc
--- domains
analytics.tiktok.com
--- domains
--- filters
||analytics.tiktok.com^$3p
--- filters
ghostery_id: 4050
There may be other inconsistencies besides domains being mandatory.
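A small pre-flight check could flag entries like Tealium Ads before the tests or the site build fail on them. Below is a minimal Python sketch, assuming the entry layout is exactly as in the two examples above (plain "key: value" lines, plus multi-line blocks opened and closed by the same "--- name" marker). parse_entry and trackers_without_domains are hypothetical helpers, not part of WhoTracks.me or trackerdb.

```python
from typing import Dict, List, Optional


def parse_entry(text: str) -> Dict[str, object]:
    """Parse one tracker entry in the layout shown above (hypothetical helper).

    Plain "key: value" lines become string fields; a block opened and closed
    by the same "--- <name>" marker becomes a list of its non-empty lines.
    """
    entry: Dict[str, object] = {}
    section: Optional[str] = None  # name of the block we are inside, if any
    lines: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("--- "):
            name = line[4:].strip()
            if section is None:
                section, lines = name, []  # opening marker
            elif section == name:
                entry[section] = lines  # matching closing marker
                section = None
            else:
                raise ValueError(f"unexpected marker {line!r} inside {section!r}")
        elif section is not None:
            if line:
                lines.append(line)
        elif ":" in line:
            # Split on the first colon only, so URLs in values stay intact.
            key, value = line.split(":", 1)
            entry[key.strip()] = value.strip()
    if section is not None:
        raise ValueError(f"unclosed block {section!r}")
    return entry


def trackers_without_domains(texts: List[str]) -> List[str]:
    """Return the names of entries with a missing or empty 'domains' block."""
    missing = []
    for text in texts:
        entry = parse_entry(text)
        if not entry.get("domains"):
            missing.append(str(entry.get("name", "<unnamed>")))
    return missing


# The two entries quoted in this issue, used as sample input.
TEALIUM = """\
name: Tealium Ads
category: advertising
website_url: https://www.tealium.com/
organization: tealium
--- filters
##.tealium-ad
##.tealiumAdSlot
--- filters
ghostery_id: 4055
"""

TIKTOK = """\
name: TikTok Analytics
category: site_analytics
website_url: https://analytics.tiktok.com
organization: bytedance_inc
--- domains
analytics.tiktok.com
--- domains
--- filters
||analytics.tiktok.com^$3p
--- filters
ghostery_id: 4050
"""

if __name__ == "__main__":
    print(trackers_without_domains([TEALIUM, TIKTOK]))  # ['Tealium Ads']
```

Treating an empty "--- domains" block the same as a missing one is a deliberate choice here, since neither gives WhoTracks.me a domain to work with. Further checks for other mandatory fields could be added to the same loop once they are known.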
The May release is the first release computed with trackerdb data. In that regard, the migration is done.
Still, I am leaving the ticket open, since the manual step of taking the released trackerdb binary dump and converting it to trackerdb.sql should be automated, too. But at least the data model is now consistent.
Added the update_trackerdb.sh script to automate the update of the trackerdb.sql file.
https://github.com/ghostery/trackerdb was open-sourced in February 2023.
Currently, the data is checked in here (and requires a manual step to keep it in sync): https://github.com/whotracksme/whotracks.me/blob/master/whotracksme/data/assets/trackerdb.sql
Since the data is all public now, we should instead use it directly from the other repository.