whotracksme / whotracks.me

Data from the largest and longest measurement of online tracking.
https://www.ghostery.com/whotracksme
MIT License
405 stars, 73 forks

Tests are broken (regression with the December trackerdb version) #345

Status: Closed (philipp-classen closed this issue 2 months ago)

philipp-classen commented 5 months ago

I noticed that the tests are currently failing. It would be interesting to check again with the upcoming January data:

============================= test session starts ==============================
platform linux -- Python 3.11.7, pytest-7.4.3, pluggy-0.13.1
rootdir: /home/runner/work/whotracks.me/whotracks.me
plugins: anyio-4.2.0
collected 14 items

tests/test_data_integrity.py ..                                          [ 14%]
tests/test_db_integrity.py .F                                            [ 28%]
tests/test_db_validity.py .....F.                                        [ 78%]
tests/test_site_categories.py ..                                         [ 92%]
tests/test_sites_data.py .                                               [100%]

=================================== FAILURES ===================================
_________________ TestDbIntegrity.test_companies_have_trackers _________________

self = <tests.test_db_integrity.TestDbIntegrity testMethod=test_companies_have_trackers>

    def test_companies_have_trackers(self):
        childless_companies = self.conn.execute('''select id FROM
            (select companies.id, trackers.id AS tid from companies left join trackers on trackers.company_id = companies.id)
        where tid is null''').fetchall()
>       self.assertEqual(childless_companies, [])
E       AssertionError: Lists differ: [('appnexus',), ('gabia',), ('iponweb',), [38 chars]m',)] != []
E       
E       First list contains 6 additional elements.
E       First extra element 0:
E       ('appnexus',)
E       
E       + []
E       - [('appnexus',),
E       -  ('gabia',),
E       -  ('iponweb',),
E       -  ('leaf_group',),
E       -  ('loggly',),
E       -  ('pingdom',)]

tests/test_db_integrity.py:21: AssertionError
___________ ValidateTrackerDatabase.test_no_trackers_without_domain ____________

self = <tests.test_db_validity.ValidateTrackerDatabase testMethod=test_no_trackers_without_domain>

    def test_no_trackers_without_domain(self):
        cur = self.conn.cursor()
        cur.execute('SELECT COUNT(DISTINCT tracker) FROM tracker_domains')
        domain_tracker_count = cur.fetchone()[0]
        cur.execute('SELECT COUNT(DISTINCT id) FROM trackers WHERE alias is NULL')
        tracker_count = cur.fetchone()[0]
>       self.assertEqual(domain_tracker_count, tracker_count)
E       AssertionError: 3144 != 3143

tests/test_db_validity.py:39: AssertionError
=========================== short test summary info ============================
FAILED tests/test_db_integrity.py::TestDbIntegrity::test_companies_have_trackers - AssertionError: Lists differ: [('appnexus',), ('gabia',), ('iponweb',), [38 chars]m',)] != []

First list contains 6 additional elements.
First extra element 0:
('appnexus',)

+ []
- [('appnexus',),
-  ('gabia',),
-  ('iponweb',),
-  ('leaf_group',),
-  ('loggly',),
-  ('pingdom',)]
FAILED tests/test_db_validity.py::ValidateTrackerDatabase::test_no_trackers_without_domain - AssertionError: 3144 != 3143
========================= 2 failed, 12 passed in 5.72s =========================

philipp-classen commented 5 months ago

The companies reported by test_companies_have_trackers are ones that have been acquired by other companies (e.g. appnexus now belongs to Microsoft: https://github.com/ghostery/trackerdb/issues/174), so no tracker points to them anymore.
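
To illustrate how an acquisition can leave a company entry without trackers, here is a minimal sketch against an in-memory SQLite database. The table layout is an assumption that contains only the columns named in the failing test (`companies.id`, `trackers.id`, `trackers.company_id`, `trackers.alias`); the real trackerdb schema has more columns. The helpers `build_toy_db` and `find_childless_companies` and the toy rows are hypothetical. The query is the one from the test, with an explicit `AS id` alias added.

```python
import sqlite3

# Same query as in tests/test_db_integrity.py, with an explicit column alias.
CHILDLESS_COMPANIES_SQL = """
    SELECT id FROM
        (SELECT companies.id AS id, trackers.id AS tid
         FROM companies
         LEFT JOIN trackers ON trackers.company_id = companies.id)
    WHERE tid IS NULL
"""


def build_toy_db():
    """Create a tiny database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE companies (id TEXT PRIMARY KEY);
        CREATE TABLE trackers (id TEXT PRIMARY KEY, company_id TEXT, alias TEXT);
        -- Both companies still exist, but the tracker was reassigned
        -- to the acquiring company.
        INSERT INTO companies VALUES ('microsoft'), ('appnexus');
        INSERT INTO trackers VALUES ('appnexus', 'microsoft', NULL);
        """
    )
    return conn


def find_childless_companies(conn):
    """Return the ids of companies that no tracker references."""
    return [row[0] for row in conn.execute(CHILDLESS_COMPANIES_SQL).fetchall()]


if __name__ == "__main__":
    print(find_childless_companies(build_toy_db()))
```

In this toy data the company `appnexus` is reported because its only tracker now references `microsoft`, which mirrors the situation described above.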

It is related to https://github.com/ghostery/trackerdb/issues/120
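
The second failure (3144 != 3143) only compares counts, so it does not show which tracker causes the mismatch, and the thread does not state the cause. A possible way to locate it is to compare the two id sets directly. The sketch below assumes the column names used in the test (`tracker_domains.tracker`, `trackers.id`, `trackers.alias`); the `domain` column, the helper names and the toy rows are hypothetical.

```python
import sqlite3


def diff_tracker_ids(conn):
    """Compare tracker ids in tracker_domains with non-alias tracker ids.

    Returns (only_in_domains, only_in_trackers), both sorted.
    """
    in_domains = {
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT tracker FROM tracker_domains WHERE tracker IS NOT NULL"
        )
    }
    non_alias = {
        row[0]
        for row in conn.execute("SELECT DISTINCT id FROM trackers WHERE alias IS NULL")
    }
    return sorted(in_domains - non_alias), sorted(non_alias - in_domains)


def build_toy_db():
    """Create a tiny database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE trackers (id TEXT PRIMARY KEY, company_id TEXT, alias TEXT);
        CREATE TABLE tracker_domains (tracker TEXT, domain TEXT);
        -- 'b' is an alias of 'a' but still has a domain of its own.
        INSERT INTO trackers VALUES ('a', NULL, NULL), ('b', NULL, 'a');
        INSERT INTO tracker_domains VALUES ('a', 'a.example'), ('b', 'b.example');
        """
    )
    return conn


if __name__ == "__main__":
    only_in_domains, only_in_trackers = diff_tracker_ids(build_toy_db())
    print("in tracker_domains but not a non-alias tracker:", only_in_domains)
    print("non-alias tracker without a domain:", only_in_trackers)
```

Against the real database, a non-empty first list would point at ids that have domains but are missing from `trackers` or are marked as aliases; a non-empty second list would point at trackers without any domain.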