whotracksme / whotracks.me

Data from the largest and longest measurement of online tracking.
https://www.ghostery.com/whotracksme
MIT License
407 stars 73 forks source link

whotracks.me/whotracksme/data/assets/trackerdb.sql -> sqlite3 error #277

Open jpgpi250 opened 2 years ago

jpgpi250 commented 2 years ago

Using the latest version of sqlite3, the database cannot be build from the sql script.

this has been discussed here, non working and working example included. It appears older sqlite3 versions ignore the error, the latest version apparently has a problem with it.

For clarification, using the debian bullseye version 3.34.1-3 ignores the problem, pihole-FTL has the latest version (3.38.2) of sqlite3 embedded, this version does show the problem.

Whould it be possible to update the sql script to allow newer versions of sqlite3 process the script without errors.

Thanks for your time and effort.

jpgpi250 commented 2 years ago

did some tests. apparently, the version of sqlite3, used to create trackersdb.sql (.dump ???) creates a script, that isn't compatible with the latest version of sqlite3. This would mean you need to install a more recent (latest) version of sqlite3 on the system, used to create the script.

philipp-classen commented 2 years ago

Thanks for reporting. I have some difficulties to reproduce though. I'm able to create a database from the sql when testing with sqlite 3.38.5 (on Arch).

This is what I tried:

$ sqlite3 --version
3.38.5 2022-05-06 15:25:27 78d9c993d404cdfaa7fdd2973fa1052e3da9f66215cff9c5540ebe55c407alt1

$ wget https://raw.githubusercontent.com/whotracksme/whotracks.me/master/whotracksme/data/assets/trackerdb.sql

$ sqlite3 ./whotracks.db < ./trackerdb.sql

$ ls -alhgtr ./whotracks.db 
... 2.2M May 16 12:34 ./whotracks.db

So, on the latest sqlite3 version, it seems to work.

jpgpi250 commented 2 years ago

Thanks for looking at this, difficult, since there are 3 parties involved (you, me and the pihole-FTL developpers). I'm trying to keep everybody in the loop, and reported your findings and my additional test here. Will get back to you as soon as possible.

Bucking-Horn commented 2 years ago

I have some difficulties to reproduce though.

You should be able to reproduce by including PRAGMA foreign_keys=ON;

Your current code seems to assume foreign key constraints are not enforced by the database engine. That may not be a valid assumption (of course, that may be debatable if that script would be intended for internal use only, and targeting a specific database, version and binary where the actual value would be known to be ON. ;) )

Different SQLite3 versions may use either ON or OFFas a database default, see https://sqlite.org/pragma.html#pragma_foreign_keys. This may even be true for the same version, depending on the compile options used to produce the binary, see https://sqlite.org/foreignkeys.html#fk_enable.

You could consider to run the statements in an order that would comply with foreign key constraints.

Alternatively, if you are sure that execution would never result in any foreign key violations, and if your SQL would target SQLite3 only, you could also include the respective PRAGMA statements in your script.

philipp-classen commented 2 years ago

I have some difficulties to reproduce though.

You should be able to reproduce by including PRAGMA foreign_keys=ON;

Yes, when adding that, it fails with such errors:

...
Runtime error near line 4925: UNIQUE constraint failed: tracker_domains.tracker, tracker_domains.domain (19)
...
Parse error near line 6642: no such table: main.trackers
Runtime error near line 7634: FOREIGN KEY constraint failed (19)
...
Bucking-Horn commented 2 years ago

For changing the order of SQL statements, you could take a peek at yubiuser's suggestion from https://discourse.pi-hole.net/t/interesting-study-leaky-forms/55401/4

DL6ER commented 2 years ago

I do agree that this should be changed in your script. PRAGMA foreign_keys=ON; will eventually become an issue in the future as the SQLite3 maintainers consider to make foreign key enforcement the new default in a future release:

Note, however, that future releases of SQLite might change so that foreign key constraints enabled by default. Careful developers will not make any assumptions about whether or not foreign keys are enabled by default [...].

(https://www.sqlite.org/foreignkeys.html)

And in your case, it isn't even necessary to deal with foreign keys at all. Just change the order of the commands to ensure you are not referencing something which does not exist at this time. Create parents before children (please follow @Bucking-Horn's link for a clear explanation). It'd be better if the script is already prepared for this (especially because the fix is simple and easy).

philipp-classen commented 2 years ago

Looked at, but it is unfortunately it is hard to change the order. It is not that we generate the SQL directly; it comes indirectly by dumping an already existing sqlite database with https://pypi.org/project/sqlite-dump/

The code does something like that:

import sqlite3
from sqlite_dump import iterdump

conn = sqlite3.connect("./test.db")
for line in iterdump(conn):
    print(line)

I'm currently not aware of an elegant way to fix the order without rewriting the whole exporter.

For the moment, the best workaround that I can see is to import it with PRAGMA foreign_keys=off; if you are hit by the problem.

philipp-classen commented 1 year ago

We recently opened all the data: https://github.com/ghostery/trackerdb. It's now the recommended place to start and will replace the trackerdb.sql file in this repository (https://github.com/whotracksme/whotracks.me/issues/315).

In trackerdb releases, we export the data in different formats. There is an sqlite binary dump (trackerdb.db), but perhaps the JSON representation would be easier to process (trackerdb.json).

You can find the latest releases here: https://github.com/ghostery/trackerdb/releases

@DL6ER @Bucking-Horn @jpgpi250 If you have any feedback, please let us know (either here or open a ticket on https://github.com/ghostery/trackerdb/issues).