zonemaster / zonemaster-backend

The Zonemaster Backend - part of the Zonemaster project

New result entries table #1092

Closed ghost closed 12 months ago

ghost commented 1 year ago

Purpose

Move the test_results.results json array into a dedicated table result_entries.
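As a rough illustration of the change (a minimal sketch using SQLite and made-up column names; the actual schema is defined in the PR's `MySQL.pm`), the JSON array stored per test is unpacked into one row per message:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Old shape: one row per test, all messages packed into a JSON blob.
conn.execute("CREATE TABLE test_results (hash_id TEXT PRIMARY KEY, results TEXT)")
# New shape: one row per message, keyed by the test's hash_id.
conn.execute(
    "CREATE TABLE result_entries ("
    " hash_id TEXT, level TEXT, module TEXT, testcase TEXT, tag TEXT, args TEXT)"
)

messages = [
    {"level": "INFO", "module": "BASIC", "testcase": "BASIC01",
     "tag": "B01_CHILD_FOUND", "args": {}},
    {"level": "INFO", "module": "DELEGATION", "testcase": "DELEGATION04",
     "tag": "ARE_AUTHORITATIVE", "args": {}},
]
conn.execute("INSERT INTO test_results VALUES (?, ?)",
             ("000081efd2d471e0", json.dumps(messages)))

# Migration step: explode each test's JSON array into dedicated rows.
for hash_id, results in conn.execute("SELECT hash_id, results FROM test_results"):
    rows = [
        (hash_id, m["level"], m["module"], m["testcase"], m["tag"], json.dumps(m["args"]))
        for m in json.loads(results)
    ]
    conn.executemany("INSERT INTO result_entries VALUES (?, ?, ?, ?, ?, ?)", rows)

n = conn.execute("SELECT count(*) FROM result_entries").fetchone()[0]
```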

Context

Replaces #856

Changes

How to test this PR

Credits to @blacksponge for most of the work.

matsduf commented 1 year ago

Isn't it better to store the hash_id as a number?

ghost commented 1 year ago

Isn't it better to store the hash_id as a number?

TBH I've never thought about that. This uses the same datatype as in the test_results table. Maybe we could consider updating the hash_id datatype.

matsduf commented 1 year ago

Both MySQL (MariaDB) and PostgreSQL have good support for JSON, so how much is gained? Are there any performance differences when saving the result in a table instead of in JSON? Are there any performance differences when reading the result?

matsduf commented 1 year ago

@pnax, have you estimated how long it will take to migrate the database on the zonemaster.net server?

ghost commented 1 year ago

have you estimated how long it will take to migrate the database on the zonemaster.net server?

Yes but I can't find the figure again. If I recall correctly, this was around 2 to 3 hours.

matsduf commented 1 year ago

Are there any performance differences when reading the result?

@matsstralbergiis will try to measure the difference.

@pnax, have you compared the write load between the old and new code?

ghost commented 1 year ago

Both MySQL (MariaDB) and PostgreSQL have good support for JSON, so how much is gained?

See https://github.com/zonemaster/zonemaster-backend/pull/856#issuecomment-1457814137 and https://github.com/zonemaster/zonemaster-backend/pull/1092#pullrequestreview-1526628097. Another gain would be to ease the use of another database engine (for instance Clickhouse, see https://github.com/zonemaster/zonemaster-backend/pull/1094).

Regarding the performance gain, I need to redo some tests. (I thought I had done some a few months back but I can't find my results anymore.)

matsduf commented 1 year ago

See #856 (comment) and #1092 (review). Another gain would be to ease the use of another database engine (for instance Clickhouse, see #1094).

The comment in #856 was mine, but today I am better informed and understand that it actually works to let the database engine look into JSON. But it still might be a good idea to use a separate table. It is conceptually simpler. And possibly better support for other engines.

Regarding the performance gain, I need to redo some tests. (I thought I had done some a few months back but I can't find my results anymore.)

Thanks!

matsstralbergiis commented 1 year ago

I have copied the 'test_result' table to a MySQL instance running on my Mac. The database contains 208 149 tests.

Then I created the new 'result_entries' table (I copied the CREATE statement from MySQL.pm in this PR) and extracted the data from the JSON blob in the 'test_result' table into the new table. It now contains 14 518 570 records.

After that I ran two queries that I believe produce the same result. (Batch 10 contains 81689 tests.)

One to the 'old' format:

SELECT tr.batch_id, tr.hash_id, tr.domain, result.tag, count(result.tag),
       result.level, result.module, result.testcase
FROM test_results tr,
     JSON_TABLE(
       CAST(CONVERT(results USING utf8) AS JSON),
       "$[*]" COLUMNS(
         tag VARCHAR(40) PATH '$.tag' ERROR ON ERROR,
         level VARCHAR(10) PATH '$.level' ERROR ON ERROR,
         module VARCHAR(20) PATH '$.module' ERROR ON ERROR,
         testcase VARCHAR(20) PATH '$.testcase' ERROR ON ERROR
       )
     ) AS result
WHERE batch_id = 10
GROUP BY tr.hash_id, tr.domain, result.tag, result.level, result.module, result.testcase
ORDER BY tr.hash_id
LIMIT 10;
+----------+------------------+----------------------+----------------------------+-------------------+-------+--------------+----------------+
| batch_id | hash_id          | domain               | tag                        | count(result.tag) | level | module       | testcase       |
+----------+------------------+----------------------+----------------------------+-------------------+-------+--------------+----------------+
|       10 | 000081efd2d471e0 | xn--nsbyslott-v2a.se | ADDRESSES_MATCH            |                 1 | INFO  | CONSISTENCY  | CONSISTENCY05  |
|       10 | 000081efd2d471e0 | xn--nsbyslott-v2a.se | ARE_AUTHORITATIVE          |                 1 | INFO  | DELEGATION   | DELEGATION04   |
|       10 | 000081efd2d471e0 | xn--nsbyslott-v2a.se | AXFR_FAILURE               |                 8 | INFO  | NAMESERVER   | NAMESERVER03   |
|       10 | 000081efd2d471e0 | xn--nsbyslott-v2a.se | B01_CHILD_FOUND            |                 1 | INFO  | BASIC        | BASIC01        |
|       10 | 000081efd2d471e0 | xn--nsbyslott-v2a.se | B01_PARENT_FOUND           |                 1 | INFO  | BASIC        | BASIC01        |
|       10 | 000081efd2d471e0 | xn--nsbyslott-v2a.se | B02_AUTH_RESPONSE_SOA      |                 1 | INFO  | BASIC        | BASIC02        |
|       10 | 000081efd2d471e0 | xn--nsbyslott-v2a.se | CAN_BE_RESOLVED            |                 1 | INFO  | NAMESERVER   | NAMESERVER06   |
|       10 | 000081efd2d471e0 | xn--nsbyslott-v2a.se | CASE_QUERIES_RESULTS_OK    |                 1 | INFO  | NAMESERVER   | NAMESERVER09   |
|       10 | 000081efd2d471e0 | xn--nsbyslott-v2a.se | CHILD_DISTINCT_NS_IP       |                 1 | INFO  | DELEGATION   | DELEGATION02   |
|       10 | 000081efd2d471e0 | xn--nsbyslott-v2a.se | CN04_IPV4_DIFFERENT_PREFIX |                 1 | INFO  | CONNECTIVITY | CONNECTIVITY04 |
+----------+------------------+----------------------+----------------------------+-------------------+-------+--------------+----------------+
10 rows in set, 1 warning (52.8186 sec)

And one to the new format:

SELECT tr.batch_id, tr.hash_id, tr.domain, result.tag, count(result.tag),
       result.level, result.module, result.testcase
FROM result_entries result
LEFT JOIN test_results tr ON result.hash_id = tr.hash_id
WHERE batch_id = 10
GROUP BY tr.hash_id, tr.domain, result.tag, result.level, result.module, result.testcase
ORDER BY tr.hash_id
LIMIT 10;
+----------+------------------+----------------------+----------------------------+-------------------+-------+--------------+----------------+
| batch_id | hash_id          | domain               | tag                        | count(result.tag) | level | module       | testcase       |
+----------+------------------+----------------------+----------------------------+-------------------+-------+--------------+----------------+
|       10 | 000081efd2d471e0 | xn--nsbyslott-v2a.se | ADDRESSES_MATCH            |                 1 | INFO  | CONSISTENCY  | CONSISTENCY05  |
|       10 | 000081efd2d471e0 | xn--nsbyslott-v2a.se | ARE_AUTHORITATIVE          |                 1 | INFO  | DELEGATION   | DELEGATION04   |
|       10 | 000081efd2d471e0 | xn--nsbyslott-v2a.se | AXFR_FAILURE               |                 8 | INFO  | NAMESERVER   | NAMESERVER03   |
|       10 | 000081efd2d471e0 | xn--nsbyslott-v2a.se | B01_CHILD_FOUND            |                 1 | INFO  | BASIC        | BASIC01        |
|       10 | 000081efd2d471e0 | xn--nsbyslott-v2a.se | B01_PARENT_FOUND           |                 1 | INFO  | BASIC        | BASIC01        |
|       10 | 000081efd2d471e0 | xn--nsbyslott-v2a.se | B02_AUTH_RESPONSE_SOA      |                 1 | INFO  | BASIC        | BASIC02        |
|       10 | 000081efd2d471e0 | xn--nsbyslott-v2a.se | CAN_BE_RESOLVED            |                 1 | INFO  | NAMESERVER   | NAMESERVER06   |
|       10 | 000081efd2d471e0 | xn--nsbyslott-v2a.se | CASE_QUERIES_RESULTS_OK    |                 1 | INFO  | NAMESERVER   | NAMESERVER09   |
|       10 | 000081efd2d471e0 | xn--nsbyslott-v2a.se | CHILD_DISTINCT_NS_IP       |                 1 | INFO  | DELEGATION   | DELEGATION02   |
|       10 | 000081efd2d471e0 | xn--nsbyslott-v2a.se | CN04_IPV4_DIFFERENT_PREFIX |                 1 | INFO  | CONNECTIVITY | CONNECTIVITY04 |
+----------+------------------+----------------------+----------------------------+-------------------+-------+--------------+----------------+
10 rows in set (25.3378 sec)

The results are consistent when repeating the queries.

Maybe it is possible to optimize the queries to speed it up. Feel free to suggest improvements.
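One generic improvement worth checking is whether result_entries has an index on hash_id, so the join does not scan the whole table for each test. A minimal sketch in SQLite (the index name is made up; MySQL uses CREATE INDEX the same way), verifying via the query plan that the index is actually used:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result_entries (hash_id TEXT, tag TEXT)")
conn.execute("INSERT INTO result_entries VALUES ('000081efd2d471e0', 'ADDRESSES_MATCH')")

# Without an index this lookup is a full table scan; with one it's an index search.
conn.execute("CREATE INDEX idx_result_entries_hash_id ON result_entries (hash_id)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT tag FROM result_entries WHERE hash_id = ?",
    ("000081efd2d471e0",),
).fetchall()
detail = plan[0][-1]  # human-readable plan step, e.g. "SEARCH ... USING INDEX ..."
```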

matsduf commented 1 year ago

If databases are better at handling integer keys, wouldn't it be better to convert the hash_id into an integer instead?

marc-vanderwal commented 1 year ago

Any x86_64 CPU with the SSE4.2 instruction set extensions (and running a DBMS that has been compiled to support these instructions) can compare strings with the same efficiency as integers (and strings can be seen as arrays of large integers, in a way). It seems to me that discussing the data type of primary keys is a case of premature optimization. There is evidence showing that, for MySQL at least, the performance penalty is quite negligible. I suggest we keep the primary key as VARCHAR(16) for now.
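For reference, assuming the hash_id is always 16 hexadecimal characters as in the examples above, it would in principle round-trip through a 64-bit integer; note, however, that values at or above 2**63 would not fit a signed BIGINT without an offset. A quick sketch:

```python
def hash_id_to_int(hash_id: str) -> int:
    """Interpret a 16-hex-char hash_id as an unsigned 64-bit integer."""
    assert len(hash_id) == 16
    return int(hash_id, 16)

def int_to_hash_id(n: int) -> str:
    """Inverse mapping: zero-padded lowercase hex."""
    return format(n, "016x")

h = "000081efd2d471e0"  # sample hash_id from the query output above
n = hash_id_to_int(h)
# Caveat: hash_ids starting with 8-f map to values >= 2**63, which overflow
# a signed BIGINT column unless stored unsigned or offset.
```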

ghost commented 1 year ago

Please re-review.

ghost commented 1 year ago

~TODO: Travis is failing with the following error on Ubuntu focal (20.04):~

DBD::mysql::db do failed: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'IF EXISTS result_entries DROP FOREIGN KEY IF EXISTS fk_hash_id' at line 1 at /home/travis/build/zonemaster/zonemaster-backend/build_dir/blib/lib/Zonemaster/Backend/DB/MySQL.pm line 221. https://app.travis-ci.com/github/zonemaster/zonemaster-backend/jobs/610602246

edit: fixed with commit "Support MariaDB <10.4 and MySQL" (7ead49faee6a722a69e10da05f0a2472f816fa64)

matsduf commented 1 year ago

Is field "hash_id" in table "result_entries" the same "hash_id" that identifies the test? If so, does that mean that the table has no unique identifier?

matsduf commented 1 year ago

Do you have an estimate of how long it will take to patch the zonemaster.net database?

ghost commented 1 year ago

Is field "hash_id" in table "result_entries" the same "hash_id" that identifies the test?

Yes.

If so, does that mean that the table has no unique identifier?

Yes, there will be several rows with the same hash_id in this table (one per msg_tag).

ghost commented 1 year ago

Do you have an estimate of how long it will take to patch the zonemaster.net database?

I ran the migration script on a copy of the current database (zonemaster.net). It took around 4 hours 15 minutes on my setup.

Some figures on the database:
date: 2023-09-27
database: PostgreSQL
number of tests: 1886151
db size (before migration): 7.3 GB
db size (after migration): 22 GB

matsduf commented 1 year ago

Do you have an estimate of how long it will take to patch the zonemaster.net database?

I ran the migration script on a copy of the current database (zonemaster.net). It took around 4 hours 15 minutes on my setup.

Thanks. That sounds manageable.

ghost commented 1 year ago

I've rebased on top of develop to integrate the changes from #1121 and address a comment from @mattias-p. I think I should have then addressed all comments/remarks. Please re-review.

mattias-p commented 10 months ago

v2023.2 Release Testing

The "How to test this PR" section is not very explicit and doesn't concern itself with the database migration at all, so I made up the following procedure.

  1. Create dist tarballs for Zonemaster-LDNS, Zonemaster-Engine and Zonemaster-Backend from both the master branch and the develop branch and place them in the $HOME/old and $HOME/new directories.
  2. Using the master branch dist tarballs, follow the installation instructions for Engine and Backend up to but excluding the "Database engine installation" section.
  3. Set age_reuse_previous_test = 60 for good measure.
  4. Take a snapshot of the machine and name it "old-before-db" (to be restored in later steps).
  5. For each DB engine as $db:
    1. Complete Backend installation using $db.
    2. Run zmtest nu > $HOME/$db-nu.old.result and save the test id as $test_id_nu.
    3. Run zmtest SE. > $HOME/$db-se.old.result and save the test id as $test_id_se.
    4. Take a snapshot of the machine and name it "old-$db" (in case we need to get back here).
    5. Run sudo systemctl stop zm-rpcapi zm-testagent.
    6. Run sudo cpanm --notest $HOME/new/*.
    7. Run cd $(perl -MFile::ShareDir=dist_dir -E 'say dist_dir("Zonemaster-Backend")').
    8. Run sudo perl patch/patch_db_zonemaster_backend_ver_11.0.3.pl.
    9. Run sudo systemctl start zm-rpcapi zm-testagent.
    10. Run zmb get_test_results --lang en --test-id $test_id_nu | jq -S > $HOME/$db-nu.migrated.result.
    11. Run zmb get_test_results --lang en --test-id $test_id_se | jq -S > $HOME/$db-se.migrated.result.
    12. Run zmtest nu > $HOME/$db-nu.new.result.
    13. Run zmtest SE. > $HOME/$db-se.new.result.
    14. Verify that the $HOME/$db-nu.* files are essentially identical.
    15. Verify that the $HOME/$db-se.* files are essentially identical.
    16. Take a snapshot of the machine and name it "new-$db" (in case we need to get back here).
    17. Restore the "old-before-db" snapshot.

When following this procedure I found a couple of problems.

  1. For SQLite < 3.32.0 the current version of the schema migration script (develop) is only able to migrate tests with 142 messages or fewer. This is an unreasonably small limit. For SQLite > 3.32.0 the limit is 4680. On Rocky Linux 8.9 we're using SQLite 3.26.0.

    I'm not sure 4680 is a sufficient limit. The limit could be set dynamically to allow any number of messages that isn't too high for SQLite itself, of course. You can set the limit with something like this: $dbh->sqlite_limit( DBD::SQLite::Constants::SQLITE_LIMIT_SQL_LENGTH, $new_value )

    This is bad but I don't think it's a blocker. People who want to keep their test results are probably using MariaDB or PostgreSQL anyway.

    We ought to create a tool for migrating databases from one DB engine to another within the same version of Backend.

  2. If you upgrade the database but forget to restart testagent, you get tests with empty results.

    This is bad and confusing but I don't think it's a blocker.

  3. After migrating a database from the previous schema version to the new one, get_test_results no longer reports properly translated message strings. Instead the message strings are returned in "raw" format.

    This is because the schema migration script and the RPCAPI daemon disagree on the format of the result_entries.module and result_entries.testcase columns. The daemon reads them case sensitively and expects most of their values to be capitalized. The script writes them in upper case and seems to assume they are case insensitive.

    Even though reports of migrated test results are more cryptic to the human reader, there is no data loss, and proper order can be restored using an additional migration script in a fix release. Or we could treat this as a blocker and fix the migration script before releasing.
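On the SQLite statement-length limit from point 1: an alternative to raising SQLITE_LIMIT_SQL_LENGTH dynamically is to insert the migrated messages in fixed-size chunks, so the size of any single statement stays bounded no matter how many messages a test has. A sketch of the idea in Python/sqlite3 (the Backend itself uses Perl's DBD::SQLite, where the same batching applies):

```python
import sqlite3

CHUNK = 100  # rows per INSERT batch; keeps every statement's size bounded

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result_entries (hash_id TEXT, tag TEXT)")

# Simulate a test with far more messages than the old per-statement limit of 142.
rows = [("000081efd2d471e0", f"TAG_{i}") for i in range(10_000)]

for start in range(0, len(rows), CHUNK):
    # executemany re-binds the same short parameterized statement, so its SQL
    # text length is constant regardless of how many rows are migrated.
    conn.executemany("INSERT INTO result_entries VALUES (?, ?)",
                     rows[start:start + CHUNK])

count = conn.execute("SELECT count(*) FROM result_entries").fetchone()[0]
```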
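On the case mismatch from point 3: a follow-up fix-release script could normalize the two columns in place. A minimal sketch, assuming the form the daemon expects is first-letter-capitalized (e.g. "Consistency"/"Consistency05"; the actual expected casing must be taken from the daemon's code, and the table layout here is simplified):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result_entries (hash_id TEXT, module TEXT, testcase TEXT)")
conn.execute(
    "INSERT INTO result_entries VALUES ('000081efd2d471e0', 'CONSISTENCY', 'CONSISTENCY05')"
)

# Fetch first, then update, to avoid mutating the table under an open cursor.
rows = conn.execute("SELECT rowid, module, testcase FROM result_entries").fetchall()
for rowid, module, testcase in rows:
    # 'CONSISTENCY' -> 'Consistency', 'CONSISTENCY05' -> 'Consistency05'
    conn.execute(
        "UPDATE result_entries SET module = ?, testcase = ? WHERE rowid = ?",
        (module.capitalize(), testcase.capitalize(), rowid),
    )

row = conn.execute("SELECT module, testcase FROM result_entries").fetchone()
```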