Closed ghost closed 9 months ago
Some installation instructions are needed.
This PR is still marked as "draft". Is it really ready for installation?
Yes. I marked it as "ready for review" and updated the description.
Release testing report – Success, no issues
Rocky Linux 9.3/Clickhouse
Merged pnax/clickhouse
in a detached head. Followed the installation instructions as updated by zonemaster/zonemaster#1228 in order to get a Clickhouse server running on the same host as zonemaster-backend. After running the “smoke test”, the test_results
table contains data, as expected.
For the unit tests, the procedure was incomplete. Unit tests needed a zonemaster_testing
database, a zonemaster_testing
user and enough privileges for the aforementioned user to touch the database. With those prerequisites out of the way, unit tests pass.
@pnax Before this is merged it should be rebased on latest develop (#1132 has now been merged), to be sure that CI passes
I’m currently testing #1121 with the experimental Clickhouse database engine still enabled and it seems that I’m hitting issues related to the way Clickhouse implements ALTER TABLE … UPDATE that don’t make me feel comfortable at all with using a Clickhouse database backend in production.
For example, I get warnings like the following:
Argument "860b459647d9be55" isn't numeric in numeric eq (==) at /usr/local/share/perl5/5.32/Zonemaster/Backend/DB.pm line 362.
That line 362 is the equality comparison of $progress
with 0 in the following subroutine:
sub test_state {
my ( $self, $test_id ) = @_;
my ( $progress ) = $self->dbh->selectrow_array(
q[
SELECT progress
FROM test_results
WHERE hash_id = ?
],
undef,
$test_id,
);
if ( !defined $progress ) {
die Zonemaster::Backend::Error::Internal->new( reason => 'job not found' );
}
if ( $progress == 0 ) {
return $TEST_WAITING;
}
elsif ( 0 < $progress && $progress < 100 ) {
return $TEST_RUNNING;
}
elsif ( $progress == 100 ) {
return $TEST_COMPLETED;
}
else {
die Zonemaster::Backend::Error::Internal->new( reason => 'state could not be determined' );
}
}
But these “argument isn’t numeric” errors should in theory never happen, because the code fetches exactly one row and column and the datum is, according to the database schema, an integer. How do we end up with a string?
Grepping 860b459647d9be55
across both log files shows something even more concerning: the thing that’s being compared to a number is actually the ID of a test being run by the other of the two test agents on the machine. In this example, if I order the matches by timestamp, this happens:
zm-testagent-2.log:2023-12-13T08:59:51Z [2296] [WARNING] [main] Argument "860b459647d9be55" isn't numeric in numeric eq (==) at /usr/local/share/perl5/5.32/Zonemaster/Backend/DB.pm line 362.
zm-testagent.log:2023-12-13T08:59:53Z [2152] [INFO] [main] Test found: 860b459647d9be55
zm-testagent-2.log:2023-12-13T08:59:54Z [2296] [WARNING] [main] Argument "860b459647d9be55" isn't numeric in numeric eq (==) at /usr/local/share/perl5/5.32/Zonemaster/Backend/DB.pm line 362.
zm-testagent.log:2023-12-13T09:00:21Z [2812] [INFO] [main] Test starting: 860b459647d9be55
zm-testagent.log:2023-12-13T09:05:53Z [2812] [INFO] [main] Test completed: 860b459647d9be55
zm-testagent.log:2023-12-13T09:05:54Z [2152] [NOTICE] [Zonemaster::Backend::Config] Worker process (pid 2812, testid 860b459647d9be55): Terminated with exit code 0
Due to this, many tests either crash with “state could not be determined” or “illegal transition to COMPLETED”.
It seems that Clickhouse doesn’t implement proper transaction isolation when updating a table. Yet, currently, the code does many updates to test_results
that, without proper isolation, cause Zonemaster::Backend to exhibit very surprising behavior.
Purpose
Clickhouse is a vertical database engine that appears to be quite efficient when it comes to run Zonemaster with millions of domains
Context
Changes
How to test this PR
Setup a Clickhouse server
zonemaster
database and azonemaster
user:Configure and use Zonemaster-Backend
Adapt the
share/backend_config.ini
file and usezmb
orzmtest
. You could check any results by running some SQL commands withclickhouse-client
.Run the unit tests
Unit test should work when using Clickhouse.