Closed dimakr closed 3 weeks ago
Some testing locally for a basic cluster:
positive case
❯ go run . -workload=sequential -mode=write -max-rate=300 -replication-factor=3 -partition-count=10 -clustering-row-count=100 -clustering-row-size=5120 -concurrency=7 -rows-per-request=10 -error-at-row-limit 1000 -nodes 108.129.139.94,54.75.74.66,34.244.194.177 -tls -tls-ca-cert-file ~/Downloads/ssl_conf/ca.pem -tls-host-verification -tls-server-name 108.129.139.94,54.75.74.66,34.244.194.177
Configuration
Mode: write
Workload: sequential
Timeout: 5s
Max error number at row: 1000
Max error number: unlimited
Retries:
number: 10
min interval: 80ms
max interval: 1s
handler: sb
Consistency level: quorum
Partition count: 10
Clustering rows: 100
Clustering row size: Fixed(5120)
Rows per request: 10
Page size: 1000
Concurrency: 7
Connections: 4
Maximum rate: 300 op/s
Client compression: true
Hdr memory consumption: 2295664 bytes
time ops/s rows/s errors max 99.9th 99th 95th 90th median mean
Results
Time (avg): 737.302131ms
Total ops: 100
Total rows: 1000
...
❯ go run . -workload=sequential -mode=write -max-rate=300 -replication-factor=3 -partition-count=10 -clustering-row-count=100 -clustering-row-size=5120 -concurrency=7 -rows-per-request=10 -error-at-row-limit 1000 -nodes 108.129.139.94,54.75.74.66,34.244.194.177 -tls -tls-ca-cert-file ~/Downloads/ssl_conf/ca.pem -tls-host-verification
2024/11/04 12:44:02 tls-server-name is required when tls-host-verification is enabled
exit status 1
❯ go run . -workload=sequential -mode=write -max-rate=300 -replication-factor=3 -partition-count=10 -clustering-row-count=100 -clustering-row-size=5120 -concurrency=7 -rows-per-request=10 -error-at-row-limit 1000 -nodes 108.129.139.94,54.75.74.66,34.244.194.177 -tls -tls-ca-cert-file ~/Downloads/ssl_conf/ca.pem -tls-host-verification -tls-server-name 108.129.139.94,54.75.74.66
2024/11/04 12:44:17 Number of server names for hostname verification (2) does not match number of nodes (3)
exit status 1
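The two early exits above come from argument validation in this first iteration of the fix. A minimal sketch of that kind of check is below. The helper name `validateServerNames` and its signature are hypothetical and not taken from the scylla-bench source; only the two error messages are copied from the logs above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateServerNames is a hypothetical helper (illustration only).
// It mirrors the two checks visible in the logs: a server name list is
// required when host verification is on, and it must contain exactly one
// entry per node. The error texts are copied verbatim from the logs.
func validateServerNames(nodes, serverNames string, hostVerification bool) error {
	if !hostVerification {
		return nil // nothing to validate when verification is off
	}
	if serverNames == "" {
		return errors.New("tls-server-name is required when tls-host-verification is enabled")
	}
	n := len(strings.Split(nodes, ","))
	s := len(strings.Split(serverNames, ","))
	if n != s {
		return fmt.Errorf("Number of server names for hostname verification (%d) does not match number of nodes (%d)", s, n)
	}
	return nil
}

func main() {
	nodes := "108.129.139.94,54.75.74.66,34.244.194.177"
	fmt.Println(validateServerNames(nodes, "", true))                           // missing server names
	fmt.Println(validateServerNames(nodes, "108.129.139.94,54.75.74.66", true)) // 2 names for 3 nodes
	fmt.Println(validateServerNames(nodes, nodes, true))                        // one name per node
}
```

Note that a count check like this cannot detect a wrong name that is syntactically fine, which is why the next case only fails later, at dial time.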
tls-host-verification is enabled, but invalid server name is passed
❯ go run . -workload=sequential -mode=write -max-rate=300 -replication-factor=3 -partition-count=10 -clustering-row-count=100 -clustering-row-size=5120 -concurrency=7 -rows-per-request=10 -error-at-row-limit 1000 -nodes 108.129.139.94,54.75.74.66,34.244.194.177 -tls -tls-ca-cert-file ~/Downloads/ssl_conf/ca.pem -tls-host-verification -tls-server-name 108.129.139.94,54.75.74.66,111.111.194.177
2024/11/04 13:02:30 gocql: unable to dial control conn 34.244.194.177:9042: certificate is not valid for any of the server names
Configuration
Mode: write
Workload: sequential
Timeout: 5s
Max error number at row: 1000
Max error number: unlimited
Retries:
number: 10
min interval: 80ms
max interval: 1s
handler: sb
Consistency level: quorum
Partition count: 10
Clustering rows: 100
Clustering row size: Fixed(5120)
Rows per request: 10
Page size: 1000
Concurrency: 7
Connections: 4
Maximum rate: 300 op/s
Client compression: true
Hdr memory consumption: 2295664 bytes
time ops/s rows/s errors max 99.9th 99th 95th 90th median mean
Results
Time (avg): 670.218226ms
Total ops: 100
Total rows: 1000
Moving back to draft.
After the first round of review, it seems that we don't need to build around the custom ServerNames verification logic, which was added to s-b 3 years ago; we just need to set the `gocql.SslOptions.EnableHostVerification` attribute properly.
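To illustrate what such a switch ultimately controls, here is a minimal stdlib-only sketch. It assumes (this is not confirmed in this thread) that the driver maps the flag onto `tls.Config.InsecureSkipVerify`; the helper `newTLSConfig` is hypothetical and is not scylla-bench or gocql code.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// newTLSConfig is a hypothetical helper (illustration only).
// With InsecureSkipVerify=false, crypto/tls verifies the server certificate
// chain and matches the certificate against the name or IP being dialed.
// With InsecureSkipVerify=true, both checks are skipped.
func newTLSConfig(caPEM []byte, hostVerification bool) (*tls.Config, error) {
	cfg := &tls.Config{InsecureSkipVerify: !hostVerification}
	if len(caPEM) > 0 {
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(caPEM) {
			return nil, errors.New("no certificates found in CA PEM data")
		}
		cfg.RootCAs = pool // trust only the supplied CA
	}
	return cfg, nil
}

func main() {
	for _, verify := range []bool{true, false} {
		cfg, err := newTLSConfig(nil, verify)
		if err != nil {
			panic(err)
		}
		fmt.Printf("host verification=%v -> InsecureSkipVerify=%v\n", verify, cfg.InsecureSkipVerify)
	}
}
```

With this approach no separate list of server names is needed, because the name to verify is simply the address each connection is dialed with.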
After the fix was revised (no custom logic is used for providing server names for host verification; the driver's built-in mechanism is used instead, when enabled via `gocql.SslOptions.EnableHostVerification`), the following scenario was executed:
1. Run s-b with host verification enabled:
❯ go run . -workload=sequential -mode=write -max-rate=300 -replication-factor=3 -partition-count=10 -clustering-row-count=100 -clustering-row-size=5120 -concurrency=7 -rows-per-request=10 -error-at-row-limit 1000 -nodes 52.51.61.135 -tls -tls-ca-cert-file ~/Downloads/ssl_conf/ca.pem -tls-host-verification
Configuration
Mode: write
Workload: sequential
Timeout: 5s
Max error number at row: 1000
Max error number: unlimited
Retries:
number: 10
min interval: 80ms
max interval: 1s
handler: sb
Consistency level: quorum
Partition count: 10
Clustering rows: 100
Clustering row size: Fixed(5120)
Rows per request: 10
Page size: 1000
Concurrency: 7
Connections: 4
Maximum rate: 300 op/s
Client compression: true
Hdr memory consumption: 2295664 bytes
time ops/s rows/s errors max 99.9th 99th 95th 90th median mean
Results
Time (avg): 749.34911ms
Total ops: 100
Total rows: 1000
Operations/s: 129.432680225321
Rows/s: 1294.3268022532102
raw latency :
max: 99.352575ms
99.9th: 99.352575ms
...
2. Re-generate a node certificate, so that the node name/IP in its SAN extension do not correspond to the real ones:
- prepare cnf file with changed SAN extension data
[req]
default_bits = 4096
prompt = no
default_md = sha256
distinguished_name = dn
req_extensions = req_ext
[dn]
CN = PR-provision-test-dmitriy-db-node-c048f7e9-3
O = ScyllaDB
L = Herzelia
ST = Tel Aviv
C = IL
[req_ext]
subjectAltName = @alt_names
basicConstraints = critical,CA:FALSE
subjectKeyIdentifier = hash
[alt_names]
DNS.1 = PR-provision-test-dmitriy-db-node-c048f7e9-111
DNS.2 = ec2-52-51-61-111.eu-west-1.compute.amazonaws.com
DNS.3 = ip-10-4-3-111.eu-west-1.compute.internal
IP.1 = 10.4.3.111
IP.2 = 52.51.61.111
- generate new cert and sign it with CA
❯ openssl req -new -key ~/Downloads/ssl_conf/10.4.3.4/client-facing.key -out ~/Downloads/ssl_conf/10.4.3.4/client-facing.csr -config ~/Downloads/ssl_conf/10.4.3.4/client-facing.cnf
❯ openssl x509 -req -in ~/Downloads/ssl_conf/10.4.3.4/client-facing.csr -CA ~/Downloads/ssl_conf/ca.pem -CAkey ~/Downloads/ssl_conf/ca.key -CAcreateserial -out ~/Downloads/ssl_conf/10.4.3.4/client-facing.crt -days 365 -extfile ~/Downloads/ssl_conf/10.4.3.4/client-facing.cnf -extensions req_ext
Certificate request self-signature ok
subject=CN = PR-provision-test-dmitriy-db-node-c048f7e9-3, O = ScyllaDB, L = Herzelia, ST = Tel Aviv, C = IL
Enter pass phrase for /home/dmitriy/Downloads/ssl_conf/ca.key:
...
3. Put the new certificate on the node in the `/etc/scylla/ssl_conf` directory
4. Re-run s-b with host verification enabled against the node. The TLS handshake should fail, because the certificate the node now presents does not contain the address the client is connecting to:
❯ go run . -workload=sequential -mode=write -max-rate=300 -replication-factor=3 -partition-count=10 -clustering-row-count=100 -clustering-row-size=5120 -concurrency=7 -rows-per-request=10 -error-at-row-limit 1000 -nodes 52.51.61.135 -tls -tls-ca-cert-file ~/Downloads/ssl_conf/ca.pem -tls-host-verification
2024/11/05 14:40:09 gocql: unable to create session: unable to discover protocol version: x509: certificate is valid for 10.4.3.111, 52.51.61.111, not 52.51.61.135
exit status 1
5. Check that executing s-b without host verification is still successful:
❯ go run . -workload=sequential -mode=write -max-rate=300 -replication-factor=3 -partition-count=10 -clustering-row-count=100 -clustering-row-size=5120 -concurrency=7 -rows-per-request=10 -error-at-row-limit 1000 -nodes 52.51.61.135 -tls -tls-ca-cert-file ~/Downloads/ssl_conf/ca.pem
Configuration
Mode: write
Workload: sequential
Timeout: 5s
Max error number at row: 1000
Max error number: unlimited
Retries:
number: 10
min interval: 80ms
max interval: 1s
handler: sb
Consistency level: quorum
Partition count: 10
Clustering rows: 100
Clustering row size: Fixed(5120)
Rows per request: 10
Page size: 1000
Concurrency: 7
Connections: 4
Maximum rate: 300 op/s
Client compression: true
Hdr memory consumption: 2295664 bytes
time ops/s rows/s errors max 99.9th 99th 95th 90th median mean
Results
Time (avg): 742.009271ms
Total ops: 100
Total rows: 1000
Operations/s: 129.25111537367033
Rows/s: 1292.5111537367031
raw latency :
max: 89.456639ms
99.9th: 89.456639ms
...
Great summary, less code :)
This is the way
Custom logic for passing server names as a separate scylla-bench parameter, used for host verification, was introduced a few years ago. Host verification is now performed by the driver itself when TLS encryption is enabled and the `gocql.SslOptions.EnableHostVerification` flag is set.
This change switches to the driver's host verification mechanism and removes the corresponding customization from scylla-bench.
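The name check itself can be reproduced in isolation with Go's `crypto/x509`, which is where the `x509:` error in the test scenario originates. The sketch below is illustrative only: it builds a throwaway self-signed certificate (the helper `newTestCert` and the common name are hypothetical) with the same IP SANs as the re-generated node certificate, then checks it against a matching and a non-matching address.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// newTestCert builds a throwaway self-signed certificate whose SAN extension
// carries the given IP addresses. Hypothetical helper, for illustration only.
func newTestCert(ips ...string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "example-db-node"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
	}
	for _, ip := range ips {
		tmpl.IPAddresses = append(tmpl.IPAddresses, net.ParseIP(ip))
	}
	// Self-signed: the template acts as both subject and issuer.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := newTestCert("10.4.3.111", "52.51.61.111")
	if err != nil {
		panic(err)
	}
	// An address listed in the SANs passes the check (nil error).
	fmt.Println(cert.VerifyHostname("52.51.61.111"))
	// An address missing from the SANs yields a hostname error.
	fmt.Println(cert.VerifyHostname("52.51.61.135"))
}
```

This only exercises the SAN matching step; chain validation against the CA is a separate check and is not covered here.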
Fixes: https://github.com/scylladb/scylla-bench/issues/140