scylladb / scylla-bench

fix(host-verification): switch to gocql builtin hostname verification #152

Closed dimakr closed 3 weeks ago

dimakr commented 3 weeks ago

Custom logic for passing the server name as a separate scylla-bench parameter, used for host verification, was introduced a few years ago. The driver now performs host verification itself when TLS encryption is enabled and the gocql.SslOptions.EnableHostVerification flag is set.

This change switches to the driver's built-in host verification mechanism and removes the corresponding customization from scylla-bench.

Fixes: https://github.com/scylladb/scylla-bench/issues/140

dimakr commented 3 weeks ago

Some local testing against a basic cluster:

dimakr commented 3 weeks ago

Moving back to draft. After the first round of review, it seems we don't need to build around the custom ServerNames verification logic that was added to s-b 3 years ago; we just need to set the gocql.SslOptions.EnableHostVerification attribute properly.

dimakr commented 3 weeks ago

After the fix was revised (so that no custom logic supplies server names for host verification, and the driver's built-in mechanism is used instead when enabled via gocql.SslOptions.EnableHostVerification), the following scenario was executed:

  1. Run s-b with host verification enabled:

    ❯ go run . -workload=sequential -mode=write -max-rate=300 -replication-factor=3 -partition-count=10 -clustering-row-count=100 -clustering-row-size=5120 -concurrency=7 -rows-per-request=10 -error-at-row-limit 1000 -nodes 52.51.61.135 -tls -tls-ca-cert-file ~/Downloads/ssl_conf/ca.pem -tls-host-verification
    Configuration
    Mode:           write
    Workload:       sequential
    Timeout:        5s
    Max error number at row: 1000
    Max error number:   unlimited
    Retries:        
      number:       10
      min interval:     80ms
      max interval:     1s
      handler:      sb
    Consistency level:  quorum
    Partition count:    10
    Clustering rows:    100
    Clustering row size:    Fixed(5120)
    Rows per request:   10
    Page size:      1000
    Concurrency:        7
    Connections:        4
    Maximum rate:       300 op/s
    Client compression: true
    Hdr memory consumption: 2295664 bytes
    
    time   ops/s  rows/s errors max    99.9th 99th   95th   90th   median mean   
    Results
    Time (avg): 749.34911ms
    Total ops:  100
    Total rows: 1000
    Operations/s:   129.432680225321
    Rows/s:     1294.3268022532102
    raw latency :
      max:      99.352575ms
      99.9th:   99.352575ms
    ...
  2. Re-generate a node certificate, so that node name/IP in its SAN extension do not correspond to real ones:
    • prepare cnf file with changed SAN extension data
      
      [req]
      default_bits = 4096
      prompt = no
      default_md = sha256
      distinguished_name = dn
      req_extensions = req_ext

      [dn]
      CN = PR-provision-test-dmitriy-db-node-c048f7e9-3
      O = ScyllaDB
      L = Herzelia
      ST = Tel Aviv
      C = IL

      [req_ext]
      subjectAltName = @alt_names
      basicConstraints = critical,CA:FALSE
      subjectKeyIdentifier = hash

      [alt_names]
      DNS.1 = PR-provision-test-dmitriy-db-node-c048f7e9-111
      DNS.2 = ec2-52-51-61-111.eu-west-1.compute.amazonaws.com
      DNS.3 = ip-10-4-3-111.eu-west-1.compute.internal
      IP.1 = 10.4.3.111
      IP.2 = 52.51.61.111
    • generate new cert and sign it with CA

      ❯ openssl req -new -key ~/Downloads/ssl_conf/10.4.3.4/client-facing.key -out ~/Downloads/ssl_conf/10.4.3.4/client-facing.csr -config ~/Downloads/ssl_conf/10.4.3.4/client-facing.cnf
      ❯ openssl x509 -req -in ~/Downloads/ssl_conf/10.4.3.4/client-facing.csr -CA ~/Downloads/ssl_conf/ca.pem -CAkey ~/Downloads/ssl_conf/ca.key -CAcreateserial -out ~/Downloads/ssl_conf/10.4.3.4/client-facing.crt -days 365 -extfile ~/Downloads/ssl_conf/10.4.3.4/client-facing.cnf -extensions req_ext
      Certificate request self-signature ok
      subject=CN = PR-provision-test-dmitriy-db-node-c048f7e9-3, O = ScyllaDB, L = Herzelia, ST = Tel Aviv, C = IL
      Enter pass phrase for /home/dmitriy/Downloads/ssl_conf/ca.key:
      ...

  3. Put the new certificate on the node in the `/etc/scylla/ssl_conf` directory
  4. Re-run s-b with host verification enabled against the node. The TLS handshake should fail, because the certificate the node now presents does not contain the address the client is connecting to:

    ❯ go run . -workload=sequential -mode=write -max-rate=300 -replication-factor=3 -partition-count=10 -clustering-row-count=100 -clustering-row-size=5120 -concurrency=7 -rows-per-request=10 -error-at-row-limit 1000 -nodes 52.51.61.135 -tls -tls-ca-cert-file ~/Downloads/ssl_conf/ca.pem -tls-host-verification
    2024/11/05 14:40:09 gocql: unable to create session: unable to discover protocol version: x509: certificate is valid for 10.4.3.111, 52.51.61.111, not 52.51.61.135
    exit status 1

  5. Check that executing s-b without host verification is still successful:

    ❯ go run . -workload=sequential -mode=write -max-rate=300 -replication-factor=3 -partition-count=10 -clustering-row-count=100 -clustering-row-size=5120 -concurrency=7 -rows-per-request=10 -error-at-row-limit 1000 -nodes 52.51.61.135 -tls -tls-ca-cert-file ~/Downloads/ssl_conf/ca.pem
    Configuration
    Mode:           write
    Workload:       sequential
    Timeout:        5s
    Max error number at row: 1000
    Max error number:   unlimited
    Retries:
      number:       10
      min interval:     80ms
      max interval:     1s
      handler:      sb
    Consistency level:  quorum
    Partition count:    10
    Clustering rows:    100
    Clustering row size:    Fixed(5120)
    Rows per request:   10
    Page size:      1000
    Concurrency:        7
    Connections:        4
    Maximum rate:       300 op/s
    Client compression: true
    Hdr memory consumption: 2295664 bytes

    time   ops/s  rows/s errors max    99.9th 99th   95th   90th   median mean
    Results
    Time (avg): 742.009271ms
    Total ops:  100
    Total rows: 1000
    Operations/s:   129.25111537367033
    Rows/s:     1292.5111537367031
    raw latency :
      max:      89.456639ms
      99.9th:   89.456639ms
    ...

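The name mismatch from step 4 can also be reproduced offline. The following is a minimal sketch that uses only the Go standard library: it builds an unsigned, in-memory certificate carrying the SAN entries from the regenerated cnf file and checks host names against it with `x509.Certificate.VerifyHostname`. The helper `sanCert` is hypothetical, and the sketch covers only the name check, not chain validation or the gocql connection path.

```go
package main

import (
	"crypto/x509"
	"fmt"
	"net"
)

// sanCert is a hypothetical helper: it returns an in-memory certificate that
// carries only the SAN entries from the regenerated cnf file. The certificate
// is not signed; it is used solely to exercise the host name check.
func sanCert() *x509.Certificate {
	return &x509.Certificate{
		DNSNames: []string{
			"PR-provision-test-dmitriy-db-node-c048f7e9-111",
			"ec2-52-51-61-111.eu-west-1.compute.amazonaws.com",
			"ip-10-4-3-111.eu-west-1.compute.internal",
		},
		IPAddresses: []net.IP{
			net.ParseIP("10.4.3.111"),
			net.ParseIP("52.51.61.111"),
		},
	}
}

func main() {
	cert := sanCert()
	// The first address is the one passed via -nodes; the second is listed
	// in the certificate's IP SANs.
	for _, host := range []string{"52.51.61.135", "52.51.61.111"} {
		if err := cert.VerifyHostname(host); err != nil {
			fmt.Printf("%s: rejected: %v\n", host, err)
			continue
		}
		fmt.Printf("%s: accepted\n", host)
	}
}
```

The address used with `-nodes` is rejected because it is not among the certificate's IP SANs, while an address listed there is accepted. This is the same kind of check that fails in step 4.
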
fruch commented 3 weeks ago

Great summary, less code :)

This is the way