scylladb / scylla-rust-driver

Async CQL driver for Rust, optimized for ScyllaDB!
Apache License 2.0

Connecting takes 5 seconds - but only 200ms with cqlsh #1071

Closed erikschul closed 2 days ago

erikschul commented 3 days ago

What could be the reason that my code takes a long time to connect to a 3-node cluster? (Local Docker Compose on a Linux VM (ARM) on a MacBook M1.)

The code is very simple:

use scylla::{Session, SessionBuilder};

#[tokio::main]
async fn main() {
    let uri = "192.168.211.128:9042".to_string();
    println!("Connecting to {} ...", uri);
    let session: Session = SessionBuilder::new().known_node(uri).build().await.unwrap();
}

time cargo run

real    0m5.255s
user    0m0.142s
sys     0m0.108s

This takes 5 seconds.

While cqlsh takes just 200 ms (including a query):

time cqlsh 192.168.211.128 9042 -e "SELECT * FROM system.local;"

real    0m0.190s
user    0m0.134s
sys     0m0.056s
Lorak-mmk commented 3 days ago

Obligatory question: did you use the --release flag for cargo run? If not, try with it.

It is expected that setting up a session will take longer than connecting with cqlsh. cqlsh connects to just one node (and, I think, only to one shard of it), while a Rust driver session needs to open a connection to each shard of each node - this takes time.

Lorak-mmk commented 3 days ago

This time may not even scale linearly if you don't have advanced shard awareness working correctly, because the driver needs to open connections to random shards until it gets them all. With advanced shard awareness it can choose which shards to open connections to, so it is faster.

Another possible source of slowdown: during session initialization the driver needs to fetch metadata - if you have a lot of keyspaces / tables / columns this may be slow. You can control it with the keyspaces_to_fetch and fetch_schema_metadata values in the Session config if you don't need metadata about some / all keyspaces.
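
For instance, a minimal sketch of what that could look like with the builder (assuming fetch_schema_metadata / keyspaces_to_fetch are exposed as builder methods mirroring the SessionConfig fields; the keyspace name is hypothetical):

use scylla::{Session, SessionBuilder};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let _session: Session = SessionBuilder::new()
        .known_node("192.168.211.128:9042")
        // Skip fetching schema metadata entirely during session setup...
        .fetch_schema_metadata(false)
        // ...or, alternatively, restrict metadata to selected keyspaces:
        // .keyspaces_to_fetch(["my_keyspace"])
        .build()
        .await?;
    Ok(())
}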

Question: why do you care about session initialization time? It is typically done only once per application - doing it e.g. once per user request is a bad idea. Since it happens only once, setup time is usually not that important.

nyh commented 3 days ago

It is expected that setting up a session will take longer than connecting with cqlsh. cqlsh connects to just one node (and, I think, only to one shard of it), while a Rust driver session needs to open a connection to each shard of each node - this takes time.

I wonder if this connect-to-each-and-every-shard behavior shouldn't be optional, or at least optionally lazy. It is true that if you use the driver to send thousands of commands (e.g., ingest a large amount of data), it makes sense to open all the connections up front to get the latency penalty all at once. However, what about a tool that only wants to send a single command - why does it need to open more than one connection? Another potential problem is when a client cluster boots up and many client processes all bombard the nodes with connection requests at exactly the same time (this can theoretically cause problems like https://github.com/scylladb/scylladb/issues/18021, although I don't think we ever saw this happening in practice).

"lazy" opening of connections could mean that the driver would just open one connection needed to retrieve the topology, and from now on each time it picks a shard to connect to it might need to open the connection before sending. This "lazy" connection opening can mean larger-than-usual latency for some of the requests during the first few minutes of the application run, but a small tool that only sends one command will never even open them. Again, the "lazy" vs "up-front" opening could be optional.

I wonder how other drivers, like the Python driver (which cqlsh uses) handle this issue of when to open connections.

erikschul commented 3 days ago

Really good points. Thanks for the explanation!

I agree that it makes sense to start a connection pool up front for a production app. But as nyh says, some situations are naturally more minimal in scope and would benefit from a lazy approach. I only discovered this because the first thing I did was to write a test, and it failed because my limit is 5 s per test. As mentioned, cqlsh can connect and execute a basic query in 200 ms, and for a tiny test database that seems reasonable.

The database is completely empty, so I doubt the problem is sharding etc., but I appreciate the explanation, which helps me understand how the driver works and what kinds of behavior to anticipate in production.

I'm not sure how to debug this problem though. It would be nice to activate some tracing and see what's going on in the driver.

--

Release didn't help: time cargo run --release

real    0m5.151s
user    0m0.093s
sys     0m0.049s
erikschul commented 2 days ago

This is the docker-compose.yaml I'm using:

services:
  scylla-node1:
    image: scylladb/scylla:6.1.1
    container_name: scylla-node1
    ports:
      - "9042:9042" # CQL port
      - "9160:9160" # Thrift port
    volumes:
      - scylla_data1:/var/lib/scylla
    command: --smp 1 --memory 750M --overprovisioned 1 --reactor-backend=epoll --api-address 0.0.0.0
    networks:
      scylla_net:
        ipv4_address: 172.21.0.11

  scylla-node2:
    image: scylladb/scylla:6.1.1
    container_name: scylla-node2
    ports:
      - "9043:9042"
    volumes:
      - scylla_data2:/var/lib/scylla
    command: --seeds=172.21.0.11 --smp 1 --memory 750M --overprovisioned 1 --reactor-backend=epoll --api-address 0.0.0.0
    networks:
      scylla_net:
        ipv4_address: 172.21.0.12

  scylla-node3:
    image: scylladb/scylla:6.1.1
    container_name: scylla-node3
    ports:
      - "9044:9042"
    volumes:
      - scylla_data3:/var/lib/scylla
    command: --seeds=172.21.0.11 --smp 1 --memory 750M --overprovisioned 1 --reactor-backend=epoll --api-address 0.0.0.0
    networks:
      scylla_net:
        ipv4_address: 172.21.0.13

volumes:
  scylla_data1:
  scylla_data2:
  scylla_data3:

networks:
  scylla_net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.21.0.0/16
Lorak-mmk commented 2 days ago

@nyh @erikschul if you want to replicate what cqlsh does, it is already possible. You can use https://docs.rs/scylla/latest/scylla/transport/session_builder/struct.GenericSessionBuilder.html#method.host_filter to only connect to a single node, and set https://docs.rs/scylla/latest/scylla/transport/session/struct.SessionConfig.html#structfield.connection_pool_size to PerHost(1) in order to open only a single connection to that node.
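
A rough sketch of that cqlsh-like setup (assuming the AllowListHostFilter and PoolSize types from the transport module, and that pool_size is the builder counterpart of the connection_pool_size config field):

use std::num::NonZeroUsize;
use std::sync::Arc;

use scylla::transport::host_filter::AllowListHostFilter;
use scylla::transport::session::PoolSize;
use scylla::{Session, SessionBuilder};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let node = "192.168.211.128:9042";
    let _session: Session = SessionBuilder::new()
        .known_node(node)
        // Only connect to this single node...
        .host_filter(Arc::new(AllowListHostFilter::new([node])?))
        // ...and open just one connection to it instead of one per shard.
        .pool_size(PoolSize::PerHost(NonZeroUsize::new(1).unwrap()))
        .build()
        .await?;
    Ok(())
}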

Regarding the lazy approach proposed by @nyh: I don't think any driver has such a mechanism. I think it's worth discussing - could you open a separate issue for it?

piodul commented 2 days ago

I'm not sure how to debug this problem though. It would be nice to activate some tracing and see what's going on in the driver.

@erikschul You can do that; the crate supports it via the tracing crate. We actually have an example in the repo which configures tracing to print the messages to the terminal (examples/logging.rs); it just connects to the cluster and creates a keyspace. Maybe you could try it out?

You need to set the RUST_LOG environment variable appropriately:

RUST_LOG=info,scylla=trace cargo run --example logging
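
In your own binary this boils down to installing a tracing subscriber before building the session, roughly like this (a sketch, assuming tracing-subscriber is added as a dependency):

use scylla::{Session, SessionBuilder};

#[tokio::main]
async fn main() {
    // Print tracing events to the terminal; verbosity is taken from RUST_LOG.
    tracing_subscriber::fmt::init();

    let uri = "192.168.211.128:9042";
    println!("Connecting to {} ...", uri);
    let _session: Session = SessionBuilder::new().known_node(uri).build().await.unwrap();
}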

I think @Lorak-mmk's suggestion will already help, but it would be nice to understand what went wrong here. My guess is that the session tries to connect to the "shard-aware port" (19042) but it's not exposed and it times out, then falls back to the 9042 port (default connection timeout is 5s).

piodul commented 2 days ago

Btw, we also have an option to prevent the driver from connecting to the shard-aware port: https://docs.rs/scylla/latest/scylla/transport/session_builder/struct.GenericSessionBuilder.html#method.disallow_shard_aware_port. For a cqlsh-like tool it can simply be disabled; there is no benefit to enabling it.
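
For example (the connection_timeout knob here is an assumption, included to fail faster than the 5 s default when a port is unreachable):

use std::time::Duration;

use scylla::{Session, SessionBuilder};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let _session: Session = SessionBuilder::new()
        .known_node("192.168.211.128:9042")
        // Never attempt the shard-aware port (19042).
        .disallow_shard_aware_port(true)
        // Optionally give up on unreachable ports sooner than the 5 s default.
        .connection_timeout(Duration::from_secs(1))
        .build()
        .await?;
    Ok(())
}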

nyh commented 2 days ago

Regarding the lazy approach proposed by @nyh: I don't think any driver has such a mechanism. I think it's worth discussing - could you open a separate issue for it?

If I understand you correctly, you're saying that the Python driver which cqlsh uses also connects up-front to all the nodes. So why is it 25 times faster (according to @erikschul's measurement)? Because it connects to each node but not to each shard? Because it doesn't connect and reconnect trying to match the random shard allocation?

By the way, it's worth checking whether even connecting to all shards really needs to take 5 seconds on a 3-node cluster. Maybe we aren't parallelizing these connection establishments as much as we could?

Lorak-mmk commented 2 days ago

Regarding the lazy approach proposed by @nyh: I don't think any driver has such a mechanism. I think it's worth discussing - could you open a separate issue for it?

If I understand you correctly, you're saying that the Python driver which cqlsh uses also connects up-front to all the nodes.

No, I'm not saying that. The Python driver also has mechanisms similar to the ones I mentioned in my previous comment, and cqlsh makes use of them to open only a single connection.

By the way, it's worth checking whether even connecting to all shards really needs to take 5 seconds on a 3-node cluster. Maybe we aren't parallelizing these connection establishments as much as we could?

The explanation written by @piodul (shard-aware port not exposed, and waiting until the connection timeout) sounds probable - let's wait for @erikschul to provide the traces.

mykaul commented 2 days ago

I believe cqlsh does connect to all shards.

erikschul commented 2 days ago

The conclusion is that --broadcast-rpc-address was missing.

Part of the problem is that there is a difference between the public IP (192.168.211.128, with different ports) and the private network (172.21.0.11-13).

It's the same problem as when configuring a local Redis cluster. I recently had this issue using Valkey, where I had to set --cluster-announce-client-ipv4 192.168.211.128 (which they recently implemented to solve exactly this issue).

After adding tracing_subscriber::fmt::init(); to the Rust main() code, I get the following from time RUST_LOG=info,scylla=trace cargo run:

Connecting to 192.168.211.128:9042 ...
2024-09-16T13:54:35.058931Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] Started asynchronous pool worker
2024-09-16T13:54:35.060128Z TRACE scylla::transport::connection_pool: [192.168.211.128:9042] Will open the first connection to the node
2024-09-16T13:54:35.060160Z TRACE scylla::transport::connection_pool: pool_state="[(0, [])]"
2024-09-16T13:54:35.060503Z TRACE scylla::transport::connection: Sending 1 requests; 9 bytes
2024-09-16T13:54:35.060949Z TRACE scylla::transport::connection: Sending 1 requests; 211 bytes
2024-09-16T13:54:35.061322Z TRACE scylla::transport::connection: Sending 1 requests; 58 bytes
2024-09-16T13:54:35.061506Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] New sharder: Some(Sharder { nr_shards: 1, msb_ignore: 12 }), clearing all connections
2024-09-16T13:54:35.061548Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] Updating shard aware port: Some(19042)
2024-09-16T13:54:35.061563Z TRACE scylla::transport::connection_pool: [192.168.211.128:9042] Adding connection 0xf0e204001c50 to shard 0 pool, now there are 1 for the shard, total 1
2024-09-16T13:54:35.061606Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] Pool is full, clearing 0 excess connections
2024-09-16T13:54:35.061641Z TRACE scylla::transport::connection_pool: Selecting random connection
2024-09-16T13:54:35.061657Z TRACE scylla::transport::connection_pool: pool_state="[(0, [192.168.211.128:9042])]"
2024-09-16T13:54:35.061686Z TRACE scylla::transport::connection_pool: Available connections="192.168.211.128:9042"
2024-09-16T13:54:35.061705Z TRACE scylla::transport::connection_pool: Found connection for the target shard shard=0
2024-09-16T13:54:35.061849Z TRACE scylla::transport::connection: Sending 3 requests; 286 bytes
2024-09-16T13:54:35.063225Z DEBUG scylla::transport::topology: Toposort of UDT definitions took 0.01 ms (udts len: 0)
2024-09-16T13:54:35.064048Z TRACE scylla::transport::connection: Sending 1 requests; 114 bytes
2024-09-16T13:54:35.070579Z TRACE scylla::transport::connection: Sending 1 requests; 98 bytes
2024-09-16T13:54:35.072334Z TRACE scylla::transport::connection: Sending 1 requests; 78 bytes
2024-09-16T13:54:35.073403Z TRACE scylla::transport::connection: Sending 1 requests; 114 bytes
2024-09-16T13:54:35.078978Z TRACE scylla::transport::connection: Sending 1 requests; 98 bytes
2024-09-16T13:54:35.080798Z TRACE scylla::transport::connection: Sending 1 requests; 93 bytes
2024-09-16T13:54:35.081573Z TRACE scylla::transport::connection: Sending 1 requests; 82 bytes
2024-09-16T13:54:35.082220Z DEBUG scylla::transport::topology: Fetched new metadata
2024-09-16T13:54:35.082317Z DEBUG scylla::transport::connection_pool: [172.21.0.13:9042] Started asynchronous pool worker
2024-09-16T13:54:35.082341Z DEBUG scylla::transport::connection_pool: [172.21.0.12:9042] Started asynchronous pool worker
2024-09-16T13:54:35.082361Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] Started asynchronous pool worker
2024-09-16T13:54:35.083509Z TRACE scylla::transport::connection_pool: [192.168.211.128:9042] Will open the first connection to the node
2024-09-16T13:54:35.083567Z TRACE scylla::transport::connection_pool: pool_state="[(0, [])]"
2024-09-16T13:54:35.083754Z TRACE scylla::transport::connection_pool: [172.21.0.13:9042] Will open the first connection to the node
2024-09-16T13:54:35.083767Z TRACE scylla::transport::connection_pool: pool_state="[(0, [])]"
2024-09-16T13:54:35.083853Z TRACE scylla::transport::connection_pool: [172.21.0.12:9042] Will open the first connection to the node
2024-09-16T13:54:35.083871Z TRACE scylla::transport::connection_pool: pool_state="[(0, [])]"
2024-09-16T13:54:35.084062Z TRACE scylla::transport::connection: Sending 1 requests; 9 bytes
2024-09-16T13:54:35.084653Z TRACE scylla::transport::connection: Sending 1 requests; 211 bytes
2024-09-16T13:54:35.084865Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] New sharder: Some(Sharder { nr_shards: 1, msb_ignore: 12 }), clearing all connections
2024-09-16T13:54:35.084883Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] Updating shard aware port: Some(19042)
2024-09-16T13:54:35.084891Z TRACE scylla::transport::connection_pool: [192.168.211.128:9042] Adding connection 0xf0e20403a440 to shard 0 pool, now there are 1 for the shard, total 1
2024-09-16T13:54:35.084945Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] Pool is full, clearing 0 excess connections
2024-09-16T13:54:35.084979Z TRACE scylla::transport::connection_pool: pool_state="[(0, [192.168.211.128:9042])]"
(5 second wait here)
2024-09-16T13:54:40.086278Z DEBUG scylla::transport::connection_pool: [172.21.0.12:9042] Failed to open connection to the non-shard-aware port: TimeoutError
2024-09-16T13:54:40.086641Z TRACE scylla::transport::connection_pool: pool_state="[(0, [])]"
2024-09-16T13:54:40.086712Z DEBUG scylla::transport::connection_pool: [172.21.0.12:9042] Scheduling next refill in 100 ms
2024-09-16T13:54:40.086278Z DEBUG scylla::transport::connection_pool: [172.21.0.13:9042] Failed to open connection to the non-shard-aware port: TimeoutError
2024-09-16T13:54:40.087016Z TRACE scylla::transport::connection_pool: pool_state="[(0, [])]"
2024-09-16T13:54:40.087113Z DEBUG scylla::transport::connection_pool: [172.21.0.13:9042] Scheduling next refill in 100 ms

It fails with [172.21.0.13:9042] Failed to open connection to the non-shard-aware port: TimeoutError. So it tries to connect to the private IPs (172.21.0.x).

I updated the docker-compose to both expose the shard-aware ports and set the broadcast addresses. This solves the problem.

time RUST_LOG=info,scylla=trace cargo run

real    0m0.172s
user    0m0.111s
sys     0m0.052s

working docker-compose:

services:
  scylla-node1:
    image: scylladb/scylla:6.1.1
    container_name: scylla-node1
    ports:
      - "9042:9042" # CQL port
      - "9160:9160" # Thrift port
      - "18001:18001"
    volumes:
      - scylla_data1:/var/lib/scylla
    command: >
      --smp 1 --memory 750M --overprovisioned 1 --reactor-backend=epoll --api-address 0.0.0.0
      --listen-address 172.21.0.11
      --broadcast-address 172.21.0.11
      --broadcast-rpc-address 192.168.211.128
      --native-shard-aware-transport-port 18001
    networks:
      scylla_net:
        ipv4_address: 172.21.0.11
  scylla-node2:
    image: scylladb/scylla:6.1.1
    container_name: scylla-node2
    ports:
      - "9043:9042"
      - "18002:18002"
    volumes:
      - scylla_data2:/var/lib/scylla
    command: >
      --seeds=172.21.0.11 --smp 1 --memory 750M --overprovisioned 1 --reactor-backend=epoll --api-address 0.0.0.0
      --listen-address 172.21.0.12
      --broadcast-address 172.21.0.12
      --broadcast-rpc-address 192.168.211.128
      --native-shard-aware-transport-port 18002
    networks:
      scylla_net:
        ipv4_address: 172.21.0.12
  scylla-node3:
    image: scylladb/scylla:6.1.1
    container_name: scylla-node3
    ports:
      - "9044:9042"
      - "18003:18003"
    volumes:
      - scylla_data3:/var/lib/scylla
    command: >
      --seeds=172.21.0.11 --smp 1 --memory 750M --overprovisioned 1 --reactor-backend=epoll --api-address 0.0.0.0
      --listen-address 172.21.0.13
      --broadcast-address 172.21.0.13
      --broadcast-rpc-address 192.168.211.128
      --native-shard-aware-transport-port 18003
    networks:
      scylla_net:
        ipv4_address: 172.21.0.13
volumes:
  scylla_data1:
  scylla_data2:
  scylla_data3:
networks:
  scylla_net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.21.0.0/16

In fact it works even without the shard-aware ports exposed. What fixed it was setting --broadcast-rpc-address on each node.

Do you have some code with which I can test that the shard-aware port configuration works? Especially something that would utilize all three nodes.

Trace without the shard-aware ports exposed (note: no error):

Connecting to 192.168.211.128:9042 ...
2024-09-16T15:39:55.309881Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] Started asynchronous pool worker
2024-09-16T15:39:55.311193Z TRACE scylla::transport::connection_pool: [192.168.211.128:9042] Will open the first connection to the node
2024-09-16T15:39:55.311229Z TRACE scylla::transport::connection_pool: pool_state="[(0, [])]"
2024-09-16T15:39:55.311648Z TRACE scylla::transport::connection: Sending 1 requests; 9 bytes
2024-09-16T15:39:55.312193Z TRACE scylla::transport::connection: Sending 1 requests; 211 bytes
2024-09-16T15:39:55.312543Z TRACE scylla::transport::connection: Sending 1 requests; 58 bytes
2024-09-16T15:39:55.313000Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] New sharder: Some(Sharder { nr_shards: 1, msb_ignore: 12 }), clearing all connections
2024-09-16T15:39:55.313027Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] Updating shard aware port: Some(19042)
2024-09-16T15:39:55.313039Z TRACE scylla::transport::connection_pool: [192.168.211.128:9042] Adding connection 0xffecd8000ea0 to shard 0 pool, now there are 1 for the shard, total 1
2024-09-16T15:39:55.313069Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] Pool is full, clearing 0 excess connections
2024-09-16T15:39:55.313092Z TRACE scylla::transport::connection_pool: pool_state="[(0, [192.168.211.128:9042])]"
2024-09-16T15:39:55.313099Z TRACE scylla::transport::connection_pool: Selecting random connection
2024-09-16T15:39:55.313177Z TRACE scylla::transport::connection_pool: Available connections="192.168.211.128:9042"
2024-09-16T15:39:55.313188Z TRACE scylla::transport::connection_pool: Found connection for the target shard shard=0
2024-09-16T15:39:55.313380Z TRACE scylla::transport::connection: Sending 3 requests; 286 bytes
2024-09-16T15:39:55.315163Z DEBUG scylla::transport::topology: Toposort of UDT definitions took 0.01 ms (udts len: 0)
2024-09-16T15:39:55.315447Z TRACE scylla::transport::connection: Sending 1 requests; 114 bytes
2024-09-16T15:39:55.320724Z TRACE scylla::transport::connection: Sending 1 requests; 98 bytes
2024-09-16T15:39:55.322078Z TRACE scylla::transport::connection: Sending 1 requests; 78 bytes
2024-09-16T15:39:55.323227Z TRACE scylla::transport::connection: Sending 1 requests; 114 bytes
2024-09-16T15:39:55.328160Z TRACE scylla::transport::connection: Sending 1 requests; 98 bytes
2024-09-16T15:39:55.329756Z TRACE scylla::transport::connection: Sending 1 requests; 93 bytes
2024-09-16T15:39:55.330452Z TRACE scylla::transport::connection: Sending 1 requests; 82 bytes
2024-09-16T15:39:55.330957Z DEBUG scylla::transport::topology: Fetched new metadata
2024-09-16T15:39:55.331114Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] Started asynchronous pool worker
2024-09-16T15:39:55.331150Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] Started asynchronous pool worker
2024-09-16T15:39:55.331177Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] Started asynchronous pool worker
2024-09-16T15:39:55.332312Z TRACE scylla::transport::connection_pool: [192.168.211.128:9042] Will open the first connection to the node
2024-09-16T15:39:55.332357Z TRACE scylla::transport::connection_pool: [192.168.211.128:9042] Will open the first connection to the node
2024-09-16T15:39:55.332358Z TRACE scylla::transport::connection_pool: pool_state="[(0, [])]"
2024-09-16T15:39:55.332385Z TRACE scylla::transport::connection_pool: pool_state="[(0, [])]"
2024-09-16T15:39:55.332386Z TRACE scylla::transport::connection_pool: [192.168.211.128:9042] Will open the first connection to the node
2024-09-16T15:39:55.332427Z TRACE scylla::transport::connection_pool: pool_state="[(0, [])]"
2024-09-16T15:39:55.332699Z TRACE scylla::transport::connection: Sending 1 requests; 9 bytes
2024-09-16T15:39:55.332849Z TRACE scylla::transport::connection: Sending 1 requests; 9 bytes
2024-09-16T15:39:55.333054Z TRACE scylla::transport::connection: Sending 1 requests; 9 bytes
2024-09-16T15:39:55.333748Z TRACE scylla::transport::connection: Sending 1 requests; 211 bytes
2024-09-16T15:39:55.333780Z TRACE scylla::transport::connection: Sending 1 requests; 211 bytes
2024-09-16T15:39:55.333800Z TRACE scylla::transport::connection: Sending 1 requests; 211 bytes
2024-09-16T15:39:55.333960Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] New sharder: Some(Sharder { nr_shards: 1, msb_ignore: 12 }), clearing all connections
2024-09-16T15:39:55.334007Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] Updating shard aware port: Some(19042)
2024-09-16T15:39:55.334019Z TRACE scylla::transport::connection_pool: [192.168.211.128:9042] Adding connection 0xffecd8045b50 to shard 0 pool, now there are 1 for the shard, total 1
2024-09-16T15:39:55.334022Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] New sharder: Some(Sharder { nr_shards: 1, msb_ignore: 12 }), clearing all connections
2024-09-16T15:39:55.334038Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] Updating shard aware port: Some(19042)
2024-09-16T15:39:55.334047Z TRACE scylla::transport::connection_pool: [192.168.211.128:9042] Adding connection 0xffeccc0040b0 to shard 0 pool, now there are 1 for the shard, total 1
2024-09-16T15:39:55.334051Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] Pool is full, clearing 0 excess connections
2024-09-16T15:39:55.334062Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] Pool is full, clearing 0 excess connections
2024-09-16T15:39:55.334063Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] New sharder: Some(Sharder { nr_shards: 1, msb_ignore: 12 }), clearing all connections
2024-09-16T15:39:55.334072Z TRACE scylla::transport::connection_pool: pool_state="[(0, [192.168.211.128:9042])]"
2024-09-16T15:39:55.334067Z TRACE scylla::transport::connection_pool: pool_state="[(0, [192.168.211.128:9042])]"
2024-09-16T15:39:55.334088Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] Updating shard aware port: Some(19042)
2024-09-16T15:39:55.334175Z TRACE scylla::transport::connection_pool: [192.168.211.128:9042] Adding connection 0xffecd4000f10 to shard 0 pool, now there are 1 for the shard, total 1
2024-09-16T15:39:55.334208Z DEBUG scylla::transport::connection_pool: [192.168.211.128:9042] Pool is full, clearing 0 excess connections
2024-09-16T15:39:55.334220Z TRACE scylla::transport::connection_pool: pool_state="[(0, [192.168.211.128:9042])]"

I assume the correct docker-compose is one that exposes the shard-aware ports. A typical production setup would have shard awareness enabled (similar to a Redis cluster, where the client contacts the relevant node directly), so local development setups should have it enabled as well, so that the application code is identical.

This guide is a bit lacking, and perhaps it would be relevant to add a more complete cluster example like my docker-compose above.

Other than that, we can close this issue as a configuration error.