Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information
[X] I confirm this issue does not contain any sensitive information.
Jira Link: DB-12920
Description
Issue: With an RBS rate limit of 256 MBps, the maximum RBS write throughput on the newly added node was ~200 MBps (after setting remote_bootstrap_idle_timeout_ms to 5 minutes).
It is observed that with the RBS limit set to 0, the RBS write throughput scales linearly: ~135 MBps, ~270 MBps, and ~420 MBps with 1, 2, and 3 concurrent RBS sessions respectively. Since the master's load-balancer gflags are configured so that they do not throttle RBS, the expectation is that RBS writes on the newly added node should run at the rate set by the RBS rate limiter (remote_bootstrap_rate_limit_bytes_per_sec).
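That expectation follows from how a byte-based rate limiter normally behaves: as long as the sender keeps it busy, sustained throughput converges to the configured rate. A minimal token-bucket sketch to illustrate (this is not YugabyteDB's actual implementation; class and parameter names here are hypothetical):

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: refills `rate` bytes/sec,
    capped at `burst` bytes of accumulated credit."""

    def __init__(self, rate_bytes_per_sec, burst_bytes):
        self.rate = rate_bytes_per_sec
        self.burst = burst_bytes
        self.tokens = burst_bytes          # start with a full bucket
        self.last = time.monotonic()

    def try_consume(self, nbytes):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False
```

With such a limiter, a writer that always has data ready is throttled to the configured rate and no lower, which is why a 256 MBps limit is expected to produce ~256 MBps of RBS writes rather than ~200 MBps.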
Details of the two tests showcasing the above issue:
Test 1 (remote_bootstrap_idle_timeout_ms at its default value):
With the RBS rate limiter (remote_bootstrap_rate_limit_bytes_per_sec) set to 256 MBps, the expectation was that write throughput on the newly added node would be around ~256 MBps, but the observed RBS write throughput was 115 MBps for 55 min followed by 190 MBps for the final 15 min.
Query used: (sum by(exported_instance)(rate({ node_prefix="$dbcluster",saved_name=~"proxy_response_bytes_yb_tserver_RemoteBootstrapService_FetchData"}[60s])))
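For reference, the rate() in the query above converts the cumulative FetchData byte counter into an average bytes/sec per instance over the window. The equivalent computation on raw counter samples looks like this (sample values below are hypothetical, purely for illustration; counter resets, which PromQL handles, are ignored):

```python
def per_second_rate(samples):
    """samples: list of (timestamp_sec, counter_value) pairs from a
    monotonically increasing byte counter.  Returns the average
    bytes/sec across the window, the quantity rate() estimates.
    Counter resets are not handled in this sketch."""
    (t0, v0) = samples[0]
    (t1, v1) = samples[-1]
    return (v1 - v0) / (t1 - t0)

# Hypothetical samples 60 s apart: 200 MB transferred every second.
window = [(0, 0), (60, 200 * 1024 * 1024 * 60)]
print(per_second_rate(window) / (1024 * 1024))  # → 200.0 (MBps)
```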
The cluster was scaled up from 3 to 4 nodes. YB version: 2.23.1.0-b195. Disk throughput: 600 MBps.
YB logs at location
Updating the defect with the cause of the over-throttling that resulted in 115 MBps for 55 min followed by 190 MBps for the final 15 min.
Test 2 (remote_bootstrap_idle_timeout_ms set to 5 min):
After setting remote_bootstrap_idle_timeout_ms on each node to 5 min, we saw incoming RBS write throughput of ~200 MBps on the newly added node, and data load-balancing completed in 48 min.
Reducing the RBS idle timeout appears to address the over-throttling caused by expired sessions, though RBS write throughput is still over-throttled by ~50 MBps in this case.
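One way to see how session expiry can under-run the limiter: if each expiry forces a stall before transfer resumes, the average throughput over the whole run falls below the configured rate even though the limiter itself is never the bottleneck. A toy model (the stall duration and expiry count below are invented for illustration, not measured values):

```python
def effective_throughput(rate_mbps, stall_sec_per_expiry, expiries, duration_sec):
    """Toy model: transfer runs at the full configured rate except for a
    fixed stall after each session expiry.  Returns the average MBps
    observed over the whole duration."""
    active_sec = duration_sec - expiries * stall_sec_per_expiry
    return rate_mbps * active_sec / duration_sec

# Hypothetical: a 256 MBps limit where stalls eat 20% of a 50-minute run
# averages ~205 MBps, in the ballpark of the observed ~200 MBps.
print(effective_throughput(256, 30, 20, 3000))
```

Under this model, shrinking the idle timeout reduces stall time per expiry and pushes the average back toward the configured limit, which matches the improvement from 115 MBps to ~200 MBps seen in Test 2.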
Observing ~200 MBps of disk reads and network received bytes on N4.