neolinsu opened this issue 1 year ago
Thanks! Indeed, with insufficient resources the client itself can become the bottleneck. We typically run the load generator with spinning kthreads (see here) and many cores. When one client is insufficient to generate the load, we use multiple machines. What are the details of your machine, and what configuration are you using for your client?
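For example, the relevant client config keys would look something like the following (the values here are only illustrative; match them to your core count):
runtime_kthreads 20
runtime_guaranteed_kthreads 20
runtime_spinning_kthreads 20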
The CPU is an Intel Xeon 2.20GHz with 20 hyper-threads (10 physical cores), which are set to performance mode. The network is 100Gb RDMA. The configuration I use for both clients is the same (some irrelevant info replaced):
host_addr 10.100.100.103
host_netmask 255.255.255.0
host_gateway 10.100.100.1
runtime_kthreads 16
runtime_guaranteed_kthreads 16
runtime_spinning_kthreads 16
host_mac X
disable_watchdog true
runtime_qdelay_us 10
runtime_priority lc
static_arp 10.100.100.102 X
static_arp 10.100.100.103 X
enable_directpath fs
directpath_pci X
I also notice that even when the client issues load at a low throughput (like 0.75 Mpps), where resources should be sufficient, the Never Sent rate is still above 1%.
Can you post the output of a client here (and the parameters used to launch it)? Looking at some recent runs I see that even at 1MPPS my never sent rate is < .1%
Here is an example run targeting 0.8 Mpps.
synthetic --config synthetic.config 10.100.100.102:5190 --output=buckets --protocol memcached --mode runtime-client --threads 16 --runtime 32 --barrier-peers 1 --barrier-leader node151 --distribution=exponential --mpps=0.8 --samples=1 --transport tcp --nvalues=3200000
And synthetic's result is:
Distribution, Target, Actual, Dropped, Never Sent, Median, 90th, 99th, 99.9th, 99.99th, Start
exponential, 788411, 788411, 0, 326090, 13.6, 22.6, 33.2, 37.3, 37.9, 0, 8510673237596225
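For context, reading Target/Actual as requests per second and Never Sent as a count over the whole 32-second run (my interpretation of the columns), that works out to roughly

$$\text{never-sent rate} \approx \frac{326090}{788411 \times 32} \approx 1.3\%,$$

i.e., above the 1% mentioned earlier.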
Hm, that is quite high. Can you post a log with many samples at lower loads (change the above command to --samples=20)? Also, can you try reducing the number of kthreads to 8 and see if that has any impact?
> Can you post the output of a client here (and the parameters used to launch it)? Looking at some recent runs I see that even at 1MPPS my never sent rate is < .1%
I run the server with:
runtime_kthreads 4
runtime_guaranteed_kthreads 0
runtime_spinning_kthreads 0
This makes the cores mwait.
Would you please share your server configuration?
The server had 20 kthreads (20 guaranteed, 0 spinning). Does varying the server configuration impact the client behavior here?
> The server had 20 kthreads (20 guaranteed, 0 spinning). Does varying the server configuration impact the client behavior here?
Yes. I think 20 kthreads can handle 1M pps.
You can try my configuration.
The point here is not how many guaranteed kthreads the Caladan server uses. Instead, given a fixed number of guaranteed kthreads (say 4 cores), we send client requests at a rate close to (but below) the maximum capacity that the Caladan server can handle (say 1 Mpps). In this setup, no matter how many physical cores the client machines use (even one core per connection), the Never-Sent rate is always high. As a result, the generated requests exhibit a distribution that is less bursty than intended.
With our modified clients (scheduling disabled, and the softirq processing one packet at a time), the Never-Sent rate is low. In that case, the generated requests follow a distribution much closer to a Poisson distribution, but Caladan's P999 latency becomes much higher.
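Roughly, each modified client worker runs a busy loop like the sketch below. This is only an illustration of the idea; send_next_request, poll_directpath_softirq, check_timeouts, and drain_responses are placeholder names for our internal helpers, not Caladan's actual APIs.

```c
/* Sketch of one modified client worker, pinned to its own core.
 * The four helpers below are placeholders for our internal logic,
 * not Caladan's actual function names. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

extern volatile bool experiment_running;
extern void send_next_request(int conn);   /* issue the next request if its send time has arrived */
extern void poll_directpath_softirq(void); /* process one received packet (directpath) */
extern void check_timeouts(int conn);      /* the handle_timeout step */
extern void drain_responses(int conn);     /* the recv step */

static void pin_to_core(int core)
{
	cpu_set_t set;
	CPU_ZERO(&set);
	CPU_SET(core, &set);
	pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* One worker per TCP connection, one connection per core. */
void *worker_main(void *arg)
{
	int conn = (int)(long)arg;

	pin_to_core(conn);

	/* Cycle through the four stages without ever yielding to the scheduler. */
	while (experiment_running) {
		send_next_request(conn);
		poll_directpath_softirq();
		check_timeouts(conn);
		drain_responses(conn);
	}
	return NULL;
}
```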
Does this behavior change if you use many connections to the server? Say 100?
I'm trying to understand where the source of the delay is coming from that is causing so many never-sent packets. Please correct me if I am wrong in understanding the scenario here: the server machine is being tested at a load point close to its peak throughput. The client process/machine is not at full utilization and is not a bottleneck. Does this seem correct?
> I'm trying to understand where the source of the delay is coming from that is causing so many never-sent packets. Please correct me if I am wrong in understanding the scenario here: the server machine is being tested at a load point close to its peak throughput. The client process/machine is not at full utilization and is not a bottleneck. Does this seem correct?
Yes, this is correct
> Does this behavior change if you use many connections to the server? Say 100?
It seems the Never Sent rate becomes higher as the number of connections grows.
I'd be interested in trying to reproduce these results since they generally don't match what I've seen in my setup so far. Can you provide the commit hashes you are running for caladan and memcached, the configuration files for both clients and the server, and the launch parameters and output logs for the iokernel, memcached, and the loadgen instances?
caladan-all: 37a3822be053c37275f0aefea60da26246fd01cb
synthetic --config synthetic.config 10.100.100.102:5190 --output=buckets --protocol memcached --mode runtime-client --threads 16 --runtime 32 --barrier-peers 1 --barrier-leader node151 --distribution=exponential --mpps=0.8 --samples=1 --transport tcp --nvalues=3200000
Client configuration
host_addr 10.100.100.103
host_netmask 255.255.255.0
host_gateway 10.100.100.1
runtime_kthreads 16
runtime_guaranteed_kthreads 16
runtime_spinning_kthreads 16
host_mac X
disable_watchdog true
runtime_qdelay_us 10
runtime_priority lc
static_arp 10.100.100.102 X
static_arp 10.100.100.103 X
enable_directpath fs
directpath_pci X
memcached cmd
memcached memcached.config -t 16 -U 5190 -p 5190 -c 32768 -m 32000 -b 32768 -o hashpower=25,no_hashexpand,lru_crawler,lru_maintainer,idle_timeout=0,slab_reassign
Server configuration
host_addr 10.100.100.102
host_netmask 255.255.255.0
host_gateway 10.100.100.1
runtime_kthreads 4
runtime_guaranteed_kthreads 4
host_mac X
disable_watchdog true
runtime_qdelay_us 10
runtime_priority lc
static_arp 10.100.100.102 X
static_arp 10.100.100.103 X
enable_directpath fs
directpath_pci X
We also set the isolcpus and nohz_full lists in the boot-up cmd, and run the iokernel with the ias policy.

> I'd be interested in trying to reproduce these results since they generally don't match what I've seen in my setup so far.
Would you like to share your configuration and results? Especially the Never Sent rate when the request rate is close to the maximum capacity that the Caladan server can handle.
Can you also share the outputs/logs from the various programs that you've launched? Also, caladan-all @ 37a3822b points to caladan @ 4a254bf, though I see some of your configurations imply a later version of caladan (ie using the directpath_pci config etc). Can you please confirm the version that you are running, and whether there are any modifications that are made to it?
> Can you also share the outputs/logs from the various programs that you've launched? Also, caladan-all @ 37a3822b points to caladan @ 4a254bf, though I see some of your configurations imply a later version of caladan (ie using the directpath_pci config etc). Can you please confirm the version that you are running, and whether there are any modifications that are made to it?
We use Caladan @ 1ab79505 and memcached from caladan-all @ 37a3822b.
Hi all,
I find that the synthetic client in Caladan endures a high Never-Sent rate (above 1%) when clients issue requests at a relatively high rate that is close to the server's capacity. This is especially problematic under a Poisson distribution: when two adjacent requests are generated within a short time window (i.e., a bursty period), the latter one is more likely to be dropped due to the Never-Sent logic (see code). We have profiled Caladan's client logic and find that scheduling often delays a request (which already violates the Poisson distribution) until it is finally dropped.

We further designed an experiment to confirm this. We modified the Caladan client with the scheduling policy disabled: specifically, workers are bound to different cores and execute send, do_softirq (directpath), handle_timeout, and recv in a loop without yielding. We equip the Caladan server with 4 kthreads and launch 16 client workers (each owning a TCP connection) to generate requests with a Poisson distribution, varying the request rate (each run lasts 32 seconds). The following table shows the experiment results: