Closed hamedsey closed 2 years ago
Hey Hamed - can you try switching Rust back to an older version by running rustup default nightly-2020-08-29? As a side note, the XL170 machines have ConnectX-4 NICs which don't support Caladan's directpath mode, so packet rates and core scalability may be limited.
Hey Josh, thanks a lot for your response. That solved my issue!
I'm now running Caladan. I've updated the IP address, netmask, and gateway in the client/server config files.
Thanks, in advance!
hameds@node2:~/caladan$ ./apps/synthetic/target/release/synthetic 128.110.218.130:5000 --config server.config --mode spawner-server
CPU 08| <5> cpu: detected 20 cores, 1 nodes
CPU 08| <5> time: detected 2394 ticks / us
[ 0.000657] CPU 08| <5> loading configuration from 'server.config'
[ 0.000710] CPU 08| <5> cfg: provisioned 4 cores (4 guaranteed, 0 burstable, 0 spinning)
[ 0.000718] CPU 08| <5> cfg: task is latency critical (LC)
[ 0.000727] CPU 08| <5> cfg: THRESH_QD: 10, THRESH_HT: 0
[ 0.000734] CPU 08| <5> cfg: storage disabled, directpath disabled
[ 0.000745] CPU 08| <5> process pid: 5085
[ 0.029782] CPU 08| <5> net: started network stack
[ 0.029795] CPU 08| <5> net: using the following configuration:
[ 0.029798] CPU 08| <5> addr: 128.110.218.130
[ 0.029803] CPU 08| <5> netmask: 255.255.248.0
[ 0.029806] CPU 08| <5> gateway: 128.110.216.1
[ 0.029810] CPU 08| <5> mac: 66:25:F4:B7:FE:BD
[ 0.029815] CPU 08| <5> mtu: 1500
[ 0.029903] CPU 08| <5> thread: created thread 0
[ 0.029961] CPU 08| <5> spawning 4 kthreads
[ 0.030092] CPU 01| <5> thread: created thread 1
[ 0.030110] CPU 13| <5> thread: created thread 2
[ 0.030171] CPU 14| <5> thread: created thread 3
128.110.218.130:5000
hameds@node1:~/caladan$ ./apps/synthetic/target/release/synthetic 128.110.218.127:5000 --config client.config --mode runtime-client
CPU 06| <5> cpu: detected 20 cores, 1 nodes
CPU 06| <5> time: detected 2394 ticks / us
[ 0.000431] CPU 06| <5> loading configuration from 'client.config'
[ 0.000456] CPU 06| <5> cfg: provisioned 6 cores (6 guaranteed, 0 burstable, 6 spinning)
[ 0.000460] CPU 06| <5> cfg: task is latency critical (LC)
[ 0.000464] CPU 06| <5> cfg: THRESH_QD: 10, THRESH_HT: 0
[ 0.000467] CPU 06| <5> cfg: storage disabled, directpath disabled
[ 0.000473] CPU 06| <5> process pid: 5037
[ 0.039026] CPU 06| <5> net: started network stack
[ 0.039040] CPU 06| <5> net: using the following configuration:
[ 0.039043] CPU 06| <5> addr: 128.110.218.127
[ 0.039048] CPU 06| <5> netmask: 255.255.248.0
[ 0.039052] CPU 06| <5> gateway: 128.110.216.1
[ 0.039056] CPU 06| <5> mac: FA:37:F0:22:66:BB
[ 0.039061] CPU 06| <5> mtu: 1500
[ 0.039149] CPU 06| <5> thread: created thread 0
[ 0.039215] CPU 06| <5> spawning 6 kthreads
[ 0.039301] CPU 08| <5> thread: created thread 1
[ 0.039346] CPU 09| <5> thread: created thread 2
[ 0.039442] CPU 11| <5> thread: created thread 3
[ 0.039533] CPU 03| <5> thread: created thread 4
[ 0.039605] CPU 05| <5> thread: created thread 5
Distribution, Target, Actual, Dropped, Never Sent, Median, 90th, 99th, 99.9th, 99.99th, Start
zero, 1000, 0, 0, 0, 1661352661
[ 24.542939] CPU 03| <5> init: shutting down -> SUCCESS
Great! You can actually leave the IP address/netmask/gateway as is in the sample config files (in fact this is better than re-using the IP addresses from the control interfaces on the machines).
0000:03:00.1; so I launch the IOKernel using sudo ./iokerneld ias nicpci 0000:03:00.1.
--distribution=constant and --mean=74 -- here 74 is the number of iterations of fake work to be performed (on these machines 74 iterations equals about 1us).
Thanks Josh!! I am now able to see the client display some performance numbers.
In the run below, I'm spawning 16 threads on the client and server.
Are the 16 hyperthreads spawned by the server automatically mapped to 8 physical cores?
The client report seems to indicate that the system's throughput saturates around 1.6 M req/sec, even though I've set mpps to 4. I would like to see if I can achieve higher throughput similar to Fig. 4 in the Caladan paper. As you mentioned earlier, my current cloud lab setup doesn't support directpath. However, do you have suggestions on how I can achieve higher throughput? The figure below shows the system monitor from the server (left) and client (right) for the run with 1.6 M req/sec throughput. Only 7-8 server threads (out of 16 spawned) are active whereas all 16 client threads are actively generating load. Seems like the client is the bottleneck. How many client machines did you use to generate a load of 10 M req/sec?
hameds@node2:~/caladan$ ./apps/synthetic/target/release/synthetic 192.168.1.3:5000 --config server.config --mode spawner-server
CPU 13| <5> cpu: detected 20 cores, 1 nodes
CPU 13| <5> time: detected 2394 ticks / us
[ 0.000642] CPU 13| <5> loading configuration from 'server.config'
[ 0.000701] CPU 13| <5> cfg: provisioned 16 cores (16 guaranteed, 0 burstable, 0 spinning)
[ 0.000711] CPU 13| <5> cfg: task is latency critical (LC)
[ 0.000718] CPU 13| <5> cfg: THRESH_QD: 10, THRESH_HT: 0
[ 0.000726] CPU 13| <5> cfg: storage disabled, directpath disabled
[ 0.000737] CPU 13| <5> process pid: 10234
[ 0.106826] CPU 13| <5> net: started network stack
[ 0.106840] CPU 13| <5> net: using the following configuration:
[ 0.106843] CPU 13| <5> addr: 192.168.1.3
[ 0.106848] CPU 13| <5> netmask: 255.255.255.0
[ 0.106853] CPU 13| <5> gateway: 192.168.1.1
[ 0.106857] CPU 13| <5> mac: 8A:06:10:40:D9:21
[ 0.106861] CPU 13| <5> mtu: 1500
[ 0.106955] CPU 13| <5> thread: created thread 0
[ 0.107021] CPU 13| <5> spawning 16 kthreads
[ 0.107187] CPU 04| <5> thread: created thread 1
[ 0.107194] CPU 06| <5> thread: created thread 2
[ 0.107196] CPU 07| <5> thread: created thread 3
[ 0.107334] CPU 08| <5> thread: created thread 4
[ 0.107381] CPU 09| <5> thread: created thread 5
[ 0.107757] CPU 11| <5> thread: created thread 6
[ 0.107808] CPU 18| <5> thread: created thread 7
[ 0.107960] CPU 15| <5> thread: created thread 8
[ 0.108061] CPU 03| <5> thread: created thread 9
[ 0.108247] CPU 08| <5> thread: created thread 10
[ 0.108352] CPU 05| <5> thread: created thread 11
[ 0.108492] CPU 01| <5> thread: created thread 12
[ 0.108755] CPU 17| <5> thread: created thread 13
[ 0.109012] CPU 14| <5> thread: created thread 14
[ 0.109124] CPU 03| <5> thread: created thread 15
192.168.1.3:5000
hameds@node1:~/caladan$ ./apps/synthetic/target/release/synthetic 192.168.1.3:5000 --config client.config --mode runtime-client --distribution=constant --mean=74 --mpps=4 --runtime=10
CPU 02| <5> cpu: detected 20 cores, 1 nodes
CPU 02| <5> time: detected 2394 ticks / us
[ 0.000431] CPU 02| <5> loading configuration from 'client.config'
[ 0.000458] CPU 02| <5> cfg: provisioned 16 cores (16 guaranteed, 0 burstable, 16 spinning)
[ 0.000462] CPU 02| <5> cfg: task is latency critical (LC)
[ 0.000464] CPU 02| <5> cfg: THRESH_QD: 10, THRESH_HT: 0
[ 0.000467] CPU 02| <5> cfg: storage disabled, directpath disabled
[ 0.000472] CPU 02| <5> process pid: 11224
[ 0.101530] CPU 02| <5> net: started network stack
[ 0.101542] CPU 02| <5> net: using the following configuration:
[ 0.101545] CPU 02| <5> addr: 192.168.1.7
[ 0.101550] CPU 02| <5> netmask: 255.255.255.0
[ 0.101554] CPU 02| <5> gateway: 192.168.1.1
[ 0.101557] CPU 02| <5> mac: C2:1F:B0:00:6C:5D
[ 0.101561] CPU 02| <5> mtu: 1500
[ 0.101649] CPU 02| <5> thread: created thread 0
[ 0.101708] CPU 02| <5> spawning 16 kthreads
[ 0.101793] CPU 03| <5> thread: created thread 1
[ 0.101824] CPU 04| <5> thread: created thread 2
[ 0.101913] CPU 15| <5> thread: created thread 3
[ 0.102075] CPU 03| <5> thread: created thread 4
[ 0.102120] CPU 18| <5> thread: created thread 5
[ 0.102238] CPU 17| <5> thread: created thread 6
[ 0.102271] CPU 01| <5> thread: created thread 7
[ 0.102370] CPU 12| <5> thread: created thread 8
[ 0.102451] CPU 13| <5> thread: created thread 9
[ 0.102564] CPU 07| <5> thread: created thread 10
[ 0.102667] CPU 08| <5> thread: created thread 11
[ 0.102787] CPU 11| <5> thread: created thread 12
[ 0.102913] CPU 05| <5> thread: created thread 13
[ 0.102987] CPU 03| <5> thread: created thread 14
[ 0.103102] CPU 02| <5> thread: created thread 15
Distribution, Target, Actual, Dropped, Never Sent, Median, 90th, 99th, 99.9th, 99.99th, Start
constant, 200242, 200242, 0, 23, 9.0, 11.0, 15.0, 104.0, 195.0, 1661437120, 8488892976361
constant, 398843, 398843, 0, 10591, 10.0, 14.0, 20.0, 138.0, 214.0, 1661437145, 1957924345548
constant, 594655, 594655, 0, 47798, 12.0, 17.0, 25.0, 178.0, 248.0, 1661437170, 7668543666128
constant, 787884, 787884, 0, 110065, 15.0, 27.0, 41.0, 186.0, 253.0, 1661437195, 3376173644552
constant, 980681, 980681, 0, 176313, 18.0, 30.0, 49.0, 189.0, 264.0, 1661437220, 578556636761
constant, 1173728, 1173728, 0, 249018, 20.0, 32.0, 47.0, 196.0, 256.0, 1661437246, 1101151269002
constant, 1355149, 1355149, 0, 420122, 20.0, 33.0, 53.0, 201.0, 271.0, 1661437272, 526967746086
constant, 1517063, 1517063, 0, 757595, 22.0, 35.0, 67.0, 207.0, 261.0, 1661437298, 1888684339641
constant, 1589174, 1589174, 0, 1925627, 20.0, 32.0, 88.0, 233.0, 329.0, 1661437324, 147035735541
constant, 1661090, 1661090, 0, 3069601, 19.0, 33.0, 132.0, 242.0, 369.0, 1661437351, 3418841388171
constant, 1671219, 1671219, 0, 4803722, 19.0, 31.0, 162.0, 366.0, 498.0, 1661437377, 331763663542
constant, 1648339, 1648164, 1571, 6822125, 19.0, 33.0, 523.0, 1534.0, inf, 1661437404, 198970081544
constant, 1646277, 1646277, 0, 8658633, 19.0, 36.0, 244.0, 1350.0, 1691.0, 1661437431, 3149584307170
constant, 1627672, 1627672, 0, 10604679, 19.0, 33.0, 206.0, 1544.0, 1827.0, 1661437458, 1332655620417
constant, 1599860, 1599860, 0, 12677087, 18.0, 30.0, 104.0, 1094.0, 1637.0, 1661437486, 2070408298798
constant, 1581728, 1581728, 0, 14651970, 19.0, 30.0, 61.0, 223.0, 289.0, 1661437513, 1303196183976
constant, 1577940, 1577940, 0, 16465333, 19.0, 30.0, 56.0, 224.0, 301.0, 1661437541, 259360418716
constant, 1539878, 1539878, 0, 18688402, 18.0, 29.0, 51.0, 212.0, 269.0, 1661437568, 1627973023522
constant, 1531083, 1531083, 0, 20510857, 21.0, 32.0, 94.0, 242.0, 389.0, 1661437596, 5386034664385
constant, 1451515, 1451515, 0, 23006373, 17.0, 26.0, 43.0, 215.0, 296.0, 1661437624, 1027633940206
[532.012734] CPU 16| <5> init: shutting down -> SUCCESS
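For a quick read of where a run like the one above saturates, the CSV rows can be scanned for the highest value in the Actual column. This is only a sketch: peak_actual is a hypothetical helper name, and it assumes the column order shown in the header printed by synthetic (Target second, Actual third, Never Sent fifth). Runtime log lines are skipped because their third comma-separated field is not a plain integer.

```sh
# Print "<peak Actual> <Target of that row> <Never Sent of that row>"
# for synthetic client output read from stdin.
peak_actual() {
    awk -F', *' '
        # Keep only data rows: at least 5 fields and an integer Actual column.
        NF >= 5 && $3 ~ /^[0-9]+$/ && $3 + 0 > max {
            max = $3 + 0
            target = $2
            never = $5
        }
        END {
            if (max > 0) print max, target, never
            else print "no data rows found"
        }
    '
}

# Usage (assumes the client output was saved to client.log):
#   peak_actual < client.log
```

A Never Sent count that keeps growing while Actual stays flat suggests the load generator could not issue requests at the target rate.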
runtime_spinning_kthreads in the config file), which is why htop shows them as entirely busy. One option might be to explore using a different cloudlab instance with a newer network card. I also noticed that you are using the NIC at 0000:00:07.1 - is this NIC connected to the regular 25 Gbps experimental network for these nodes? Anytime I have provisioned one of these machines the correct NIC is the one at 0000:00:03.1.
Thanks Josh! On these cloudlab machines, each physical core supports 2 threads. Does Caladan provide the option to limit the 16 threads spawned to 8 physical cores?
Thanks for clarifying. I'll try that out.
If you want to only schedule at most 1 thread per physical core, you can include the "noht" option when launching the IOKernel. If you want to constrain the scheduled cores in some other way, you can pass the IOKernel a comma-separated list of cores that you'd like included for scheduling. Note the IOKernel will reserve two cores for itself.
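If you go the core-list route, one way to get a list with one hyperthread per physical core is to take the first sibling from each pair in a topology line of the form "node 0: [0,72][2,74]...", which is the format iokerneld prints under "sched: CPU configuration". The sketch below is only an illustration: first_siblings is a hypothetical helper, and how the resulting list is passed to iokerneld is not shown in this thread, so check the Caladan README for the exact argument form.

```sh
# Read a topology line such as "node 0: [0,72][2,74][4,76]" from stdin
# and print the first hyperthread of each sibling pair, comma-separated.
first_siblings() {
    awk '{
        out = ""
        # Repeatedly find "[<digits>," and keep the digits.
        while (match($0, /\[[0-9]+,/)) {
            core = substr($0, RSTART + 1, RLENGTH - 2)
            out = out (out == "" ? "" : ",") core
            $0 = substr($0, RSTART + RLENGTH)
        }
        print out
    }'
}

# Usage:
#   echo "node 0: [0,72][2,74][4,76]" | first_siblings
```

Keep in mind that the IOKernel reserves two cores for itself, so the list should leave room for those.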
Hi Josh,
Got it, thanks! I'm trying to run Caladan on the r650 machines on cloudlab, which have the 100G ConnectX-5 NICs (and hence support for directpath).
But I'm running into this error when inserting the iokernel module.
hameds@clnode284:~/caladan$ sudo ./scripts/setup_machine.sh
kernel.shm_rmid_forced = 1
kernel.shmmax = 18446744073692774399
vm.hugetlb_shm_group = 27
vm.max_map_count = 16777216
net.core.somaxconn = 3072
rmmod: ERROR: Module ksched is not currently loaded
insmod: ERROR: could not insert module ./scripts/../ksched/build/ksched.ko: Unknown symbol in module
./scripts/setup_machine.sh: line 22: /sys/devices/system/node/node[2-9]/hugepages/hugepages-2048kB/nr_hugepages: No such file or directory
Any idea how to get around this? I had Caladan running on the xl170 machines. I have the same OS (Ubuntu 18.04.1 LTS) and kernel version (4.15.0-169-generic) on both cloudlab setups, but can't seem to figure out why this problem exists.
Thanks, Hamed
What is the output of 'cat /sys/devices/system/cpu/cpuidle/current_driver'? If it is "none" then you will need to request that the Cloudlab admins enable c-states in BIOS for your machine. You'll see other similar requests in the cloudlab-users forum.
Thanks for your suggestion. Unfortunately, enabling the c-states didn't do the trick. Do you have other suggestions I can try? Have you ever been able to run Caladan on CloudLab's r650 machines?
Yes, Caladan should run on those machines. Can you try updating your kernel to version 5.0 or later?
I tried kernel versions 5.0.0 and 5.19.1 but neither seemed to fix it. By any chance, do you have a working linux image that I can try out on CloudLab?
I'll take a look around for an image that you can use. In the meantime, what is the output of cat /sys/devices/system/cpu/cpuidle/current_driver? Are you still seeing the "Unknown symbol in module" error? Also, when you change kernel versions, you'll need to run make again in the ksched directory.
Thanks, Josh!
cat /sys/devices/system/cpu/cpuidle/current_driver returns "none".
And after changing kernel versions, I did run make, but I still saw the "Unknown symbol in module" error.
OK. Sounds like c-states are still not enabled on your machine (the output should be something like intel_idle, not none). If the cloudlab admins pushed a change to the BIOS config, you may need to reboot for it to take effect. Otherwise you should check back in with them about what is going on.
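The check described above is easy to script so it can be rerun after each reboot or BIOS change. This is a minimal sketch: check_cpuidle is a hypothetical helper name, and the optional path argument exists only so the function can be pointed at a different file for testing.

```sh
# Report the active cpuidle driver. Returns 1 if it is "none"
# (c-states not enabled), 2 if the sysfs file cannot be read.
check_cpuidle() {
    driver_file=${1:-/sys/devices/system/cpu/cpuidle/current_driver}
    if [ ! -r "$driver_file" ]; then
        echo "unknown: cannot read $driver_file"
        return 2
    fi
    driver=$(cat "$driver_file")
    if [ "$driver" = "none" ]; then
        echo "c-states appear disabled (driver: none)"
        return 1
    fi
    echo "cpuidle driver: $driver"
}

# Usage:
#   check_cpuidle && sudo ./scripts/setup_machine.sh
```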
Ok, thanks! There was some confusion with the CloudLab folks. They re-enabled c-states and I was able to insert the kernel module.
However, upon running the iokernel, I've stumbled upon this compatibility error. Is there a way around this?
hameds@server:~/caladan$ sudo ./iokerneld ias nicpci 0000:31:00.0
CPU 06| <5> cpu: detected 144 cores, 2 nodes
CPU 06| <5> time: detected 2394 ticks / us
[ 0.001506] CPU 06| <5> sched: CPU configuration...
node 0: [0,72][2,74][4,76][6,78][8,80][10,82][12,84][14,86][16,88][18,90][20,92][22,94][24,96][26,98][28,100][30,102][32,104][34,106][36,108][38,110][40,112][42,114][44,116][46,118][48,120][50,122][52,124][54,126][56,128][58,130][60,132][62,134][64,136][66,138][68,140][70,142]
node 1: [1,73][3,75][5,77][7,79][9,81][11,83][13,85][15,87][17,89][19,91][21,93][23,95][25,97][27,99][29,101][31,103][33,105][35,107][37,109][39,111][41,113][43,115][45,117][47,119][49,121][51,123][53,125][55,127][57,129][59,131][61,133][63,135][65,137][67,139][69,141][71,143]
[ 0.001538] CPU 06| <5> sched: dataplane on 72, control on 0
IBRS and IBPB supported : yes
STIBP supported : yes
Spec arch caps supported : yes
Error: unsupported processor. Only Intel(R) processors are supported (Atom(R) and microarchitecture codename Nehalem/Nehalem-EP, Atom(tm), Westmere/Clarkdale, Sandy Bridge, Westmere-EP, Sandy Bridge-EP/Jaketown, Nehalem-EX, Westmere-EX, unknown, Centerton, Baytrail, Ivy Bridge, Haswell, Broadwell, Ivy Bridge-EP/EN/EX/Ivytown, Haswell-EP/EN/EX, Cherrytrail, Avoton, Broadwell-EP/EX, Skylake-SP, Cascade Lake-SP, Broadwell-DE, Knights Landing, Apollo Lake, Skylake, Denverton, Icelake, Kabylake). CPU model number: 106 Brand: "Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz"
terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)
Aborted
Great!
To deal with that error, you can either disable the memory bandwidth monitoring (add nobw to the iokerneld command line), or you can update the pcm submodule to a version that supports the newest processors. We have a patch that does this on a development branch. You should be able to apply it by running git cherry-pick f7f6cf3 if you are already on main. After applying the patch, run make submodules-clean && make submodules && make clean && make to rebuild the submodules and the iokernel.
Thanks Josh! I'm able to run Caladan, but still not with directpath. I'm using the ConnectX-6 NICs on these r650 nodes.
My kernel version is Linux 5.4.0-125-generic and I'm using MLNX_OFED_LINUX-5.0-1.0.0.0 (OFED-5.0-1.0.0.0), with the NIC firmware version being 22.32.2004.
Here's my build config file with mlx5 and directpath enabled
# Enable Mellanox ConnectX-4,5 NIC Support
CONFIG_MLX5=y
# Enable Mellanox ConnectX-3 NIC Support
CONFIG_MLX4=n
# Enable SPDK NVMe support
CONFIG_SPDK=n
# Enable debug build mode (slower but enables several runtime checks)
CONFIG_DEBUG=n
# Enable additional compiler optimizations (may reduce compatibility)
CONFIG_OPTIMIZE=n
# Allow runtimes to access Mellanox ConnectX-5 NICs directly (kernel bypass)
CONFIG_DIRECTPATH=y
Here's my client runtime config file which includes enable_directpath and a unique IP address.
host_addr 192.168.10.12
host_netmask 255.255.255.0
host_gateway 192.168.10.1
runtime_kthreads 6
runtime_spinning_kthreads 6
runtime_guaranteed_kthreads 6
runtime_priority lc
enable_directpath
Below is the output when starting iokernel.
sudo ./iokerneld ias nicpci 0000:ca:00.0 &
CPU 32| <5> cpu: detected 144 cores, 2 nodes
CPU 32| <5> time: detected 2394 ticks / us
[ 0.001532] CPU 32| <5> sched: CPU configuration...
node 0: [0,72][2,74][4,76][6,78][8,80][10,82][12,84][14,86][16,88][18,90][20,92][22,94][24,96][26,98][28,100][30,102][32,104][34,106][36,108][38,110][40,112][42,114][44,116][46,118][48,120][50,122][52,124][54,126][56,128][58,130][60,132][62,134][64,136][66,138][68,140][70,142]
node 1: [1,73][3,75][5,77][7,79][9,81][11,83][13,85][15,87][17,89][19,91][21,93][23,95][25,97][27,99][29,101][31,103][33,105][35,107][37,109][39,111][41,113][43,115][45,117][47,119][49,121][51,123][53,125][55,127][57,129][59,131][61,133][63,135][65,137][67,139][69,141][71,143]
[ 0.001561] CPU 32| <5> sched: dataplane on 72, control on 0
===== Processor information =====
Linux arch_perfmon flag : yes
Hybrid processor : no
IBRS and IBPB supported : yes
STIBP supported : yes
Spec arch caps supported : yes
Max CPUID level : 27
IBRS enabled in the kernel : yes
STIBP enabled in the kernel : no
The processor is not susceptible to Rogue Data Cache Load: yes
The processor supports enhanced IBRS : yes
Socket 0: 4 memory controllers detected with total number of 8 channels. 3 QPI ports detected. 4 M2M (mesh to memory) blocks detected. 0 Home Agents detected. 3 M3UPI blocks detected.
Socket 1: 4 memory controllers detected with total number of 8 channels. 3 QPI ports detected. 4 M2M (mesh to memory) blocks detected. 0 Home Agents detected. 3 M3UPI blocks detected.
[ 0.087622] CPU 00| <5> control: spawning control thread
EAL: Detected 128 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:101d) device: 0000:ca:00.0 (socket 1)
common_mlx5: RTE_MEM is selected.
mlx5_pci: Size 0xFFFF is not power of 2, will be aligned to 0x10000.
EAL: No legacy callbacks, legacy socket not created
[ 1.180859] CPU 72| <5> dpdk: driver: mlx5_pci port 0 MAC: 04 3f 72 f2 9b 62
[ 1.300201] CPU 72| <5> mlx5: device cycles / us: 1000.0000
[ 1.300211] CPU 72| <3> main: port 0 is on remote NUMA node to polling thread.
Performance will not be optimal.
[ 1.300217] CPU 72| <5> main: core 72 running dataplane. [Ctrl+C to quit]
And finally the output when launching the synthetic workload.
hameds@clnode258:~/fresh/caladan$ sudo ./apps/synthetic/target/release/synthetic 192.168.1.12:5000 --config client.config --mode runtime-client --distribution=constant --mean=74 --mpps=4 --runtime=10
CPU 32| <5> cpu: detected 144 cores, 2 nodes
CPU 32| <5> time: detected 2394 ticks / us
[ 0.001532] CPU 32| <5> loading configuration from 'client.config'
Segmentation fault
Any idea what could be going wrong here?
Can you try changing the config file line to "enable_directpath 1"?
Thanks for the suggestion. Caladan works fine without enable_directpath 1, but it seems like the client and server are not able to connect when directpath is enabled.
server config file (with directpath disabled)
host_addr 192.168.10.3
host_netmask 255.255.255.0
host_gateway 192.168.10.1
runtime_kthreads 4
runtime_guaranteed_kthreads 4
runtime_priority lc
server output (with directpath disabled)
hameds@clnode262:~/fresh/caladan$ sudo ./apps/synthetic/target/release/synthetic 192.168.10.3:5000 --config server.config --mode spawner-server
CPU 126| <5> cpu: detected 144 cores, 2 nodes
CPU 126| <5> time: detected 2394 ticks / us
[ 0.001538] CPU 126| <5> loading configuration from 'server.config'
[ 0.001563] CPU 126| <5> cfg: provisioned 4 cores (4 guaranteed, 0 burstable, 0 spinning)
[ 0.001566] CPU 126| <5> cfg: task is latency critical (LC)
[ 0.001568] CPU 126| <5> cfg: THRESH_QD: 10, THRESH_HT: 0
[ 0.001571] CPU 126| <5> cfg: storage disabled, directpath disabled
[ 0.001575] CPU 126| <5> process pid: 64948
[ 0.030345] CPU 126| <5> net: started network stack
[ 0.030356] CPU 126| <5> net: using the following configuration:
[ 0.030359] CPU 126| <5> addr: 192.168.10.3
[ 0.030362] CPU 126| <5> netmask: 255.255.255.0
[ 0.030382] CPU 126| <5> gateway: 192.168.10.1
[ 0.030383] CPU 126| <5> mac: 3E:63:00:C1:E5:06
[ 0.030387] CPU 126| <5> mtu: 1500
[ 0.030466] CPU 126| <5> thread: created thread 0
[ 0.030506] CPU 126| <5> spawning 4 kthreads
[ 0.030616] CPU 56| <5> thread: created thread 1
[ 0.030624] CPU 130| <5> thread: created thread 2
[ 0.030620] CPU 132| <5> thread: created thread 3
192.168.10.3:5000
client config file (with directpath disabled)
host_addr 192.168.10.12
host_netmask 255.255.255.0
host_gateway 192.168.10.1
runtime_kthreads 6
runtime_spinning_kthreads 6
runtime_guaranteed_kthreads 6
runtime_priority lc
client output (with directpath disabled)
hameds@clnode258:~/fresh/caladan$ sudo ./apps/synthetic/target/release/synthetic 192.168.10.3:5000 --config client.config --mode runtime-client --distribution=constant --mean=74 --mpps=4 --runtime=10
CPU 82| <5> cpu: detected 144 cores, 2 nodes
CPU 82| <5> time: detected 2394 ticks / us
[ 0.001521] CPU 82| <5> loading configuration from 'client.config'
[ 0.001551] CPU 82| <5> cfg: provisioned 6 cores (6 guaranteed, 0 burstable, 6 spinning)
[ 0.001554] CPU 82| <5> cfg: task is latency critical (LC)
[ 0.001556] CPU 82| <5> cfg: THRESH_QD: 10, THRESH_HT: 0
[ 0.001561] CPU 82| <5> cfg: storage disabled, directpath disabled
[ 0.001566] CPU 82| <5> process pid: 4299
[ 0.045363] CPU 82| <5> net: started network stack
[ 0.045375] CPU 82| <5> net: using the following configuration:
[ 0.045377] CPU 82| <5> addr: 192.168.10.12
[ 0.045382] CPU 82| <5> netmask: 255.255.255.0
[ 0.045386] CPU 82| <5> gateway: 192.168.10.1
[ 0.045389] CPU 82| <5> mac: 1E:F8:4D:96:92:3A
[ 0.045393] CPU 82| <5> mtu: 1500
[ 0.045481] CPU 82| <5> thread: created thread 0
[ 0.045519] CPU 82| <5> spawning 6 kthreads
[ 0.045625] CPU 88| <5> thread: created thread 1
[ 0.045649] CPU 14| <5> thread: created thread 2
[ 0.045677] CPU 16| <5> thread: created thread 3
[ 0.045708] CPU 12| <5> thread: created thread 4
[ 0.045812] CPU 90| <5> thread: created thread 5
Distribution, Target, Actual, Dropped, Never Sent, Median, 90th, 99th, 99.9th, 99.99th, Start
constant, 199979, 199979, 0, 564, 14.0, 17.0, 19.0, 22.0, 26.0, 1663285645, 1917197244693
constant, 399790, 399790, 0, 2443, 14.0, 17.0, 20.0, 23.0, 29.0, 1663285669, 6647119303675
constant, 599096, 599096, 0, 10500, 15.0, 18.0, 22.0, 25.0, 31.0, 1663285694, 2503169428675
constant, 798748, 798748, 0, 11006, 16.0, 19.0, 23.0, 26.0, 32.0, 1663285719, 974912317878
constant, 997460, 997460, 0, 28556, 16.0, 20.0, 24.0, 27.0, 31.0, 1663285745, 2767705777870
constant, 1194705, 1194705, 0, 58748, 17.0, 21.0, 25.0, 28.0, 32.0, 1663285770, 2026650292323
constant, 1388157, 1388157, 0, 127499, 18.0, 23.0, 27.0, 31.0, 36.0, 1663285795, 2572397313447
constant, 1577879, 1577879, 0, 206694, 19.0, 25.0, 30.0, 35.0, 42.0, 1663285821, 4238550309606
constant, 1776928, 1776928, 0, 243370, 23.0, 30.0, 37.0, 44.0, 53.0, 1663285847, 1165788853293
constant, 1963880, 1963880, 0, 346013, 25.0, 34.0, 42.0, 50.0, 83.0, 1663285873, 650182423801
constant, 2074386, 2074354, 292, 1175100, 26.0, 37.0, 48.0, 551.0, 1914.0, 1663285899, 954547397049
constant, 2092420, 2092382, 347, 2822757, 28.0, 39.0, 67.0, 137.0, 1771.0, 1663285926, 311354389938
constant, 2169927, 2169927, 0, 3940534, 28.0, 42.0, 117.0, 233.0, 1595.0, 1663285953, 430182842875
client output (with directpath enabled in both the client and server config files)
hameds@clnode258:~/fresh/caladan$ sudo ./apps/synthetic/target/release/synthetic 192.168.10.3:5000 --config client.config --mode runtime-client --distribution=constant --mean=74 --mpps=4 --runtime=10
CPU 82| <5> cpu: detected 144 cores, 2 nodes
CPU 82| <5> time: detected 2394 ticks / us
[ 0.001415] CPU 82| <5> loading configuration from 'client.config'
[ 0.001446] CPU 82| <5> cfg: provisioned 6 cores (6 guaranteed, 0 burstable, 6 spinning)
[ 0.001449] CPU 82| <5> cfg: task is latency critical (LC)
[ 0.001451] CPU 82| <5> cfg: THRESH_QD: 10, THRESH_HT: 0
[ 0.001455] CPU 82| <5> cfg: storage disabled, directpath enabled
[ 0.001462] CPU 82| <5> process pid: 4290
[ 0.046290] CPU 82| <5> net: started network stack
[ 0.046303] CPU 82| <5> net: using the following configuration:
[ 0.046306] CPU 82| <5> addr: 192.168.10.12
[ 0.046310] CPU 82| <5> netmask: 255.255.255.0
[ 0.046315] CPU 82| <5> gateway: 192.168.10.1
[ 0.046318] CPU 82| <5> mac: FA:3D:05:C7:2A:2E
[ 0.046322] CPU 82| <5> mtu: 1500
[ 0.357484] CPU 82| <5> thread: created thread 0
[ 0.357535] CPU 82| <5> spawning 6 kthreads
[ 0.357651] CPU 86| <5> thread: created thread 1
[ 0.357675] CPU 94| <5> thread: created thread 3
[ 0.357686] CPU 88| <5> thread: created thread 2
[ 0.357710] CPU 92| <5> thread: created thread 4
[ 0.357845] CPU 18| <5> thread: created thread 5
Distribution, Target, Actual, Dropped, Never Sent, Median, 90th, 99th, 99.9th, 99.99th, Start
constant, 200000, 0, 0, 0, 1663285561
[ 24.931332] CPU 06| <5> init: shutting down -> SUCCESS
Glad that it's at least working without directpath, that's a good sign. Can you add directpath_pci <nic_pci_addr> to your runtime config file? From your last iokerneld invocation it looks like it is "0000:ca:00.0", though it was different in some of the earlier ones...
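A small check like the following can confirm that a runtime config file carries both directpath settings mentioned in this thread before launching. It is a sketch under assumptions: check_directpath_cfg is a hypothetical helper, and it only looks for the two keys discussed here (enable_directpath and directpath_pci) at the start of a line; it does not validate the PCI address.

```sh
# Summarize the directpath settings found in a Caladan runtime config file.
check_directpath_cfg() {
    cfg=$1
    if ! grep -q '^enable_directpath' "$cfg"; then
        echo "directpath not enabled"
        return 0
    fi
    # Take the value from the first directpath_pci line, if any.
    pci=$(awk '$1 == "directpath_pci" { print $2; exit }' "$cfg")
    if [ -z "$pci" ]; then
        echo "enable_directpath set but directpath_pci missing"
    else
        echo "directpath enabled on $pci"
    fi
}

# Usage:
#   check_directpath_cfg client.config
```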
Thanks, the NIC PCIe address changed since I'm now using the r650 nodes from CloudLab (I was using the xl170 in some previous runs). I switched to the r650 nodes to run Caladan with directpath.
This is the server output after adding directpath_pci 0000:ca:00.0 to its runtime config file.
hameds@clnode262:~/fresh/caladan$ sudo ./apps/synthetic/target/release/synthetic 192.168.10.3:5000 --config server.config --mode spawner-server
CPU 54| <5> cpu: detected 144 cores, 2 nodes
CPU 54| <5> time: detected 2394 ticks / us
[ 0.001378] CPU 54| <5> loading configuration from 'server.config'
[ 0.001408] CPU 54| <5> directpath: specified pci address 0000:ca:00.0
[ 0.001412] CPU 54| <5> cfg: provisioned 4 cores (4 guaranteed, 0 burstable, 0 spinning)
[ 0.001416] CPU 54| <5> cfg: task is latency critical (LC)
[ 0.001419] CPU 54| <5> cfg: THRESH_QD: 10, THRESH_HT: 0
[ 0.001422] CPU 54| <5> cfg: storage disabled, directpath enabled
[ 0.001428] CPU 54| <5> process pid: 65094
[ 0.031135] CPU 54| <5> net: started network stack
[ 0.031147] CPU 54| <5> net: using the following configuration:
[ 0.031149] CPU 54| <5> addr: 192.168.10.3
[ 0.031153] CPU 54| <5> netmask: 255.255.255.0
[ 0.031156] CPU 54| <5> gateway: 192.168.10.1
[ 0.031159] CPU 54| <5> mac: 46:FF:CE:FD:94:7B
[ 0.031163] CPU 54| <5> mtu: 1500
[ 0.305773] CPU 54| <5> thread: created thread 0
[ 0.305820] CPU 54| <5> spawning 4 kthreads
[ 0.305900] CPU 56| <5> thread: created thread 1
[ 0.305909] CPU 58| <5> thread: created thread 2
[ 0.305929] CPU 60| <5> thread: created thread 3
synthetic: ../providers/mlx5/dr_rule.c:1364: switch_qp_action: Assertion `old_qp_index == ((prev_qp->tir_icm_addr >> 5) & 0xffffffff)' failed.
Sorry for all the trouble! The CX6 NIC in the R650 actually uses a different format for flow steering rules than previous mlx5 NICs, and there's a patch for this too :) You can do git cherry-pick c6bdea2. Alternatively you can just switch to our development branch (git checkout dev) which seems to be fairly stable at the moment. Afterwards be sure to rebuild as before (make submodules-clean && make submodules && make clean && make)...
No worries, Josh! I'm recompiling the dev branch, but now facing this issue with Cargo versions.
hameds@clnode280:~/caladan/apps/synthetic$ cargo update
Updating crates.io index
Updating git repository `https://github.com/fintelia/lockstep.git`
error: failed to get `shenango` as a dependency of package `synthetic v0.1.0 (/users/hameds/caladan/apps/synthetic)`
Caused by:
failed to load source for dependency `shenango`
Caused by:
Unable to update /users/hameds/caladan/bindings/rust
Caused by:
failed to parse manifest at `/users/hameds/caladan/bindings/rust/Cargo.toml`
Caused by:
failed to parse the `edition` key
Caused by:
this version of Cargo is older than the `2021` edition, and only supports `2015` and `2018` editions.
I tried downgrading Rust to nightly-2018-08-01 but then encountered this.
hameds@clnode280:~/caladan/apps/synthetic$ cargo update
Updating registry `https://github.com/rust-lang/crates.io-index`
Updating git repository `https://github.com/fintelia/lockstep.git`
error: failed to load source for a dependency on `shenango`
Caused by:
Unable to update file:///users/hameds/caladan/bindings/rust
Caused by:
failed to parse manifest at `/users/hameds/caladan/bindings/rust/Cargo.toml`
Caused by:
editions are unstable
Caused by:
feature `edition` is required
consider adding `cargo-features = ["edition"]` to the manifest
I then added cargo-features = "2018" to the manifest file, but couldn't get past this error.
Any suggestions on how to resolve this?
You'll need to switch to a newer version of Rust. Try running rustup default nightly
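The "older than the `2021` edition" error comes from the toolchain version: the 2021 edition was stabilized in Rust/Cargo 1.56, so any older Cargo rejects a manifest that declares it. The sketch below checks a version string of the kind printed by cargo --version. The helper name cargo_supports_2021 is hypothetical, and the sample version string in the usage note is only an illustration.

```sh
# Given a version string such as the output of `cargo --version`,
# print "yes" if it is 1.56 or newer, "no" if older, "unknown" if unparsable.
cargo_supports_2021() {
    printf '%s\n' "$1" | awk '
        match($0, /[0-9]+\.[0-9]+/) {
            # Split "major.minor" and compare against 1.56.
            split(substr($0, RSTART, RLENGTH), v, ".")
            if (v[1] + 0 > 1 || (v[1] + 0 == 1 && v[2] + 0 >= 56)) print "yes"
            else print "no"
            found = 1
        }
        END { if (!found) print "unknown" }
    '
}

# Usage:
#   cargo_supports_2021 "$(cargo --version)"
#   cargo_supports_2021 "cargo 1.48.0-nightly"
```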
Thanks, I got past that step after updating Rust. However, I'm still having the same issue when running Caladan with directpath enabled. This is with the dev branch.
hameds@clnode280:~/caladan$ sudo ./apps/synthetic/target/release/synthetic 192.168.10.10:5000 --config client.config --mode runtime-client --distribution=constant --mean=74 --mpps=4 --runtime=10
CPU 47| <5> cpu: detected 144 cores, 2 nodes
CPU 47| <5> time: detected 2394 ticks / us
[ 0.001352] CPU 47| <5> loading configuration from 'client.config'
[ 0.001382] CPU 47| <5> directpath: specified pci address 0000:ca:00.0
[ 0.001385] CPU 47| <5> cfg: provisioned 6 cores (6 guaranteed, 0 burstable, 6 spinning)
[ 0.001390] CPU 47| <5> cfg: task is latency critical (LC)
[ 0.001392] CPU 47| <5> cfg: THRESH_QD: 10, THRESH_HT: 0 THRESH_QUANTUM: 100
[ 0.001394] CPU 47| <5> cfg: storage disabled, directpath enabled
[ 0.001397] CPU 47| <5> process pid: 68325
[ 0.045502] CPU 47| <5> net: started network stack
[ 0.045511] CPU 47| <5> net: using the following configuration:
[ 0.045513] CPU 47| <5> addr: 192.168.10.11
[ 0.045517] CPU 47| <5> netmask: 255.255.255.0
[ 0.045521] CPU 47| <5> gateway: 192.168.10.1
[ 0.045523] CPU 47| <5> mac: E2:BD:64:62:9A:1E
[ 0.045526] CPU 47| <5> mtu: 1500
[ 0.411158] CPU 07| <2> directpath_init: selected flow steering mode
[ 0.411254] CPU 07| <5> thread: created thread 0
[ 0.411300] CPU 07| <5> spawning 6 kthreads
[ 0.411387] CPU 11| <5> thread: created thread 1
[ 0.411419] CPU 13| <5> thread: created thread 2
[ 0.411471] CPU 98| <5> thread: created thread 3
[ 0.411584] CPU 19| <5> thread: created thread 4
[ 0.411708] CPU 89| <5> thread: created thread 5
synthetic: ../providers/mlx5/dr_rule.c:1361: switch_qp_action: Assertion `old_qp_index == ((prev_qp->tir_icm_addr >> 5) & 0xffffffff)' failed.
Distribution, Target, Actual, Dropped, Never Sent, Median, 90th, 99th, 99.9th, 99.99th, Start
Aborted
I also tried applying the patch to the main branch with git cherry-pick c6bdea2, but couldn't figure out how to update the rdma-core patch.
hameds@server:/proj/bluefield-2-PG0/cherry/caladan$ git cherry-pick c6bdea2
error: could not apply c6bdea2... directpath: update rdma-core patch
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add <paths>' or 'git rm <paths>'
hint: and commit the result with 'git commit'
Ah, sorry. I will see if I can spin up an r650 instance and reproduce this problem. Will keep you posted.
Alright, thanks! If it makes things easier, I can add you to my project. I've got a few r650 nodes which I'm not using at the moment.
I couldn't reproduce it on my cloudlab instance, so perhaps I will need to remote into yours. BTW, did you run cargo build --release after switching branches?
Yes, I did run that after switching branches.
Could you click on your username at the top right and then click "Start/Join Project"? Then select "Join Existing Project", type in Bluefield-2, and send me an invitation to join.
Thanks for the fix, Josh!
I've launched one server and two clients each with 16 guaranteed and spinning threads. I'm only reaching around 2 MRPS with the two clients loading the server. I've noticed the same throughput can be attained with a single client.
Is there a different way to run Caladan with multiple clients so I could reach a throughput close to 10 MRPS (similar to Fig. 4 in the paper)?
server:
hameds@server:/proj/bluefield-2-PG0/caladan$ sudo ./apps/synthetic/target/release/synthetic 192.168.10.10:5000 --config server.config --mode spawner-server
CPU 12| <5> cpu: detected 144 cores, 2 nodes
CPU 12| <5> time: detected 2394 ticks / us
[ 0.001528] CPU 12| <5> loading configuration from 'server.config'
[ 0.001941] CPU 12| <5> directpath: specified pci address 0000:ca:00.0
[ 0.001945] CPU 12| <5> cfg: provisioned 16 cores (16 guaranteed, 0 burstable, 16 spinning)
[ 0.001949] CPU 12| <5> cfg: task is latency critical (LC)
[ 0.001952] CPU 12| <5> cfg: THRESH_QD: 10, THRESH_HT: 0 THRESH_QUANTUM: 100
[ 0.001954] CPU 12| <5> cfg: storage disabled, directpath enabled
[ 0.001958] CPU 12| <5> process pid: 3900
[ 0.120889] CPU 12| <5> net: started network stack
[ 0.120899] CPU 12| <5> net: using the following configuration:
[ 0.120903] CPU 12| <5> addr: 192.168.10.10
[ 0.120907] CPU 12| <5> netmask: 255.255.255.0
[ 0.120910] CPU 12| <5> gateway: 192.168.10.1
[ 0.120911] CPU 12| <5> mac: BE:F3:21:FA:46:6D
[ 0.120915] CPU 12| <5> mtu: 1500
[ 0.481101] CPU 12| <2> directpath_init: selected flow steering mode
[ 0.481180] CPU 12| <5> thread: created thread 0
[ 0.481224] CPU 12| <5> spawning 16 kthreads
[ 0.481300] CPU 86| <5> thread: created thread 1
[ 0.481312] CPU 16| <5> thread: created thread 2
[ 0.481351] CPU 20| <5> thread: created thread 3
[ 0.481376] CPU 18| <5> thread: created thread 4
[ 0.481688] CPU 24| <5> thread: created thread 5
[ 0.481724] CPU 28| <5> thread: created thread 6
[ 0.481757] CPU 22| <5> thread: created thread 7
[ 0.481805] CPU 30| <5> thread: created thread 8
[ 0.481832] CPU 32| <5> thread: created thread 9
[ 0.481874] CPU 34| <5> thread: created thread 10
[ 0.481906] CPU 36| <5> thread: created thread 11
[ 0.481914] CPU 40| <5> thread: created thread 13
[ 0.481921] CPU 114| <5> thread: created thread 14
[ 0.481906] CPU 38| <5> thread: created thread 12
[ 0.481931] CPU 44| <5> thread: created thread 15
192.168.10.10:5000
client 1
hameds@clnode280:/proj/bluefield-2-PG0/caladan$ sudo ./apps/synthetic/target/release/synthetic 192.168.10.10:5000 --config client.config --mode runtime-client --distribution=constant --mean=74 --samples=10 --mpps=4
CPU 140| <5> cpu: detected 144 cores, 2 nodes
CPU 140| <5> time: detected 2394 ticks / us
[ 0.001545] CPU 140| <5> loading configuration from 'client.config'
[ 0.001930] CPU 140| <5> directpath: specified pci address 0000:ca:00.0
[ 0.001934] CPU 140| <5> cfg: provisioned 16 cores (16 guaranteed, 0 burstable, 16 spinning)
[ 0.001938] CPU 140| <5> cfg: task is latency critical (LC)
[ 0.001940] CPU 140| <5> cfg: THRESH_QD: 10, THRESH_HT: 0 THRESH_QUANTUM: 100
[ 0.001942] CPU 140| <5> cfg: storage disabled, directpath enabled
[ 0.001945] CPU 140| <5> process pid: 4078
[ 0.119990] CPU 140| <5> net: started network stack
[ 0.120000] CPU 140| <5> net: using the following configuration:
[ 0.120002] CPU 140| <5> addr: 192.168.10.11
[ 0.120008] CPU 140| <5> netmask: 255.255.255.0
[ 0.120010] CPU 140| <5> gateway: 192.168.10.1
[ 0.120014] CPU 140| <5> mac: CE:6D:AB:70:6D:C1
[ 0.120017] CPU 140| <5> mtu: 1500
[ 0.479671] CPU 140| <2> directpath_init: selected flow steering mode
[ 0.479748] CPU 140| <5> thread: created thread 0
[ 0.479795] CPU 140| <5> spawning 16 kthreads
[ 0.479898] CPU 70| <5> thread: created thread 1
[ 0.479908] CPU 74| <5> thread: created thread 2
[ 0.479918] CPU 76| <5> thread: created thread 3
[ 0.479957] CPU 78| <5> thread: created thread 4
[ 0.480011] CPU 08| <5> thread: created thread 5
[ 0.480075] CPU 12| <5> thread: created thread 6
[ 0.480116] CPU 86| <5> thread: created thread 7
[ 0.480157] CPU 88| <5> thread: created thread 8
[ 0.480216] CPU 20| <5> thread: created thread 9
[ 0.480248] CPU 22| <5> thread: created thread 10
[ 0.480413] CPU 90| <5> thread: created thread 11
[ 0.480427] CPU 80| <5> thread: created thread 12
[ 0.480441] CPU 24| <5> thread: created thread 13
[ 0.480451] CPU 98| <5> thread: created thread 14
[ 0.480620] CPU 74| <5> thread: created thread 15
Distribution, Target, Actual, Dropped, Never Sent, Median, 90th, 99th, 99.9th, 99.99th, Start
constant, 399466, 399466, 0, 1823, 15.0, 19.0, 25.0, 31.0, 37.0, 1663545714, 3373952818491
constant, 799303, 799303, 0, 7905, 19.0, 24.0, 30.0, 35.0, 39.0, 1663545738, 86805174399
constant, 1195790, 1022495, 1559855, 49540, 539.0, inf, inf, inf, inf, 1663545764, 8427829880217
constant, 1575468, 1131056, 3999353, 234732, 535.0, inf, inf, inf, inf, 1663545789, 2528231392404
constant, 1933598, 1243711, 6209215, 619952, 529.0, inf, inf, inf, inf, 1663545815, 2647695947254
constant, 2081979, 1269887, 7309264, 2919445, 533.0, inf, inf, inf, inf, 1663545841, 699453511367
constant, 2058493, 1269966, 7097138, 6716035, 532.0, inf, inf, inf, inf, 1663545867, 1948414866811
constant, 2041315, 1122954, 8265642, 10520430, 566.0, inf, inf, inf, inf, 1663545893, 373335251662
constant, 2053570, 1279991, 6961964, 14063250, 530.0, inf, inf, inf, inf, 1663545920, 3453792295107
constant, 2062638, 1224155, 7546421, 17498299, 546.0, inf, inf, inf, inf, 1663545947, 1739200032327
[260.861467] CPU 02| <5> init: shutting down -> SUCCESS
client 2
hameds@clnode262:/proj/bluefield-2-PG0/caladan$ sudo ./apps/synthetic/target/release/synthetic 192.168.10.10:5000 --config client1.config --mode runtime-client --distribution=constant --mean=74 --samples=10 --mpps=4
CPU 30| <5> cpu: detected 144 cores, 2 nodes
CPU 30| <5> time: detected 2394 ticks / us
[ 0.001495] CPU 30| <5> loading configuration from 'client1.config'
[ 0.001836] CPU 30| <5> directpath: specified pci address 0000:ca:00.0
[ 0.001840] CPU 30| <5> cfg: provisioned 16 cores (16 guaranteed, 0 burstable, 16 spinning)
[ 0.001843] CPU 30| <5> cfg: task is latency critical (LC)
[ 0.001846] CPU 30| <5> cfg: THRESH_QD: 10, THRESH_HT: 0 THRESH_QUANTUM: 100
[ 0.001847] CPU 30| <5> cfg: storage disabled, directpath enabled
[ 0.001850] CPU 30| <5> process pid: 5575
[ 0.115735] CPU 30| <5> net: started network stack
[ 0.115746] CPU 30| <5> net: using the following configuration:
[ 0.115750] CPU 30| <5> addr: 192.168.10.13
[ 0.115753] CPU 30| <5> netmask: 255.255.255.0
[ 0.115756] CPU 30| <5> gateway: 192.168.10.1
[ 0.115757] CPU 30| <5> mac: BE:C4:19:33:F5:66
[ 0.115760] CPU 30| <5> mtu: 1500
[ 0.468518] CPU 30| <2> directpath_init: selected flow steering mode
[ 0.468599] CPU 30| <5> thread: created thread 0
[ 0.468641] CPU 30| <5> spawning 16 kthreads
[ 0.468730] CPU 104| <5> thread: created thread 1
[ 0.468743] CPU 34| <5> thread: created thread 2
[ 0.468817] CPU 38| <5> thread: created thread 4
[ 0.468830] CPU 36| <5> thread: created thread 3
[ 0.468912] CPU 104| <5> thread: created thread 5
[ 0.468985] CPU 42| <5> thread: created thread 6
[ 0.469215] CPU 46| <5> thread: created thread 7
[ 0.469242] CPU 122| <5> thread: created thread 8
[ 0.469295] CPU 50| <5> thread: created thread 9
[ 0.469335] CPU 54| <5> thread: created thread 10
[ 0.469405] CPU 56| <5> thread: created thread 11
[ 0.469429] CPU 58| <5> thread: created thread 12
[ 0.469480] CPU 60| <5> thread: created thread 13
[ 0.469508] CPU 62| <5> thread: created thread 14
[ 0.469557] CPU 136| <5> thread: created thread 15
Distribution, Target, Actual, Dropped, Never Sent, Median, 90th, 99th, 99.9th, 99.99th, Start
constant, 399403, 399403, 0, 2548, 15.0, 20.0, 26.0, 32.0, 38.0, 1663545711, 375468018700
constant, 799210, 799210, 0, 9422, 20.0, 25.0, 30.0, 35.0, 40.0, 1663545736, 483573247697
constant, 1194296, 977244, 1953303, 60998, 546.0, inf, inf, inf, inf, 1663545761, 1660974959896
constant, 1579361, 993210, 5275748, 189525, 547.0, inf, inf, inf, inf, 1663545787, 1941138587065
constant, 1875913, 991595, 7958356, 1137963, 545.0, inf, inf, inf, inf, 1663545812, 849080189139
constant, 1954281, 963394, 8917749, 4075185, inf, inf, inf, inf, inf, 1663545838, 745628917047
constant, 1799518, 922294, 7894892, 9050973, 551.0, inf, inf, inf, inf, 1663545865, 2335032003206
constant, 2047189, 945048, 9919026, 10466439, inf, inf, inf, inf, inf, 1663545891, 5881773452637
constant, 1881052, 941035, 8460145, 15631274, 563.0, inf, inf, inf, inf, 1663545918, 4159673334186
constant, 2009738, 965991, 9392590, 17987246, inf, inf, inf, inf, inf, 1663545945, 1616553518557
[260.970440] CPU 16| <5> init: shutting down -> SUCCESS
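The "around 2 MRPS" figure can be checked by adding up the two clients' last result rows. This is a minimal sketch that assumes the Target and Actual columns are in requests per second, which matches the description in the post. The helper name parse_row is made up for illustration and is not part of Caladan.

```python
def parse_row(line):
    """Split one result row and return (target, actual, dropped) as ints.

    Assumes the column order shown in the header line of the logs:
    Distribution, Target, Actual, Dropped, ...
    """
    fields = [f.strip() for f in line.split(",")]
    return int(fields[1]), int(fields[2]), int(fields[3])

# Last sample from each client log above, copied verbatim.
client1 = "constant, 2062638, 1224155, 7546421, 17498299, 546.0, inf, inf, inf, inf, 1663545947, 1739200032327"
client2 = "constant, 2009738, 965991, 9392590, 17987246, inf, inf, inf, inf, inf, 1663545945, 1616553518557"

rows = [parse_row(client1), parse_row(client2)]
total_target = sum(r[0] for r in rows)  # combined offered load
total_actual = sum(r[1] for r in rows)  # combined achieved load

print(f"offered {total_target / 1e6:.2f} MRPS, achieved {total_actual / 1e6:.2f} MRPS")
# prints: offered 4.07 MRPS, achieved 2.19 MRPS
```

Under that assumption, the two clients together offer about 4 MRPS but the server completes only about 2.2 MRPS, which is roughly what a single client reached on its own.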
Glad to hear things are working better. The first thing you can try is to specify a greater number of client (user) threads with the --threads parameter (perhaps --threads 100 as a starting point). Each thread will use its own connection to send requests and you'll be able to achieve much better concurrency on the client side and server side.
Thanks Josh. A single client is now able to generate around 4 MRPS of load. However, for higher loads, the server seems to run out of buffers? Is there a fix for this?
server
hameds@server:/proj/bluefield-2-PG0/caladan$ sudo ./apps/synthetic/target/release/synthetic 192.168.10.10:5000 --config server.config --mode spawner-server --threads 24
CPU 12| <5> cpu: detected 144 cores, 2 nodes
CPU 12| <5> time: detected 2394 ticks / us
[ 0.001566] CPU 12| <5> loading configuration from 'server.config'
[ 0.002420] CPU 12| <5> directpath: specified pci address 0000:ca:00.0
[ 0.002424] CPU 12| <5> cfg: provisioned 32 cores (32 guaranteed, 0 burstable, 32 spinning)
[ 0.002430] CPU 12| <5> cfg: task is latency critical (LC)
[ 0.002433] CPU 12| <5> cfg: THRESH_QD: 10, THRESH_HT: 0 THRESH_QUANTUM: 100
[ 0.002436] CPU 12| <5> cfg: storage disabled, directpath enabled
[ 0.002440] CPU 12| <5> process pid: 7705
[ 0.232562] CPU 12| <5> net: started network stack
[ 0.232573] CPU 12| <5> net: using the following configuration:
[ 0.232577] CPU 12| <5> addr: 192.168.10.10
[ 0.232581] CPU 12| <5> netmask: 255.255.255.0
[ 0.232583] CPU 12| <5> gateway: 192.168.10.1
[ 0.232585] CPU 12| <5> mac: D6:C8:5A:6A:21:A1
[ 0.232587] CPU 12| <5> mtu: 1500
[ 0.740671] CPU 12| <2> directpath_init: selected flow steering mode
[ 0.740754] CPU 12| <5> thread: created thread 0
[ 0.740798] CPU 12| <5> spawning 32 kthreads
[ 0.740884] CPU 86| <5> thread: created thread 1
[ 0.740891] CPU 88| <5> thread: created thread 2
[ 0.740943] CPU 92| <5> thread: created thread 3
[ 0.741083] CPU 94| <5> thread: created thread 4
[ 0.741110] CPU 96| <5> thread: created thread 5
[ 0.741146] CPU 98| <5> thread: created thread 6
[ 0.741190] CPU 90| <5> thread: created thread 7
[ 0.741237] CPU 102| <5> thread: created thread 8
[ 0.741404] CPU 104| <5> thread: created thread 9
[ 0.741431] CPU 36| <5> thread: created thread 10
[ 0.741464] CPU 40| <5> thread: created thread 11
[ 0.741486] CPU 114| <5> thread: created thread 12
[ 0.741657] CPU 32| <5> thread: created thread 13
[ 0.741755] CPU 40| <5> thread: created thread 14
[ 0.741783] CPU 116| <5> thread: created thread 15
[ 0.741874] CPU 118| <5> thread: created thread 16
[ 0.741907] CPU 120| <5> thread: created thread 17
[ 0.741980] CPU 112| <5> thread: created thread 18
[ 0.742006] CPU 122| <5> thread: created thread 19
[ 0.742093] CPU 124| <5> thread: created thread 20
[ 0.742122] CPU 126| <5> thread: created thread 21
[ 0.742142] CPU 128| <5> thread: created thread 22
[ 0.742258] CPU 130| <5> thread: created thread 23
[ 0.742290] CPU 132| <5> thread: created thread 24
[ 0.742321] CPU 134| <5> thread: created thread 25
[ 0.742344] CPU 136| <5> thread: created thread 26
[ 0.742370] CPU 138| <5> thread: created thread 27
[ 0.742390] CPU 140| <5> thread: created thread 28
[ 0.742396] CPU 74| <5> thread: created thread 29
[ 0.742611] CPU 76| <5> thread: created thread 30
[ 0.742712] CPU 78| <5> thread: created thread 31
192.168.10.10:5000
[159.500644] CPU 18| <3> txq full
[159.500644] CPU 20| <3> txq full
[159.993550] CPU 94| <3> net: out of tx buffers
[159.993571] CPU 28| <3> net: out of tx buffers
[159.993558] CPU 32| <3> net: out of tx buffers
[159.993564] CPU 88| <3> net: out of tx buffers
[159.993567] CPU 98| <3> net: out of tx buffers
[159.993560] CPU 14| <3> net: out of tx buffers
[160.993562] CPU 20| <3> runtime/net/core.c:308 net_tx_alloc_mbuf() suppressed 2124806 times
[160.993569] CPU 76| <3> runtime/net/core.c:308 net_tx_alloc_mbuf() suppressed 2124806 times
[160.993597] CPU 76| <3> net: out of tx buffers
[160.993567] CPU 08| <3> runtime/net/core.c:308 net_tx_alloc_mbuf() suppressed 2124806 times
[160.993612] CPU 08| <3> net: out of tx buffers
[160.993564] CPU 104| <3> runtime/net/core.c:308 net_tx_alloc_mbuf() suppressed 2124806 times
[160.993626] CPU 104| <3> net: out of tx buffers
[160.993566] CPU 32| <3> runtime/net/core.c:308 net_tx_alloc_mbuf() suppressed 2124806 times
[160.993638] CPU 32| <3> net: out of tx buffers
[160.993567] CPU 80| <3> runtime/net/core.c:308 net_tx_alloc_mbuf() suppressed 2124806 times
[161.013127] CPU 80| <3> net: out of tx buffers
[160.993589] CPU 20| <3> net: out of tx buffers
[160.993563] CPU 28| <3> runtime/net/core.c:308 net_tx_alloc_mbuf() suppressed 2124806 times
[161.013367] CPU 28| <3> net: out of tx buffers
[160.993565] CPU 14| <3> runtime/net/core.c:308 net_tx_alloc_mbuf() suppressed 2124806 times
[161.013452] CPU 14| <3> net: out of tx buffers
[160.993567] CPU 74| <3> runtime/net/core.c:308 net_tx_alloc_mbuf() suppressed 2124806 times
[161.013548] CPU 74| <3> net: out of tx buffers
[160.993568] CPU 26| <3> runtime/net/core.c:308 net_tx_alloc_mbuf() suppressed 2124806 times
[161.013711] CPU 26| <3> net: out of tx buffers
[160.993563] CPU 12| <3> runtime/net/core.c:308 net_tx_alloc_mbuf() suppressed 2124806 times
[161.013797] CPU 12| <3> net: out of tx buffers
[160.993565] CPU 30| <3> runtime/net/core.c:308 net_tx_alloc_mbuf() suppressed 2124806 times
[161.013892] CPU 30| <3> net: out of tx buffers
[160.993565] CPU 92| <3> runtime/net/core.c:308 net_tx_alloc_mbuf() suppressed 2124806 times
[161.013906] CPU 92| <3> net: out of tx buffers
[160.993568] CPU 24| <3> runtime/net/core.c:308 net_tx_alloc_mbuf() suppressed 2124806 times
[161.014031] CPU 24| <3> net: out of tx buffers
[160.993569] CPU 18| <3> runtime/net/core.c:308 net_tx_alloc_mbuf() suppressed 2124806 times
[161.014467] CPU 18| <3> net: out of tx buffers
[160.993567] CPU 10| <3> runtime/net/core.c:308 net_tx_alloc_mbuf() suppressed 2124806 times
[161.014599] CPU 10| <3> net: out of tx buffers
[160.993562] CPU 22| <3> runtime/net/core.c:308 net_tx_alloc_mbuf() suppressed 2124806 times
[161.015641] CPU 22| <3> net: out of tx buffers
[160.993564] CPU 88| <3> runtime/net/core.c:308 net_tx_alloc_mbuf() suppressed 2124806 times
[161.019139] CPU 88| <3> net: out of tx buffers
[160.993565] CPU 102| <3> runtime/net/core.c:308 net_tx_alloc_mbuf() suppressed 2124806 times
[161.019616] CPU 102| <3> net: out of tx buffers
client
hameds@clnode280:/proj/bluefield-2-PG0/caladan$ sudo ./apps/synthetic/target/release/synthetic 192.168.10.10:5000 --config client.config --mode runtime-client --distribution=constant --mean=74 --samples=10 --mpps=10 --threads 40
CPU 70| <5> cpu: detected 144 cores, 2 nodes
CPU 70| <5> time: detected 2394 ticks / us
[ 0.001586] CPU 70| <5> loading configuration from 'client.config'
[ 0.002060] CPU 70| <5> directpath: specified pci address 0000:ca:00.0
[ 0.002064] CPU 70| <5> cfg: provisioned 40 cores (40 guaranteed, 0 burstable, 40 spinning)
[ 0.002069] CPU 70| <5> cfg: task is latency critical (LC)
[ 0.002072] CPU 70| <5> cfg: THRESH_QD: 10, THRESH_HT: 0 THRESH_QUANTUM: 100
[ 0.002075] CPU 70| <5> cfg: storage disabled, directpath enabled
[ 0.002081] CPU 70| <5> process pid: 8704
[ 0.301332] CPU 70| <5> net: started network stack
[ 0.301343] CPU 70| <5> net: using the following configuration:
[ 0.301347] CPU 70| <5> addr: 192.168.10.11
[ 0.301351] CPU 70| <5> netmask: 255.255.255.0
[ 0.301353] CPU 70| <5> gateway: 192.168.10.1
[ 0.301358] CPU 70| <5> mac: AA:9F:BF:DA:55:03
[ 0.301362] CPU 70| <5> mtu: 1500
[ 0.900981] CPU 70| <2> directpath_init: selected flow steering mode
[ 0.901069] CPU 70| <5> thread: created thread 0
[ 0.901113] CPU 70| <5> spawning 40 kthreads
[ 0.901211] CPU 06| <5> thread: created thread 1
[ 0.901237] CPU 76| <5> thread: created thread 2
[ 0.901252] CPU 08| <5> thread: created thread 3
[ 0.901284] CPU 10| <5> thread: created thread 4
[ 0.901343] CPU 12| <5> thread: created thread 5
[ 0.901504] CPU 80| <5> thread: created thread 6
[ 0.901528] CPU 16| <5> thread: created thread 7
[ 0.901557] CPU 18| <5> thread: created thread 8
[ 0.901600] CPU 20| <5> thread: created thread 9
[ 0.901655] CPU 96| <5> thread: created thread 10
[ 0.901698] CPU 28| <5> thread: created thread 11
[ 0.901738] CPU 30| <5> thread: created thread 12
[ 0.901936] CPU 32| <5> thread: created thread 13
[ 0.901963] CPU 34| <5> thread: created thread 14
[ 0.902000] CPU 36| <5> thread: created thread 15
[ 0.902044] CPU 38| <5> thread: created thread 16
[ 0.902102] CPU 42| <5> thread: created thread 17
[ 0.902166] CPU 116| <5> thread: created thread 18
[ 0.902207] CPU 48| <5> thread: created thread 19
[ 0.902256] CPU 52| <5> thread: created thread 20
[ 0.902301] CPU 54| <5> thread: created thread 21
[ 0.902513] CPU 58| <5> thread: created thread 22
[ 0.902528] CPU 88| <5> thread: created thread 23
[ 0.902551] CPU 90| <5> thread: created thread 24
[ 0.902576] CPU 106| <5> thread: created thread 25
[ 0.902618] CPU 110| <5> thread: created thread 26
[ 0.902756] CPU 116| <5> thread: created thread 27
[ 0.902781] CPU 118| <5> thread: created thread 28
[ 0.902825] CPU 62| <5> thread: created thread 29
[ 0.902876] CPU 132| <5> thread: created thread 30
[ 0.902935] CPU 64| <5> thread: created thread 31
[ 0.903061] CPU 66| <5> thread: created thread 32
[ 0.903104] CPU 68| <5> thread: created thread 33
[ 0.903141] CPU 04| <5> thread: created thread 34
[ 0.903286] CPU 138| <5> thread: created thread 35
[ 0.903310] CPU 74| <5> thread: created thread 36
[ 0.903341] CPU 140| <5> thread: created thread 37
[ 0.903369] CPU 66| <5> thread: created thread 38
[ 0.903408] CPU 142| <5> thread: created thread 39
Distribution, Target, Actual, Dropped, Never Sent, Median, 90th, 99th, 99.9th, 99.99th, Start
constant, 931177, 931177, 0, 605824, 16.0, 19.0, 22.0, 25.0, 34.0, 1663609445, 1043295877929
constant, 1868260, 1868260, 0, 1169696, 15.0, 18.0, 21.0, 23.0, 26.0, 1663609470, 222802545815
constant, 2795398, 2795398, 0, 1824078, 14.0, 17.0, 20.0, 22.0, 147.0, 1663609495, 478776857971
constant, 3721884, 3721884, 0, 2494591, 14.0, 18.0, 21.0, 27.0, 964.0, 1663609519, 281187033052
constant, 5000000, 0, 0, 0, 1663609543
[124.652870] CPU 82| <5> init: shutting down -> SUCCESS
Also if my client spawns 50 threads, it seems to cause a memory allocation problem. Does this indicate the load generation limits of the client?
hameds@clnode280:/proj/bluefield-2-PG0/caladan$ sudo ./apps/synthetic/target/release/synthetic 192.168.10.10:5000 --config client.config --mode runtime-client --distribution=constant --mean=74 --samples=10 --mpps=10 --threads 50
CPU 142| <5> cpu: detected 144 cores, 2 nodes
CPU 142| <5> time: detected 2394 ticks / us
[ 0.001582] CPU 142| <5> loading configuration from 'client.config'
[ 0.002123] CPU 142| <5> directpath: specified pci address 0000:ca:00.0
[ 0.002128] CPU 142| <5> cfg: provisioned 50 cores (50 guaranteed, 0 burstable, 50 spinning)
[ 0.002133] CPU 142| <5> cfg: task is latency critical (LC)
[ 0.002137] CPU 142| <5> cfg: THRESH_QD: 10, THRESH_HT: 0 THRESH_QUANTUM: 100
[ 0.002140] CPU 142| <5> cfg: storage disabled, directpath enabled
[ 0.002144] CPU 142| <5> process pid: 8793
[ 0.376234] CPU 142| <5> net: started network stack
[ 0.376245] CPU 142| <5> net: using the following configuration:
[ 0.376249] CPU 142| <5> addr: 192.168.10.11
[ 0.376252] CPU 142| <5> netmask: 255.255.255.0
[ 0.376254] CPU 142| <5> gateway: 192.168.10.1
[ 0.376257] CPU 142| <5> mac: B6:30:F8:A1:0E:30
[ 0.376261] CPU 142| <5> mtu: 1500
[ 0.876221] CPU 02| <0> FATAL: runtime/ioqueues.c:149 ASSERTION '!p' FAILED IN 'iok_shm_alloc'
./apps/synthetic/target/release/synthetic(+0x92fdb)[0x56405353efdb]
./apps/synthetic/target/release/synthetic(+0x93044)[0x56405353f044]
./apps/synthetic/target/release/synthetic(+0x415d4)[0x5640534ed5d4]
./apps/synthetic/target/release/synthetic(+0x98364)[0x564053544364]
./apps/synthetic/target/release/synthetic(+0x61ba2)[0x56405350dba2]
./apps/synthetic/target/release/synthetic(+0x53b3b)[0x5640534ffb3b]
./apps/synthetic/target/release/synthetic(+0x54964)[0x564053500964]
./apps/synthetic/target/release/synthetic(+0x52f08)[0x5640534fef08]
./apps/synthetic/target/release/synthetic(+0x51f75)[0x5640534fdf75]
./apps/synthetic/target/release/synthetic(+0x40f83)[0x5640534ecf83]
./apps/synthetic/target/release/synthetic(+0x378d9)[0x5640534e38d9]
./apps/synthetic/target/release/synthetic(+0x2a3fd)[0x5640534d63fd]
./apps/synthetic/target/release/synthetic(+0x23b90)[0x5640534cfb90]
./apps/synthetic/target/release/synthetic(+0x25ca3)[0x5640534d1ca3]
./apps/synthetic/target/release/synthetic(+0x26f09)[0x5640534d2f09]
[ 0.876376] CPU 02| <5> init: shutting down -> FAILURE
Looks like the NIC is on NUMA node 1 while the iokernel schedules/uses node 0 by default. I added a command line flag for the iokernel to select the other node. You should pull again from dev (only need to build with make afterwards), and then launch the iokernel with sudo ./iokerneld ias nicpci 0000:ca:00.0 numanode 1.
A note about the client --threads parameter: you can have many more user threads/connections than actual runtime kthreads. So you can probably stick to 16-20 kthreads and still pass a parameter of --threads 400 to create many connections.
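To see which NUMA node a NIC sits on before choosing the numanode value, Linux exposes a numa_node file per PCI device in sysfs. The sketch below reads it for the PCI address used in this thread. The function name nic_numa_node and its sysfs_root parameter are illustrative and not part of Caladan.

```python
from pathlib import Path

def nic_numa_node(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the NUMA node the kernel reports for a PCI device.

    Returns None if the sysfs entry is missing or unreadable.
    The kernel reports -1 when it has no NUMA affinity for the device.
    """
    path = Path(sysfs_root) / pci_addr / "numa_node"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None

# PCI address taken from the logs in this thread.
node = nic_numa_node("0000:ca:00.0")
print(f"NIC NUMA node: {node}")
```

If this reports 1 on the r650 nodes, that would agree with the numanode 1 argument suggested above.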
Ah, thanks! Makes sense.
For a constant (fixed) service time distribution, does a mean value of 74 still correspond to a 1us service time on the r650 nodes? I ran an experiment with 16 server threads and saw throughput around 16.3 MRPS, while the maximum theoretical throughput should be 16 MRPS. Do I have to change the mean value to get a constant 1us distribution?
I see something similar when Turbo Boost is enabled. You can see the status by running cat /sys/devices/system/cpu/intel_pstate/no_turbo. To disable it, you can run echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo.
On these machines, it seems like a value of 120 best approximates 1us.
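The arithmetic behind the 16 MRPS ceiling can be written out as a short sketch. It assumes every server thread is always busy and each request occupies one thread for exactly the service time. Both function names are made up for illustration.

```python
def max_throughput_mrps(server_threads, service_time_us):
    """Upper bound on throughput: threads / service time.

    Requests per microsecond is the same unit as MRPS.
    """
    return server_threads / service_time_us

def implied_service_time_us(server_threads, observed_mrps):
    """Service time that would make the observed rate exactly the bound."""
    return server_threads / observed_mrps

bound = max_throughput_mrps(16, 1.0)         # 16 threads at 1 us each
implied = implied_service_time_us(16, 16.3)  # what 16.3 MRPS would imply

print(f"bound: {bound:.1f} MRPS, implied service time: {implied:.3f} us")
```

An observed rate above the bound means each request took slightly less than 1 us in practice (about 0.98 us here), which fits the explanation that the calibrated mean value runs faster than intended on these machines.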
Thanks Josh! I'm now trying to compile Caladan (dev branch) with directpath enabled on a local server (not cloudlab) with ConnectX-6 NICs.
This new server has the same OS, kernel, OFED version, and packages as the working cloudlab setup.
I'm seeing these linking errors at the sudo make step after building the submodules.
Any idea what might be the problem?
/shared/hseyedro3/cloudlab/dev/caladan/dpdk/build/lib/x86_64-linux-gnu/librte_common_mlx5.a(common_mlx5_linux_mlx5_glue.c.o): In function `mlx5_glue_devx_port_query':
mlx5_glue.c:(.text+0x52b): undefined reference to `mlx5dv_query_devx_port'
/shared/hseyedro3/cloudlab/dev/caladan/dpdk/build/lib/x86_64-linux-gnu/librte_common_mlx5.a(common_mlx5_linux_mlx5_glue.c.o): In function `mlx5_glue_dr_create_flow_action_pop_vlan':
mlx5_glue.c:(.text+0x881): undefined reference to `mlx5dv_dr_action_create_pop_vlan'
/shared/hseyedro3/cloudlab/dev/caladan/dpdk/build/lib/x86_64-linux-gnu/librte_common_mlx5.a(common_mlx5_linux_mlx5_glue.c.o): In function `mlx5_glue_dr_create_flow_action_push_vlan':
mlx5_glue.c:(.text+0x891): undefined reference to `mlx5dv_dr_action_create_push_vlan'
/shared/hseyedro3/cloudlab/dev/caladan/dpdk/build/lib/x86_64-linux-gnu/librte_common_mlx5.a(common_mlx5_linux_mlx5_glue.c.o): In function `mlx5_glue_dr_create_flow_action_dest_port':
mlx5_glue.c:(.text+0x8b1): undefined reference to `mlx5dv_dr_action_create_dest_ib_port'
collect2: error: ld returned 1 exit status
Makefile:62: recipe for target 'iokerneld' failed
make: *** [iokerneld] Error 1
Can you run make submodules-clean; make submodules and attach the output in a file so I can take a look?
Sure, please find it below:
Any chance you can reclone the repo, and then run all the make steps without sudo? It shouldn't be necessary until launching the iokernel/directpath runtimes.
Ok. Here it is: caladan_output.txt
What is the output of lspci | grep 'ConnectX-[4,5,6]' on your machine? Can you add set -x to build/init_submodules.sh, run make submodules, and share the output?
Hmm, this is odd. The CX-6 NICs are found only when I use sudo and -vv.
Do you think the script is not able to find the CX-6 NIC?
hseyedro3@keg2:$ sudo lspci | grep 'ConnectX-[4,5,6]'
hseyedro3@keg2:$ lspci | grep 'ConnectX-[4,5,6]'
hseyedro3@keg2:$ lspci -vv | grep 'ConnectX-[4,5,6]'
hseyedro3@keg2:$ sudo lspci -vv | grep 'ConnectX-[4,5,6]'
Product Name: ConnectX-6 Dx EN adapter card, 100GbE, Dual-port QSFP56, PCIe 4.0 x16, No Crypto
Product Name: ConnectX-6 Dx EN adapter card, 100GbE, Dual-port QSFP56, PCIe 4.0 x16, No Crypto
Could you share the whole output of sudo lspci -vv?
OK, perhaps this has to do with the age of the distro/kernel.
Can you change the line in build/init_submodules.sh from lspci | grep -q 'ConnectX-[4,5,6]' to lspci | grep -q MT28841? Then rerun make submodules-clean && make submodules && make
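One plausible reading of the fix is that the plain lspci output on this machine does not contain the ConnectX name at all, so the original pattern never matches. The sketch below mimics grep -q with Python's re module. The plain lspci line is hypothetical and only for illustration, since the thread does not show that output. The Product Name line is copied from the sudo lspci -vv output above.

```python
import re

# Bracket expression: matches one of '4', ',', '5' or '6' after "ConnectX-".
ORIGINAL = r"ConnectX-[4,5,6]"
REPLACEMENT = r"MT28841"

def grep_q(pattern, lines):
    """Mimic `grep -q`: True if any line contains a match."""
    return any(re.search(pattern, line) for line in lines)

# Hypothetical one-line lspci entry on a system whose PCI ID database
# lacks the ConnectX name (illustrative, not copied from the thread).
plain_lspci = ["ca:00.0 Ethernet controller: Mellanox Technologies MT28841"]

# Line that only appeared under `sudo lspci -vv` in the thread.
verbose_lspci = [
    "Product Name: ConnectX-6 Dx EN adapter card, 100GbE, "
    "Dual-port QSFP56, PCIe 4.0 x16, No Crypto"
]

print(grep_q(ORIGINAL, plain_lspci))     # original pattern on the plain output
print(grep_q(REPLACEMENT, plain_lspci))  # replacement pattern on the plain output
print(grep_q(ORIGINAL, verbose_lspci))   # original pattern on the verbose output
```

Under these assumptions the original pattern misses the plain output and matches only the verbose Product Name line, while the replacement pattern matches the plain output that the script actually inspects.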
On Wed, Sep 28, 2022 at 3:17 PM Hamed Seyedroudbari < @.***> wrote:
sudolspci-vv_output.txt https://github.com/shenango/caladan/files/9668299/sudo_lspci_-vv_output.txt
Thanks Josh! That did the trick!
Josh, thanks a ton for your help! I will close this issue for now.
All the best to you in your research! -Hamed
Hi,
Thank you for your clear instructions on compiling Caladan! I'm trying to compile and run Caladan on the XL170 machines in CloudLab (running Ubuntu 18.04.6 LTS, kernel 4.15.0-191-generic). All instructions worked out fine until the very last step of building the synthetic client-server application.
Specifically, cargo build --release fails to compile Shenango. Any idea on how to resolve this? Below is the output. I'd greatly appreciate your help!
Best regards, Hamed