matin-karimpour opened this issue 3 months ago
Hi @matin-karimpour, have you tried increasing the batch size? Typically you wouldn't use the ANO with a batch size of 1 since the efficiency is very poor. Can you please paste your whole config?
Hi @cliffburdick, when my batch size is 1000 I don't get any errors, but it still can't communicate with the RX side. As I said before, when I send packets to the IP of the RX side it can see them and counts them in the results. Sure, here are my config files.

TX:
```yaml
multithreaded: true
num_delay_ops: 32
delay: 0.1
delay_step: 0.01

scheduler:
  check_recession_period_ms: 0
  worker_thread_number: 5
  stop_on_deadlock: true
  stop_on_deadlock_timeout: 500

advanced_network:
  cfg:
    version: 1
    manager: "dpdk"
    master_core: 3
    debug: false
    memory_regions:
      - name: "Data_TX_GPU"
        kind: "device"
        affinity: 0
        access:
          - local
        num_bufs: 51200
        buf_size: 9000
      - name: "Default_RX_GPU"
        kind: "device"
        affinity: 0
        access:
          - local
        num_bufs: 51200
        buf_size: 9000
      - name: "Data_RX_GPU"
        kind: "device"
        affinity: 0
        access:
          - local
        num_bufs: 51200
        buf_size: 9000
      - name: "Default_RX_CPU"
        kind: "device"
        affinity: 0
        access:
          - local
        num_bufs: 51200
        buf_size: 9000
    interfaces:
      - name: data1
        address: 07:00.1
        tx:
          - queues:
              - name: "ADC Samples"
                id: 0
                batch_size: 10240
                split_boundary: 0
                cpu_core: 11
                memory_regions:
                  - "Data_TX_GPU"
                offloads:
                  - "tx_eth_src"
      # - name: data2
      #   address: 0005:03:00.1
      #   rx:
      #     - queues:
      #         - name: "Default"
      #           id: 0
      #           cpu_core: 10
      #           batch_size: 10240
      #           output_port: "bench_rx_out"
      #           memory_regions:
      #             - "Default_RX_GPU"
      #         - name: "Data"
      #           id: 1
      #           cpu_core: 9
      #           batch_size: 10240
      #           output_port: "bench_rx_out"
      #           memory_regions:
      #             - "Data_RX_GPU"
      #       flows:
      #         - name: "ADC Samples"
      #           action:
      #             type: queue
      #             id: 1
      #           match:
      #             udp_src: 4096 #12288
      #             udp_dst: 4096 #12288

bench_rx:
  split_boundary: false
  gpu_direct: true
  batch_size: 10240
  max_packet_size: 1064
  header_size: 64

bench_tx:
  eth_dst_addr: a0:88:c2:b4:89:97 # Destination MAC
  udp_dst_port: 4096              # UDP destination port
  udp_src_port: 4096              # UDP source port
  gpu_direct: true
  split_boundary: 0
  batch_size: 100
  payload_size: 1000
  header_size: 64
  ip_src_addr: 192.168.112.30     # Source IP to send from
  ip_dst_addr: 192.168.112.74     # Destination IP to send to
  address: 0000:07:00.1
```
RX:
```yaml
multithreaded: true
num_delay_ops: 32
delay: 0.1
delay_step: 0.01

scheduler:
  check_recession_period_ms: 0
  worker_thread_number: 10
  stop_on_deadlock: true
  stop_on_deadlock_timeout: 500

advanced_network:
  cfg:
    version: 1
    manager: "dpdk"
    master_core: 3
    debug: false
    memory_regions:
      - name: "Data_TX_GPU"
        kind: "device"
        affinity: 0
        access:
          - local
        num_bufs: 51200
        buf_size: 9000
      - name: "Default_RX_GPU"
        kind: "device"
        affinity: 0
        access:
          - local
        num_bufs: 51200
        buf_size: 9000
      - name: "Data_RX_GPU"
        kind: "device"
        affinity: 0
        access:
          - local
        num_bufs: 51200
        buf_size: 9000
      - name: "Default_RX_CPU"
        kind: "device"
        affinity: 0
        access:
          - local
        num_bufs: 51200
        buf_size: 9000
    interfaces:
      # - name: data1
      #   address: 07:00.1
      #   tx:
      #     - queues:
      #         - name: "ADC Samples"
      #           id: 0
      #           batch_size: 10240
      #           split_boundary: 0
      #           cpu_core: 11
      #           memory_regions:
      #             - "Data_TX_GPU"
      #           offloads:
      #             - "tx_eth_src"
      - name: data2
        address: 07:00.1
        rx:
          - queues:
              - name: "Default"
                id: 0
                cpu_core: 10
                batch_size: 10240
                output_port: "bench_rx_out"
                memory_regions:
                  - "Default_RX_GPU"
              - name: "Data"
                id: 1
                cpu_core: 9
                batch_size: 10240
                output_port: "bench_rx_out"
                memory_regions:
                  - "Data_RX_GPU"
            flows:
              - name: "ADC Samples"
                action:
                  type: queue
                  id: 0
                match:
                  udp_src: 4096 #12288
                  udp_dst: 4096 #12288

bench_rx:
  split_boundary: false
  gpu_direct: false
  batch_size: 10240
  max_packet_size: 1064
  header_size: 64

bench_tx:
  eth_dst_addr: a0:88:c2:b4:8a:1f # Destination MAC
  udp_dst_port: 4096              # UDP destination port
  udp_src_port: 4096              # UDP source port
  gpu_direct: false
  split_boundary: 0
  batch_size: 1
  payload_size: 1000
  header_size: 64
  ip_src_addr: 192.168.112.74     # Source IP to send from
  ip_dst_addr: 192.168.112.30     # Destination IP to send to
  address: 07:00.1
```
Hi @matin-karimpour, can you describe in detail what you're trying to do? My understanding is you have two separate nodes, where one is TX and one is RX. If you use a batch size of 1000, do you see the ethtool counters increasing on the TX side but not on the RX side? If you use 1, can you describe what is different in `ethtool -S`?
I would like to collect data from sensors for fault detection, perform some preprocessing on the edge device, and then send the data to the main server for final processing. For this purpose, I need to process and transmit the data quickly, so I plan to use GPUDirect for real-time transmission.
As a first step in working with GPUDirect, I'm trying to run the Advanced Networking Benchmark (ANO) application without modifying the code. To do this, I need to establish a connection between a TX and an RX on two different servers. However, the RX is unable to receive data from the TX. I want to emphasize that I'm running the application with the default code and have only made the necessary hardware configuration changes. To see what's going on on the TX side, I changed the batch size to 1 and got the above error.
To test the RX, I sent a few packets using the `ping` command, and the RX successfully received them. My question is: why is the RX able to receive data sent via the `ping` command and via a Python script I wrote (which sends data without GPUDirect), but unable to detect data from the sender in this application? I hope my explanation has been clear.
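For reference, a minimal sketch of that kind of plain-UDP sender; the IP, port, and payload size are taken from the configs above, and everything else is illustrative rather than the exact script:

```python
# Hypothetical stand-in for the plain-UDP test sender (no GPUDirect).
# RX_IP, UDP_PORT, and PAYLOAD mirror the bench_tx values in the posted config.
import socket
import time

RX_IP = "192.168.112.74"  # ip_dst_addr from the TX config
UDP_PORT = 4096           # udp_src_port / udp_dst_port from the config
PAYLOAD = bytes(1000)     # payload_size: 1000

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# The RX flow rule matches BOTH udp_src and udp_dst == 4096, so bind the
# source port too; packets sent from an ephemeral source port would only
# land in the "Default" queue, never the "Data" flow.
sock.bind(("", UDP_PORT))

while True:
    sock.sendto(PAYLOAD, (RX_IP, UDP_PORT))
    time.sleep(0.001)  # roughly 1000 packets/s; tune as needed
```

Note that because the RX flow rule matches both `udp_src` and `udp_dst` of 4096, a sender that doesn't bind its source port is only steered to the "Default" queue rather than the "Data" queue.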
Another question that has come up for me is, at what stage of development is TCP for ANO?
Another thing I realized is that the RX side stops working after a short period. It hung after printing the log below:
```text
[info] [gxf_executor.cpp:1874] Running Graph...
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 1]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 2]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 3]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 0]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 4]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 5]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 6]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 8]
[info] [gxf_executor.cpp:1876] Waiting for completion...
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 9]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 7]
```
As shown in the network traffic graph, there is some initial traffic on the NIC, and the RX successfully counts this data. However, after a while it stops receiving data and can no longer receive anything, even though the TX side (the Python script) is consistently sending. Could you explain why this is happening?
Hi @matin-karimpour, I've been out on vacation, so I apologize for the brief responses. The ANO sample app should work correctly on your system after modifying only the MAC/PCI addresses on both sides. You may also have to modify the buffer size if your GPU has a small amount of memory, but that's not common, and you should get a startup error if that's the case.
Can you paste both the TX and RX startup output? I want to make sure everything looks normal. Also, any details about your system (GPUs and NICs used), as well as how they're connected. Have you looked at `ethtool -S` on both sides to see what's leaving/coming?
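For reference, one way to see which counters move during a run is to diff two `ethtool -S` snapshots. A rough sketch, assuming `ethtool` is installed and with `eth0` standing in for the real interface name:

```python
# Snapshot `ethtool -S` twice and print only the counters that changed.
import subprocess
import time

def read_stats(iface):
    out = subprocess.run(["ethtool", "-S", iface], capture_output=True,
                         text=True, check=True).stdout
    stats = {}
    for line in out.splitlines():
        name, sep, value = line.rpartition(":")
        if sep:
            try:
                stats[name.strip()] = int(value)
            except ValueError:
                pass  # skip the "NIC statistics:" header and non-numeric lines
    return stats

before = read_stats("eth0")
time.sleep(5)  # let the benchmark run for a bit
after = read_stats("eth0")
for name, value in sorted(after.items()):
    delta = value - before.get(name, 0)
    if delta:
        print(f"{name}: +{delta}")
```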
Hi @cliffburdick, no problem at all! I hope you had a great vacation. Thanks for getting back to me. I'm using an A4000 and a Mellanox CX6 on both sides. My issue is not due to the hardware configuration: when the TX side is running, `ethtool -S` stats don't change on either side.
RX:

```text
[info] [main.cpp:32] Initializing advanced network operator
[info] [main.cpp:35] Using ANO manager dpdk
[info] [gxf_executor.cpp:247] Creating context
[info] [gxf_executor.cpp:1672] Loading extensions from configs...
[info] [adv_network_rx.cpp:40] AdvNetworkOpRx::initialize()
[info] [adv_network_common.h:544] Finished reading advanced network operator config
[info] [adv_network_mgr.cpp:35] Selecting DPDK as ANO manager
[info] [adv_network_dpdk_mgr.cpp:287] Attempting to use 1 ports for high-speed network
[info] [adv_network_dpdk_mgr.cpp:312] DPDK EAL arguments: adv_net_operator --file-prefix=nwlrbbmqbh -l 3,10,9 -a 07:00.1,txq_inline_max=0,dv_flow_en=1
EAL: Detected CPU lcores: 24
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/nwlrbbmqbh/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:101d) device: 0000:07:00.1 (socket -1)
TELEMETRY: No legacy callbacks, legacy socket not created
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9384 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9384 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9384 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9384 for alignment
[info] [adv_network_mgr.cpp:42] Registering memory regions
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Default_RX_CPU at 0x7fdb4a000000 with 480460800 bytes (51200 elements @ 9384 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Default_RX_GPU at 0x7fdb2c000000 with 480460800 bytes (51200 elements @ 9384 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Data_RX_GPU at 0x7fdb0e000000 with 480460800 bytes (51200 elements @ 9384 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Data_TX_GPU at 0x7fdaf0000000 with 480460800 bytes (51200 elements @ 9384 bytes)
[info] [adv_network_mgr.cpp:113] Finished allocating memory regions
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Data_TX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Data_RX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Default_RX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Default_RX_CPU
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7fdb4a000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7fdb2c000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7fdb0e000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7fdaf0000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9768 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9768 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9768 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9768 for alignment
[info] [adv_network_mgr.cpp:42] Registering memory regions
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Default_RX_CPU at 0x7fdad2000000 with 500121600 bytes (51200 elements @ 9768 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Default_RX_GPU at 0x7fdab4000000 with 500121600 bytes (51200 elements @ 9768 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Data_RX_GPU at 0x7fda96000000 with 500121600 bytes (51200 elements @ 9768 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Data_TX_GPU at 0x7fda78000000 with 500121600 bytes (51200 elements @ 9768 bytes)
[info] [adv_network_mgr.cpp:113] Finished allocating memory regions
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Data_TX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Data_RX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Default_RX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Default_RX_CPU
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7fdad2000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7fdab4000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7fda96000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7fda78000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:378] DPDK init (07:00.1) -- RX: ENABLED TX: DISABLED
[info] [adv_network_dpdk_mgr.cpp:388] Configuring RX queue: Default (0) on port 0
[info] [adv_network_dpdk_mgr.cpp:427] Created mempool RXP_P0_Q0_MR0 : mbufs=51200 elsize=9768 ptr=0x17f304280
[info] [adv_network_dpdk_mgr.cpp:388] Configuring RX queue: Data (1) on port 0
[info] [adv_network_dpdk_mgr.cpp:427] Created mempool RXP_P0_Q1_MR0 : mbufs=51200 elsize=9768 ptr=0x17f309dc0
[info] [adv_network_dpdk_mgr.cpp:477] Setting port config for port 0 mtu:9384
[info] [adv_network_dpdk_mgr.cpp:567] Initializing port 0 with 2 RX queues and 0 TX queues...
[info] [adv_network_dpdk_mgr.cpp:583] Successfully configured ethdev
[info] [adv_network_dpdk_mgr.cpp:593] Successfully set descriptors
[info] [adv_network_dpdk_mgr.cpp:610] Setting up port:0, queue:0, Num scatter:1 pool:0x17f304280
[info] [adv_network_dpdk_mgr.cpp:631] Successfully setup RX port 0 queue 0
[info] [adv_network_dpdk_mgr.cpp:610] Setting up port:0, queue:1, Num scatter:1 pool:0x17f309dc0
[info] [adv_network_dpdk_mgr.cpp:631] Successfully setup RX port 0 queue 1
[info] [adv_network_dpdk_mgr.cpp:662] Successfully started port 0
[info] [adv_network_dpdk_mgr.cpp:665] Port 0, MAC address: A0:88:C2:B4:89:97
[info] [adv_network_dpdk_mgr.cpp:677] Enabling promiscuous mode for port 0
[info] [adv_network_dpdk_mgr.cpp:687] Adding RX flow ADC Samples
[info] [adv_network_dpdk_mgr.cpp:710] Setting up RX burst pool with 8191 batches
[warning] [adv_network_mgr.cpp:137] Extra MR section with name Data_TX_GPU unused in queues section
[warning] [adv_network_mgr.cpp:137] Extra MR section with name Default_RX_CPU unused in queues section
[info] [adv_network_dpdk_mgr.cpp:1052] Config validated successfully
[info] [adv_network_dpdk_mgr.cpp:1065] Starting advanced network workers
[info] [adv_network_dpdk_mgr.cpp:1116] Flushing packet on port 0
[info] [adv_network_dpdk_mgr.cpp:1137] Starting RX Core 10, port 0, queue 0, socket 0
[info] [adv_network_dpdk_mgr.cpp:1106] Done starting workers
[info] [adv_network_dpdk_mgr.cpp:1116] Flushing packet on port 0
[info] [dpdk_bench_op_rx.h:44] AdvNetworkingBenchDefaultRxOp::initialize()
[info] [adv_network_dpdk_mgr.cpp:1137] Starting RX Core 9, port 0, queue 1, socket 0
[info] [dpdk_bench_op_rx.h:67] AdvNetworkingBenchDefaultRxOp::initialize() complete
[info] [gxf_executor.cpp:1842] Activating Graph...
[info] [gxf_executor.cpp:1874] Running Graph...
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 0]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 1]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 2]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 3]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 4]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 5]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 6]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 7]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 8]
[info] [gxf_executor.cpp:1876] Waiting for completion...
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 9]
```
TX:

```text
[info] [main.cpp:32] Initializing advanced network operator
[info] [main.cpp:35] Using ANO manager dpdk
[info] [gxf_executor.cpp:247] Creating context
[info] [gxf_executor.cpp:1672] Loading extensions from configs...
[info] [dpdk_bench_op_tx.h:80] AdvNetworkingBenchDefaultTxOp::initialize()
[info] [dpdk_bench_op_tx.h:114] Initialized 4 streams and events
[info] [dpdk_bench_op_tx.h:131] AdvNetworkingBenchDefaultTxOp::initialize() complete
[info] [adv_network_tx.cpp:43] AdvNetworkOpTx::initialize()
[info] [adv_network_common.h:544] Finished reading advanced network operator config
[info] [adv_network_mgr.cpp:35] Selecting DPDK as ANO manager
[info] [adv_network_dpdk_mgr.cpp:287] Attempting to use 1 ports for high-speed network
[info] [adv_network_dpdk_mgr.cpp:312] DPDK EAL arguments: adv_net_operator --file-prefix=nwlrbbmqbh -l 3,11 -a 07:00.1,txq_inline_max=0,dv_flow_en=1
EAL: Detected CPU lcores: 32
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/nwlrbbmqbh/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:101d) device: 0000:07:00.1 (socket -1)
TELEMETRY: No legacy callbacks, legacy socket not created
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9384 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9384 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9384 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9384 for alignment
[info] [adv_network_mgr.cpp:42] Registering memory regions
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Default_RX_CPU at 0x7f68cc000000 with 480460800 bytes (51200 elements @ 9384 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Default_RX_GPU at 0x7f60a2000000 with 480460800 bytes (51200 elements @ 9384 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Data_RX_GPU at 0x7f6084000000 with 480460800 bytes (51200 elements @ 9384 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Data_TX_GPU at 0x7f6066000000 with 480460800 bytes (51200 elements @ 9384 bytes)
[info] [adv_network_mgr.cpp:113] Finished allocating memory regions
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Data_TX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Data_RX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Default_RX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Default_RX_CPU
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7f68cc000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7f60a2000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7f6084000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7f6066000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9768 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9768 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9768 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9768 for alignment
[info] [adv_network_mgr.cpp:42] Registering memory regions
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Default_RX_CPU at 0x7f6048000000 with 500121600 bytes (51200 elements @ 9768 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Default_RX_GPU at 0x7f602a000000 with 500121600 bytes (51200 elements @ 9768 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Data_RX_GPU at 0x7f600c000000 with 500121600 bytes (51200 elements @ 9768 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Data_TX_GPU at 0x7f5fee000000 with 500121600 bytes (51200 elements @ 9768 bytes)
[info] [adv_network_mgr.cpp:113] Finished allocating memory regions
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Data_TX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Data_RX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Default_RX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Default_RX_CPU
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7f6048000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7f602a000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7f600c000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7f5fee000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:378] DPDK init (07:00.1) -- RX: DISABLED TX: ENABLED
[info] [adv_network_dpdk_mgr.cpp:477] Setting port config for port 0 mtu:64
[info] [adv_network_dpdk_mgr.cpp:486] Configuring TX queue: ADC Samples (0) on port 0
[info] [adv_network_dpdk_mgr.cpp:524] Created mempool TXP_P0_Q0_MR0 : mbufs=51200 elsize=9768 ptr=0x7f60ff304380
[info] [adv_network_dpdk_mgr.cpp:567] Initializing port 0 with 0 RX queues and 1 TX queues...
[info] [adv_network_dpdk_mgr.cpp:583] Successfully configured ethdev
[info] [adv_network_dpdk_mgr.cpp:593] Successfully set descriptors
[info] [adv_network_dpdk_mgr.cpp:653] Successfully set up TX queue 0/0
[info] [adv_network_dpdk_mgr.cpp:662] Successfully started port 0
[info] [adv_network_dpdk_mgr.cpp:665] Port 0, MAC address: A0:88:C2:B4:8A:1F
[info] [adv_network_dpdk_mgr.cpp:677] Enabling promiscuous mode for port 0
[info] [adv_network_dpdk_mgr.cpp:960] Applying tx_eth_src offload for port 0
[info] [adv_network_dpdk_mgr.cpp:710] Setting up RX burst pool with 8191 batches
[info] [adv_network_dpdk_mgr.cpp:750] Setting up TX ring TX_RING_P0_Q0
[info] [adv_network_dpdk_mgr.cpp:776] Setting up TX burst pool TX_BURST_POOL_P0_Q0 with 10240 pointers at 0x7f60ff30e5c0
[warning] [adv_network_mgr.cpp:137] Extra MR section with name Data_RX_GPU unused in queues section
[warning] [adv_network_mgr.cpp:137] Extra MR section with name Default_RX_CPU unused in queues section
[warning] [adv_network_mgr.cpp:137] Extra MR section with name Default_RX_GPU unused in queues section
[info] [adv_network_dpdk_mgr.cpp:1052] Config validated successfully
[info] [adv_network_dpdk_mgr.cpp:1065] Starting advanced network workers
[info] [adv_network_dpdk_mgr.cpp:1251] Starting TX Core 11, port 0, queue 0 socket 0 using burst pool 0x7f60ff30e5c0 ring 0x7f60fbcb3f80
[info] [adv_network_dpdk_mgr.cpp:1106] Done starting workers
[info] [gxf_executor.cpp:1842] Activating Graph...
[info] [gxf_executor.cpp:1874] Running Graph...
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 2]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 0]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 1]
[info] [gxf_executor.cpp:1876] Waiting for completion...
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 4]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 3]
```
Hi @matin-karimpour, can you please grab the latest ANO code and try again? We had a bad merge recently that caused duplicate initialization to happen.
I forgot to answer this question:
> Another question that has come up for me is, at what stage of development is TCP for ANO?
That's a good question. TCP is typically never used for extremely high-rate traffic because of its large processing overhead. There are exceptions, but in general UDP is preferred to avoid the baggage TCP brings with it. You would not want a TCP stack running on the GPU, so you'd have to bifurcate the headers and data into CPU and GPU memory, which would mean a large amount of synchronization and reassembly, retransmits, etc. For applications that need very high throughput with reliability, you'd typically use something like RoCE, which is built on top of UDP. This works for both CPU and GPU.
We have plans to integrate RoCE into the ANO using the same or a very similar API to what exists today. If this is of interest to you, please open an issue and we can track the progress.
> Hi @matin-karimpour, can you please grab the latest ANO code and try again? We had a bad merge recently that caused duplicate initialization to happen.
Hi @cliffburdick, thank you for your response. I still have the same problem; the update didn't solve it.
Thanks. I'm setting up the same test and will let you know the results.
Hi @matin-karimpour, can you please try changing the memory type to `huge` in the config and turning off `gpu_direct` on the TX side? Curious to know if packets start transmitting after that change.
Hi @matin-karimpour, I pushed a few fixes here: https://github.com/nvidia-holoscan/holohub/pull/469
Can you please either try this PR or wait until it's merged tomorrow and try `main`?
Hi @cliffburdick, thanks for the updates. I'll check out the `main` branch tomorrow and make sure to let you know the result.
Hi @cliffburdick, sorry for my absence; I wasn't in good shape, so I couldn't check your updates. I ran the new version and hit the error below on the RX side:
```text
mlx5_net: port 0 unable to allocate rx queue index 0
[critical] [adv_network_dpdk_mgr.cpp:611] rte_eth_rx_queue_setup: err=-12, port=0
[warning] [adv_network_mgr.cpp:137] Extra MR section with name Data_TX_GPU unused in queues section
[warning] [adv_network_mgr.cpp:137] Extra MR section with name Default_RX_CPU unused in queues section
```
What do I need to do to solve this issue?
Hi @matin-karimpour, can you please paste the entire output? -12 is ENOMEM, so no memory was available for the allocation. My guess is your GPU doesn't have enough BAR1 space or your CPU doesn't have enough hugepages.
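A quick sketch of both checks, for reference: hugepages from `/proc/meminfo`, and BAR1 from the "BAR1 Memory Usage" section that `nvidia-smi -q -d MEMORY` prints on recent drivers. This is illustrative, not part of the app, and the parsing is deliberately loose:

```python
# Check the two likely causes of the ENOMEM: free hugepages and BAR1 headroom.
import subprocess

# 1) Hugepages: HugePages_Free must cover the DPDK mempools.
with open("/proc/meminfo") as f:
    for line in f:
        if line.startswith(("HugePages", "Hugepagesize")):
            print(line.strip())

# 2) BAR1: each of the ~0.5 GB GPU memory regions above must be BAR1-mappable.
out = subprocess.run(["nvidia-smi", "-q", "-d", "MEMORY"],
                     capture_output=True, text=True).stdout
in_bar1 = False
for line in out.splitlines():
    if "BAR1" in line:
        in_bar1 = True  # entered the "BAR1 Memory Usage" section
    if in_bar1 and any(key in line for key in ("Total", "Used", "Free")):
        print(line.strip())
```

If `HugePages_Free` is 0, or BAR1 free space is below the roughly 0.5 GB per GPU memory region shown in the startup logs, that would explain the -12.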
Hi @matin-karimpour @cliffburdick, is this issue still ongoing?
I'm trying to run the Advanced Networking Benchmark in the DOCA container, but with the DPDK config for GPUDirect. After setting the batch size to 1, the errors below appeared:
```text
[critical] [adv_network_dpdk_mgr.cpp:1512] Failed to get TX meta descriptor
[critical] [adv_network_tx.cpp:74] Failed to get TX meta descriptor: 2
[error] [dpdk_bench_op_tx.h:192] Error returned from adv_net_get_tx_pkt_burst: 2
```
Does anyone know how I can fix this?