nvidia-holoscan / holohub

Central repository for applications and operators for Holoscan
Apache License 2.0

Advanced Network Operator (ANO) Failed to get TX meta descriptor #457

Open matin-karimpour opened 2 months ago

matin-karimpour commented 2 months ago

I'm trying to run the Advanced Networking Benchmark in the DOCA container, but using the DPDK config for GPUDirect. After setting the batch size to 1, the errors below appeared:

[critical] [adv_network_dpdk_mgr.cpp:1512] Failed to get TX meta descriptor
[critical] [adv_network_tx.cpp:74] Failed to get TX meta descriptor: 2
[error] [dpdk_bench_op_tx.h:192] Error returned from adv_net_get_tx_pkt_burst: 2

Does anyone know how I can fix this?

cliffburdick commented 2 months ago

Hi @matin-karimpour, have you tried increasing the batch size? Typically you wouldn't use the ANO with a batch size of 1 since the efficiency is very bad. Can you please paste your whole config?

matin-karimpour commented 2 months ago

Hi @cliffburdick, when my batch size is 1000 I don't get any errors, but the TX can't communicate with the RX side. As I said before, when I send packets to the IP of the RX side, it sees them and counts them in the results. Sure, here are my config files.

TX:

multithreaded: true
num_delay_ops: 32
delay: 0.1
delay_step: 0.01

scheduler:
  check_recession_period_ms: 0
  worker_thread_number: 5
  stop_on_deadlock: true
  stop_on_deadlock_timeout: 500

advanced_network:
  cfg:
    version: 1
    manager: "dpdk"
    master_core: 3
    debug: false    

    memory_regions:
    - name: "Data_TX_GPU"
      kind: "device"
      affinity: 0
      access:
        - local
      num_bufs: 51200
      buf_size: 9000
    - name: "Default_RX_GPU"
      kind: "device"
      affinity: 0
      access:
        - local
      num_bufs: 51200
      buf_size: 9000
    - name: "Data_RX_GPU"
      kind: "device"
      affinity: 0
      access:
        - local
      num_bufs: 51200
      buf_size: 9000
    - name: "Default_RX_CPU"
      kind: "device"
      affinity: 0
      access:
        - local
      num_bufs: 51200
      buf_size: 9000        

    interfaces:
    - name: data1
      address: 07:00.1
      tx:
        - queues:
          - name: "ADC Samples"
            id: 0
            batch_size: 10240
            split_boundary: 0
            cpu_core: 11
            memory_regions:
              - "Data_TX_GPU"
            offloads:
              - "tx_eth_src"         
    # - name: data2
    #   address: 0005:03:00.1           
    #   rx:
    #     - queues:
    #       - name: "Default"
    #         id: 0
    #         cpu_core: 10
    #         batch_size: 10240
    #         output_port: "bench_rx_out"
    #         memory_regions: 
    #           - "Default_RX_GPU"
    #       - name: "Data"
    #         id: 1
    #         cpu_core: 9
    #         batch_size: 10240
    #         output_port: "bench_rx_out"
    #         memory_regions: 
    #           - "Data_RX_GPU"
    #       flows:
    #         - name: "ADC Samples"
    #           action:
    #             type: queue
    #             id: 1
    #           match:
    #             udp_src: 4096 #12288
    #             udp_dst: 4096 #12288

bench_rx:
  split_boundary: false
  gpu_direct: true
  batch_size: 10240
  max_packet_size: 1064
  header_size: 64

bench_tx:
  eth_dst_addr: a0:88:c2:b4:89:97   # Destination MAC
  udp_dst_port: 4096                  # UDP destination port
  udp_src_port: 4096                  # UDP source port
  gpu_direct: true
  split_boundary: 0
  batch_size: 100
  payload_size: 1000
  header_size: 64
  ip_src_addr: 192.168.112.30          # Source IP send from
  ip_dst_addr: 192.168.112.74         # Destination IP to send to
  address: 0000:07:00.1

RX:

multithreaded: true
num_delay_ops: 32
delay: 0.1
delay_step: 0.01

scheduler:
  check_recession_period_ms: 0
  worker_thread_number: 10
  stop_on_deadlock: true
  stop_on_deadlock_timeout: 500

advanced_network:
  cfg:
    version: 1
    manager: "dpdk"
    master_core: 3
    debug: false

    memory_regions:
    - name: "Data_TX_GPU"
      kind: "device"
      affinity: 0
      access:
        - local
      num_bufs: 51200
      buf_size: 9000
    - name: "Default_RX_GPU"
      kind: "device"
      affinity: 0
      access:
        - local
      num_bufs: 51200
      buf_size: 9000
    - name: "Data_RX_GPU"
      kind: "device"
      affinity: 0
      access:
        - local
      num_bufs: 51200
      buf_size: 9000
    - name: "Default_RX_CPU"
      kind: "device"
      affinity: 0
      access:
        - local
      num_bufs: 51200
      buf_size: 9000

    interfaces:
#    - name: data1
#      address: 07:00.1
#      tx:
#        - queues:
#          - name: "ADC Samples"
#            id: 0
#            batch_size: 10240
#            split_boundary: 0
#            cpu_core: 11
#            memory_regions:
#             - "Data_TX_GPU"
#            offloads:
#             - "tx_eth_src"

    - name: data2
      address: 07:00.1
      rx:
        - queues:
          - name: "Default"
            id: 0
            cpu_core: 10
            batch_size: 10240
            output_port: "bench_rx_out"
            memory_regions:
              - "Default_RX_GPU"
          - name: "Data"
            id: 1
            cpu_core: 9
            batch_size: 10240
            output_port: "bench_rx_out"
            memory_regions:
              - "Data_RX_GPU"
          flows:
            - name: "ADC Samples"
              action:
                type: queue
                id: 0
              match:
                udp_src: 4096 #12288
                udp_dst: 4096 #12288

bench_rx:
  split_boundary: false
  gpu_direct: false
  batch_size: 10240
  max_packet_size: 1064
  header_size: 64

bench_tx:
  eth_dst_addr: a0:88:c2:b4:8a:1f   # Destination MAC
  udp_dst_port: 4096                  # UDP destination port
  udp_src_port: 4096                  # UDP source port
  gpu_direct: false
  split_boundary: 0
  batch_size: 1
  payload_size: 1000
  header_size: 64
  ip_src_addr: 192.168.112.74          # Source IP send from
  ip_dst_addr: 192.168.112.30          # Destination IP to send to
  address: 07:00.1

cliffburdick commented 2 months ago

Hi @matin-karimpour, can you describe in detail what you're trying to do? My understanding is that you have two separate nodes, where one is TX and one is RX. If you use a batch size of 1000, do you see the ethtool counters increasing on the TX side but not on the RX side? If you use 1, can you describe what is different in ethtool -S?
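
A minimal way to run that comparison is to watch the NIC counters on both machines while the benchmark runs. A sketch, assuming a Linux interface name like ens4f1 (substitute your own) and the usual mlx5 counter names:

# The interface name and exact counter names are assumptions; inspect the
# full `ethtool -S <iface>` output on your NICs first (mlx5 NICs typically
# expose tx_packets_phy/rx_packets_phy and rx_out_of_buffer).
watch -n 1 "ethtool -S ens4f1 | grep -E 'packets_phy|out_of_buffer'"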

matin-karimpour commented 2 months ago

I would like to collect data from sensors for fault detection, perform some preprocessing on the edge device, and then send the data to the main server for final processing. For this purpose, I need to process and transmit the data quickly, so I plan to use GPU direct for real-time transmission.

As a first step in working with GPU direct, I'm trying to run the Advanced Networking Benchmark (ANO) application without modifying the code. To do this, I need to establish a connection between a TX and an RX on two different servers. However, the RX is unable to receive data from the TX. I want to emphasize that I'm running the application with the default code and have only made the necessary hardware configuration changes. To see what's going on on the TX side, I changed the batch size to 1 and got the above error.

To test the RX, I sent a few packets using the ping command, and the RX successfully received them. My question is, why is the RX able to receive data sent via the ping command and a Python script I wrote (which sends data without GPU direct), but unable to detect data from the sender in this application? I hope my explanation has been clear.

Another question that has come up for me is: at what stage of development is TCP support for the ANO?

matin-karimpour commented 2 months ago

Another thing I realized is that the RX side stops working after a short period. It stopped after printing the log below.

[info] [gxf_executor.cpp:1874] Running Graph...
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 1]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 2]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 3]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 0]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 4]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 5]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 6]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 8]
[info] [gxf_executor.cpp:1876] Waiting for completion...
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 9]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 7]

As shown in the network traffic graph below, there is some initial traffic on the NIC, and the RX can successfully count this data. However, after a while it stops receiving data and can no longer receive any, even though the TX side (a Python script) is consistently sending. Could you explain why this is happening?

(screenshot: NIC network traffic graph)

cliffburdick commented 2 months ago

Hi @matin-karimpour I've been out on vacation, so I apologize for the brief responses. The ANO sample app should work correctly on your system by only modifying the MAC/PCI address on both sides. You may have to modify the buffer size as well if your GPU has a small amount of memory, but that's not common and you should get a startup error if that's the case.

Can you paste both the TX and RX startup output? I want to make sure everything looks normal. Also, any details about your system (GPUs and NICs used), as well as how they're connected. Have you looked at ethtool -S on both sides to see what's leaving/coming?
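
For reference, the site-specific fields in the configs pasted above boil down to the interface PCIe address and the bench_tx addressing block. A condensed view of the TX-side values from this thread (substitute your own MAC, IPs, and PCIe addresses):

advanced_network:
  cfg:
    interfaces:
    - name: data1
      address: 07:00.1              # PCIe address of the local NIC port

bench_tx:
  eth_dst_addr: a0:88:c2:b4:89:97   # MAC of the receiving NIC port
  ip_src_addr: 192.168.112.30       # source IP to send from
  ip_dst_addr: 192.168.112.74       # destination IP to send to
  address: 0000:07:00.1             # PCIe address again, as used by bench_tx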

matin-karimpour commented 2 months ago

Hi @cliffburdick, no problem at all! I hope you had a great vacation. Thanks for getting back to me. I'm using an A4000 and a Mellanox CX6 on both sides. My issue is not due to the hardware configuration: when the TX side is running, ethtool -S doesn't show any stats changing on either side.

startup output for RX side:

[info] [main.cpp:32] Initializing advanced network operator
[info] [main.cpp:35] Using ANO manager dpdk
[info] [gxf_executor.cpp:247] Creating context
[info] [gxf_executor.cpp:1672] Loading extensions from configs...
[info] [adv_network_rx.cpp:40] AdvNetworkOpRx::initialize()
[info] [adv_network_common.h:544] Finished reading advanced network operator config
[info] [adv_network_mgr.cpp:35] Selecting DPDK as ANO manager
[info] [adv_network_dpdk_mgr.cpp:287] Attempting to use 1 ports for high-speed network
[info] [adv_network_dpdk_mgr.cpp:312] DPDK EAL arguments: adv_net_operator --file-prefix=nwlrbbmqbh -l 3,10,9 -a 07:00.1,txq_inline_max=0,dv_flow_en=1 
EAL: Detected CPU lcores: 24
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/nwlrbbmqbh/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:101d) device: 0000:07:00.1 (socket -1)
TELEMETRY: No legacy callbacks, legacy socket not created
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9384 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9384 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9384 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9384 for alignment
[info] [adv_network_mgr.cpp:42] Registering memory regions
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Default_RX_CPU at 0x7fdb4a000000 with 480460800 bytes (51200 elements @ 9384 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Default_RX_GPU at 0x7fdb2c000000 with 480460800 bytes (51200 elements @ 9384 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Data_RX_GPU at 0x7fdb0e000000 with 480460800 bytes (51200 elements @ 9384 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Data_TX_GPU at 0x7fdaf0000000 with 480460800 bytes (51200 elements @ 9384 bytes)
[info] [adv_network_mgr.cpp:113] Finished allocating memory regions
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Data_TX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Data_RX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Default_RX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Default_RX_CPU
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7fdb4a000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7fdb2c000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7fdb0e000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7fdaf0000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9768 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9768 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9768 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9768 for alignment
[info] [adv_network_mgr.cpp:42] Registering memory regions
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Default_RX_CPU at 0x7fdad2000000 with 500121600 bytes (51200 elements @ 9768 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Default_RX_GPU at 0x7fdab4000000 with 500121600 bytes (51200 elements @ 9768 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Data_RX_GPU at 0x7fda96000000 with 500121600 bytes (51200 elements @ 9768 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Data_TX_GPU at 0x7fda78000000 with 500121600 bytes (51200 elements @ 9768 bytes)
[info] [adv_network_mgr.cpp:113] Finished allocating memory regions
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Data_TX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Data_RX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Default_RX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Default_RX_CPU
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7fdad2000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7fdab4000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7fda96000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7fda78000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:378] DPDK init (07:00.1) -- RX: ENABLED TX: DISABLED
[info] [adv_network_dpdk_mgr.cpp:388] Configuring RX queue: Default (0) on port 0
[info] [adv_network_dpdk_mgr.cpp:427] Created mempool RXP_P0_Q0_MR0 : mbufs=51200 elsize=9768 ptr=0x17f304280
[info] [adv_network_dpdk_mgr.cpp:388] Configuring RX queue: Data (1) on port 0
[info] [adv_network_dpdk_mgr.cpp:427] Created mempool RXP_P0_Q1_MR0 : mbufs=51200 elsize=9768 ptr=0x17f309dc0
[info] [adv_network_dpdk_mgr.cpp:477] Setting port config for port 0 mtu:9384
[info] [adv_network_dpdk_mgr.cpp:567] Initializing port 0 with 2 RX queues and 0 TX queues...
[info] [adv_network_dpdk_mgr.cpp:583] Successfully configured ethdev
[info] [adv_network_dpdk_mgr.cpp:593] Successfully set descriptors
[info] [adv_network_dpdk_mgr.cpp:610] Setting up port:0, queue:0, Num scatter:1 pool:0x17f304280
[info] [adv_network_dpdk_mgr.cpp:631] Successfully setup RX port 0 queue 0
[info] [adv_network_dpdk_mgr.cpp:610] Setting up port:0, queue:1, Num scatter:1 pool:0x17f309dc0
[info] [adv_network_dpdk_mgr.cpp:631] Successfully setup RX port 0 queue 1
[info] [adv_network_dpdk_mgr.cpp:662] Successfully started port 0
[info] [adv_network_dpdk_mgr.cpp:665] Port 0, MAC address: A0:88:C2:B4:89:97
[info] [adv_network_dpdk_mgr.cpp:677] Enabling promiscuous mode for port 0
[info] [adv_network_dpdk_mgr.cpp:687] Adding RX flow ADC Samples
[info] [adv_network_dpdk_mgr.cpp:710] Setting up RX burst pool with 8191 batches
[warning] [adv_network_mgr.cpp:137] Extra MR section with name Data_TX_GPU unused in queues section
[warning] [adv_network_mgr.cpp:137] Extra MR section with name Default_RX_CPU unused in queues section
[info] [adv_network_dpdk_mgr.cpp:1052] Config validated successfully
[info] [adv_network_dpdk_mgr.cpp:1065] Starting advanced network workers
[info] [adv_network_dpdk_mgr.cpp:1116] Flushing packet on port 0
[info] [adv_network_dpdk_mgr.cpp:1137] Starting RX Core 10, port 0, queue 0, socket 0
[info] [adv_network_dpdk_mgr.cpp:1106] Done starting workers
[info] [adv_network_dpdk_mgr.cpp:1116] Flushing packet on port 0
[info] [dpdk_bench_op_rx.h:44] AdvNetworkingBenchDefaultRxOp::initialize()
[info] [adv_network_dpdk_mgr.cpp:1137] Starting RX Core 9, port 0, queue 1, socket 0
[info] [dpdk_bench_op_rx.h:67] AdvNetworkingBenchDefaultRxOp::initialize() complete
[info] [gxf_executor.cpp:1842] Activating Graph...
[info] [gxf_executor.cpp:1874] Running Graph...
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 0]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 1]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 2]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 3]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 4]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 5]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 6]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 7]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 8]
[info] [gxf_executor.cpp:1876] Waiting for completion...
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 9]

and TX startup output:

[info] [main.cpp:32] Initializing advanced network operator
[info] [main.cpp:35] Using ANO manager dpdk
[info] [gxf_executor.cpp:247] Creating context
[info] [gxf_executor.cpp:1672] Loading extensions from configs...
[info] [dpdk_bench_op_tx.h:80] AdvNetworkingBenchDefaultTxOp::initialize()
[info] [dpdk_bench_op_tx.h:114] Initialized 4 streams and events
[info] [dpdk_bench_op_tx.h:131] AdvNetworkingBenchDefaultTxOp::initialize() complete
[info] [adv_network_tx.cpp:43] AdvNetworkOpTx::initialize()
[info] [adv_network_common.h:544] Finished reading advanced network operator config
[info] [adv_network_mgr.cpp:35] Selecting DPDK as ANO manager
[info] [adv_network_dpdk_mgr.cpp:287] Attempting to use 1 ports for high-speed network
[info] [adv_network_dpdk_mgr.cpp:312] DPDK EAL arguments: adv_net_operator --file-prefix=nwlrbbmqbh -l 3,11 -a 07:00.1,txq_inline_max=0,dv_flow_en=1 
EAL: Detected CPU lcores: 32
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/nwlrbbmqbh/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:101d) device: 0000:07:00.1 (socket -1)
TELEMETRY: No legacy callbacks, legacy socket not created
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9384 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9384 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9384 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9384 for alignment
[info] [adv_network_mgr.cpp:42] Registering memory regions
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Default_RX_CPU at 0x7f68cc000000 with 480460800 bytes (51200 elements @ 9384 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Default_RX_GPU at 0x7f60a2000000 with 480460800 bytes (51200 elements @ 9384 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Data_RX_GPU at 0x7f6084000000 with 480460800 bytes (51200 elements @ 9384 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Data_TX_GPU at 0x7f6066000000 with 480460800 bytes (51200 elements @ 9384 bytes)
[info] [adv_network_mgr.cpp:113] Finished allocating memory regions
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Data_TX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Data_RX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Default_RX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Default_RX_CPU
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7f68cc000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7f60a2000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7f6084000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7f6066000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9768 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9768 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9768 for alignment
[info] [adv_network_dpdk_mgr.cpp:119] Changing buffer size to 9768 for alignment
[info] [adv_network_mgr.cpp:42] Registering memory regions
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Default_RX_CPU at 0x7f6048000000 with 500121600 bytes (51200 elements @ 9768 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Default_RX_GPU at 0x7f602a000000 with 500121600 bytes (51200 elements @ 9768 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Data_RX_GPU at 0x7f600c000000 with 500121600 bytes (51200 elements @ 9768 bytes)
[info] [adv_network_mgr.cpp:103] Successfully allocated memory region Data_TX_GPU at 0x7f5fee000000 with 500121600 bytes (51200 elements @ 9768 bytes)
[info] [adv_network_mgr.cpp:113] Finished allocating memory regions
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Data_TX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Data_RX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Default_RX_GPU
[info] [adv_network_dpdk_mgr.cpp:177] Successfully registered external memory for Default_RX_CPU
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7f6048000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7f602a000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7f600c000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:147] Mapped external memory descriptor for 0x7f5fee000000 to device 0
[info] [adv_network_dpdk_mgr.cpp:378] DPDK init (07:00.1) -- RX: DISABLED TX: ENABLED
[info] [adv_network_dpdk_mgr.cpp:477] Setting port config for port 0 mtu:64
[info] [adv_network_dpdk_mgr.cpp:486] Configuring TX queue: ADC Samples (0) on port 0
[info] [adv_network_dpdk_mgr.cpp:524] Created mempool TXP_P0_Q0_MR0 : mbufs=51200 elsize=9768 ptr=0x7f60ff304380
[info] [adv_network_dpdk_mgr.cpp:567] Initializing port 0 with 0 RX queues and 1 TX queues...
[info] [adv_network_dpdk_mgr.cpp:583] Successfully configured ethdev
[info] [adv_network_dpdk_mgr.cpp:593] Successfully set descriptors
[info] [adv_network_dpdk_mgr.cpp:653] Successfully set up TX queue 0/0
[info] [adv_network_dpdk_mgr.cpp:662] Successfully started port 0
[info] [adv_network_dpdk_mgr.cpp:665] Port 0, MAC address: A0:88:C2:B4:8A:1F
[info] [adv_network_dpdk_mgr.cpp:677] Enabling promiscuous mode for port 0
[info] [adv_network_dpdk_mgr.cpp:960] Applying tx_eth_src offload for port 0
[info] [adv_network_dpdk_mgr.cpp:710] Setting up RX burst pool with 8191 batches
[info] [adv_network_dpdk_mgr.cpp:750] Setting up TX ring TX_RING_P0_Q0
[info] [adv_network_dpdk_mgr.cpp:776] Setting up TX burst pool TX_BURST_POOL_P0_Q0 with 10240 pointers at 0x7f60ff30e5c0
[warning] [adv_network_mgr.cpp:137] Extra MR section with name Data_RX_GPU unused in queues section
[warning] [adv_network_mgr.cpp:137] Extra MR section with name Default_RX_CPU unused in queues section
[warning] [adv_network_mgr.cpp:137] Extra MR section with name Default_RX_GPU unused in queues section
[info] [adv_network_dpdk_mgr.cpp:1052] Config validated successfully
[info] [adv_network_dpdk_mgr.cpp:1065] Starting advanced network workers
[info] [adv_network_dpdk_mgr.cpp:1251] Starting TX Core 11, port 0, queue 0 socket 0 using burst pool 0x7f60ff30e5c0 ring 0x7f60fbcb3f80
[info] [adv_network_dpdk_mgr.cpp:1106] Done starting workers
[info] [gxf_executor.cpp:1842] Activating Graph...
[info] [gxf_executor.cpp:1874] Running Graph...
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 2]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 0]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 1]
[info] [gxf_executor.cpp:1876] Waiting for completion...
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 4]
[info] [multi_thread_scheduler.cpp:299] MultiThreadScheduler started worker thread [pool name: default_pool, thread uid: 3]
cliffburdick commented 2 months ago

Hi @matin-karimpour can you please grab the latest ANO code and try again? We had a bad merge recently that caused duplicate initialization to happen.

cliffburdick commented 2 months ago

I forgot to answer this question:

Another question that has come up for me is, at what stage of development is TCP for ANO?

That's a good question. TCP is typically never used for extremely high-rate traffic because of the large overhead in processing. There are exceptions to this, but in general, UDP is preferred to avoid the baggage TCP brings with it. You would not want a TCP stack running on the GPU, so you'd have to bifurcate the header and data into CPU and GPU memory, which would mean a large amount of synchronization and reassembly, retransmits, etc. For applications that need very high throughput with reliability, you'd typically use something like RoCE, which is built on top of UDP. This works for both CPU and GPU.

We have plans to integrate RoCE into the ANO using the same or a very similar API that exists today. If this is of interest to you please open an issue and we can track the progress.

matin-karimpour commented 2 months ago

Hi @matin-karimpour can you please grab the latest ANO code and try again? We had a bad merge recently that caused duplicate initialization to happen.

Hi @cliffburdick, thank you for your response. I still have the same problem; updating didn't solve it.

cliffburdick commented 2 months ago

Thanks. I'm setting up the same test and will let you know the results.

cliffburdick commented 2 months ago

Hi @matin-karimpour, can you please try to change the memory type to huge in the config and turn off gpu_direct on the TX side? Curious to know if packets start transmitting after that change.
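
A sketch of what that change could look like against the TX config above; treating kind: "huge" as the hugepage-backed host memory kind is an assumption here, so check the ANO config docs for the exact kind names:

memory_regions:
- name: "Data_TX_GPU"   # name kept from the config above; now host hugepage memory
  kind: "huge"          # was: "device"
  affinity: 0
  access:
    - local
  num_bufs: 51200
  buf_size: 9000

bench_tx:
  gpu_direct: false     # was: true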

cliffburdick commented 2 months ago

Hi @matin-karimpour, I pushed a few fixes here: https://github.com/nvidia-holoscan/holohub/pull/469

Can you please either try this PR or wait until it's merged tomorrow and try main?

matin-karimpour commented 2 months ago

Hi @cliffburdick, thanks for the updates. I'll check it out tomorrow in the main branch. I'll make sure to let you know about the result.

matin-karimpour commented 2 months ago

Hi @cliffburdick, sorry for my absence; I wasn't feeling well, so I couldn't check your updates. I have now run the new version and hit the error below on the RX side:

mlx5_net: port 0 unable to allocate rx queue index 0
[critical] [adv_network_dpdk_mgr.cpp:611] rte_eth_rx_queue_setup: err=-12, port=0
[warning] [adv_network_mgr.cpp:137] Extra MR section with name Data_TX_GPU unused in queues section
[warning] [adv_network_mgr.cpp:137] Extra MR section with name Default_RX_CPU unused in queues section

What do I need to do to solve this issue?

cliffburdick commented 2 months ago

Hi @matin-karimpour, can you please paste the entire output? -12 is ENOMEM, so no memory is available to allocate. My guess is your GPU doesn't have enough BAR1 space or your CPU doesn't have enough hugepages.
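
Both suspects can be checked with standard tooling; a minimal sketch using generic Linux/NVIDIA commands (not ANO-specific):

# BAR1 size and usage on the GPU; look for the "BAR1 Memory Usage" section
nvidia-smi -q -d MEMORY

# Hugepage availability on the host
grep -i huge /proc/meminfo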

tbirdso commented 1 month ago

Hi @matin-karimpour @cliffburdick , is this issue still ongoing?