sdnfv / openNetVM

A high performance container-based NFV platform from GW and UCR.
http://sdnfv.github.io/onvm/

Questions about scaling example #294

Open strongcourage opened 3 years ago

strongcourage commented 3 years ago

Hi there,

I have several questions concerning the scaling example. When I rerun the openNetVM manager using the command onvm/go.sh -k 1 -n 0xF8 -s stdout -m 0,1,2 and run the scaling NF ./start_nf.sh scaling_example 1 -d 2 -n 3 -a with 3 children using the advanced rings interface, tx_drop is 128 at the beginning for all instances. I guess this is because I had not properly flushed all packets in tx_batch in a previous run.

Screenshot 2021-06-16 at 10 09 07

In addition, I replayed a pcap file and sent 39,580,755 packets to port 0. In this case, the number of packets received by openNetVM (39,397,742 packets) or processed by all 4 NFs (39,398,254 packets) is smaller than the total number of packets sent. What could be the reason for this?

Screenshot 2021-06-16 at 10 09 56

Thanks.

twood02 commented 3 years ago

The tx_drop issue is because each scaler NF tries to create 128 packets at startup and send them to whatever NF you set as the destination in the command line. If that NF doesn't exist or hasn't fully started, then those packets will be counted as tx_drop because they can't be sent out. If you don't want your scaler NFs to create their own packets (since you are running a generator) you could set DEFAULT_PKT_NUM=0.

This is also what is giving you strange total packet processing numbers. Normally the packets processed by NFs would be less than the total RX on the host, but here it is higher because of the extra 128 packets generated by each of your 4 scaler NFs (39,397,742 + 128 x 4 = 39,398,254).

If we compare the packet generator vs RX numbers we find you are missing 39,580,755 - 39,397,742 = 183,013 packets. These are being dropped before ONVM receives them from the NIC, so either they were dropped at the generator, at a switch between the generator and ONVM, or by the NIC because ONVM's RX thread couldn't read the packets fast enough. Are you sending the packets at 10Gbps? Are they small (64-byte) packets? Depending on your CPU speed, ONVM's RX thread may be too slow. One way to make it run faster is to disable the flow table lookup that the manager performs on every packet (look for the FLOW_LOOKUP macro in the manager code).

twood02 commented 3 years ago

Let us know if you still have any questions or you can close this issue if that resolves things for you.

strongcourage commented 3 years ago

@twood02 Thank you for the detailed answers. For the packet generator, I don't know why I can't send packets using PktGen on openNetVM (it worked on my VMs, but failed on two servers connected by a cable). I count the number of packets processed by NFs with sID 1 via tx_drop, since I forward packets to another NF with sID 2 that does not exist.

The pcap files I've used (https://tcpreplay.appneta.com/wiki/captures.html) are large, and when I sent the packets at 10Gbps, the number of dropped packets was high even with multiple instances. The problem is that (tx_drop + rx_drop) is smaller than the total number of packets sent. Thanks for your suggestions; I'll try to disable the flow table lookup and test again.

strongcourage commented 3 years ago

I want to increase the number of RX/TX threads (1 by default) in onvm_mgr/onvm_init.h; however, I get the following error:

Port 0 init ...
Port 0 socket id 0 ...
Port 0 Rx rings 2 ...
Port 0 Tx rings 65535 ...
Port 0 modified RSS hash function based on hardware support,requested:0x3efbc configured:0x6eb8
Number of TX queues requested (65535) is greater than max supported(1024)
EAL: Error - exiting with code: 1
  Cause: Cannot initialise port 0

How can I do it? Thank you.

strongcourage commented 3 years ago

I've run the openNetVM manager on a powerful machine with 44 cores and I've already disabled the flow table lookup. However, I still face the problem of packets going missing before they reach the manager. Please see the figure below: when I replay bigFlows.pcap at 1Gbps, the manager RX received only 434,295 of the 791,616 packets sent. Do you have any ideas on how to get all packets on the manager? Thanks.

Screenshot 2021-06-17 at 17 13 02
twood02 commented 3 years ago

I think you must not be assigning the manager enough cores after adjusting the RX thread count parameter. It is trying to use 65535 TX queues, which doesn't make sense; I think the unsigned integer must have overflowed here: https://github.com/sdnfv/openNetVM/blob/master/onvm/onvm_mgr/onvm_init.c#L336

If you increase the ONVM_NUM_RX_THREADS macro you will also need to be sure to start the manager with more cores. We should adjust our code to detect the case where there aren't enough cores available and this overflow occurs.

Can you verify that you still see this kind of packet loss when you run the manager by itself (no scaler NF)? You can just compare the port RX number in the manager against the number reported by your generator.

If you tell us what trace and load generator tool you are using we can try to reproduce this.

strongcourage commented 3 years ago

Hello again,

I replayed the bigFlows pcap file at 10Gbps and also printed out the number of received (and dropped) packets at the NIC port. When I run the manager by itself without any NFs, it receives all packets sent from the other server on port 0.

Screenshot 2021-06-21 at 14 04 40

However, when I run 5 scaling NFs with 1 RX and 1 TX thread, the number of packets dropped at the NIC port is large. It seems that increasing the number of RX/TX threads can't solve this problem.

Screenshot 2021-06-21 at 13 56 21

I also developed a scaled version of the NF by replacing packet_handler_fwd() with a more complex function that processes incoming packets using Deep Packet Inspection techniques. I know this function has overhead, which is why I need a scaled NF. I then measured the number of dropped packets at the NIC port and across all NFs while increasing the number of running NFs. Running more NFs clearly reduces the number of dropped packets (via %nf_drop/nic_recv). However, the number of packets dropped at the NIC port is still high even with more than 10 NFs, and again, increasing RX/TX threads does not solve this.

Screenshot 2021-06-21 at 14 19 19

Do you have any ideas on how to reduce the number of dropped packets of the NIC ports? Thanks.

twood02 commented 3 years ago

@strongcourage we will try to repeat this test as closely as we can and try to help.

The scaling NF example is mainly useful to show how to dynamically start more NFs from within a running NF (autoscaling). The other option is to simply start multiple instances of your NF from the command line and give them all the same service ID. Both of these options should give the same results, but the scaling NF threads are started in a slightly different way -- maybe that is causing a problem. You can try manually starting several NFs (e.g. bridge/basic monitor) with service ID 1 to see if that makes a difference in your performance results.

For maximum performance you'll also want to disable any per-packet or per-batch print statements.

strongcourage commented 3 years ago

@twood02 ,

cores = [0, 1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 16, 17, 18, 19, 20, 21, 24, 25, 26, 27, 28] sockets = [0, 1]

            Socket 0    Socket 1
            --------    --------
    Core 0  [0]         [1]
    Core 1  [2]         [3]
    Core 2  [4]         [5]
    Core 3  [6]         [7]
    Core 4  [8]         [9]
    Core 5  [10]        [11]
    Core 8  [12]        [13]
    Core 9  [14]        [15]
    Core 10 [16]        [17]
    Core 11 [18]        [19]
    Core 12 [20]        [21]
    Core 16 [22]        [23]
    Core 17 [24]        [25]
    Core 18 [26]        [27]
    Core 19 [28]        [29]
    Core 20 [30]        [31]
    Core 21 [32]        [33]
    Core 24 [34]        [35]
    Core 25 [36]        [37]
    Core 26 [38]        [39]
    Core 27 [40]        [41]
    Core 28 [42]        [43]


- Also, I've disabled all printf statements. Thanks.

twood02 commented 3 years ago

From the CPU layout it looks like the even-numbered lcores are on socket 0 and the odd-numbered lcores are on socket 1. I suggest you set the manager and NF core bitmasks to allow only even- or only odd-numbered cores. Also, are these physical cores, or do you have hyperthreading enabled? It is best to turn hyperthreading off.

strongcourage commented 3 years ago

Hi Tim,

As I understand it, when an RX thread receives a packet from the NIC and the flow table lookup is disabled, it stores the packet in the RX queue of the NF with service ID 1, which is my NF. Even when I increase the NF queue size, 20 NFs can only process about 9Gbps of traffic with 0 dropped packets (note that increasing the number of RX threads doesn't have any impact).

I'm thinking about this scenario: NF1 (sID 1 & iID 1) only receives/processes packets sent from RX thread 1, NF2 (sID 1 & iID 2) only receives/processes packets sent from RX thread 2, and so on. In this case, maybe increasing the number of RX threads in the manager and running multiple NF instances could improve performance.

Does it make sense? Thanks.

twood02 commented 3 years ago

Have you tried adjusting the size of the MBUF pool? https://github.com/sdnfv/openNetVM/blob/master/onvm/onvm_nflib/onvm_common.h#L70

If your NFs are doing significant processing on each packet, then you may be running out of mbufs in the pool causing the RX thread to drop. Or do you also only see 9Gbps with no loss when your NFs don't do anything? Are your NFs in either case transmitting the packets out again, or just dropping them after they finish?

strongcourage commented 3 years ago

@twood02 Thanks for your answer. I've already tried increasing NUM_MBUF to 65535, but still had the same result.

If I only forward packets to the destination, it is super fast: 1 NF can handle 20Gbps with no loss (around 5 million pps). In both cases (with/without packet processing), I always forward packets to the next NF, as in this example.

strongcourage commented 2 years ago

@twood02 What does PACKET_READ_SIZE mean? Why does nf_buf->count have type uint16_t? (Code in onvm_pkt_enqueue_nf().)

Can you explain a bit how the manager sends packets to NF instances' buffers? As I understood, each NF instance has its own buffer, and packets are distributed to NF instances based on the RSS hash (similar to how packets from NIC are sent to RX threads of the manager).

Thanks.

JackKuo-tw commented 2 years ago

@strongcourage In my opinion, PACKET_READ_SIZE is the size of the per-NF buffer, so when the buffer reaches PACKET_READ_SIZE packets it must be flushed to prevent packet loss.

The manager uses rte_ring_enqueue_bulk(); this function belongs to DPDK, so you can take a look at the official documentation -- the Ring Library.


NoahChinitz commented 2 years ago

@strongcourage Sorry for the wait, but here are the results of my tests that I ran as close to your comments as possible:

Scaling_Issue_Response.pdf

Please reply here if you have any questions. Thanks!

strongcourage commented 2 years ago

@NoahChinitzGWU Thanks for your experiments and your document. I think I got the same results when running multiple NF instances that only drop/forward packets at 10Gbps. However, I face the performance problem because my NF has complex functionality. I'm still trying to find a solution, as we've already developed a scaled version of our monitoring tool (with the same functionality, on the same server) directly on DPDK that can handle > 30Gbps without packet loss. We therefore believe that running our tool as an NF on the openNetVM platform should be able to achieve the same results. Best.

NoahChinitz commented 2 years ago

@strongcourage Ok. I will run tests with more complicated functionality and get back to you.

NoahChinitz commented 2 years ago

@strongcourage Can you be more specific about what exactly your scaled NF is doing? I am trying to recreate your issue as closely as possible.

strongcourage commented 2 years ago

> @strongcourage Can you be more specific about what exactly your scaled NF is doing, I am trying to recreate your issue as closely as possible.

@NoahChinitzGWU sorry for the late reply, thanks for your suggestion, but our tool is currently closed-source, so I think I can't send you the code.

Basically, our NF uses Deep Packet Inspection techniques to process each incoming packet (e.g., extract headers, get information about the session), then sends stats or alerts in CSV format to a web app. I replaced packet_handler_fwd() with our own function, which is much more complex than the original one in the scaling example. Thank you.