sdnfv / openNetVM

A high performance container-based NFV platform from GW and UCR.
http://sdnfv.github.io/onvm/
Other
263 stars 136 forks source link

Manager crash when no ports initialized #160

Closed dennisafa closed 5 years ago

dennisafa commented 5 years ago

Manager crashes if we try to send packets out on an uninitialized port.

Summary:

The function onvm_pkt_enqueue_port() will call onvm_pkt_flush_port_queue() which TX bursts packets out of the port with rte_eth_tx_burst(). If you run the manager with no ports and try to do this, it'll segfault on rte_eth_tx_burst(). This fix checks if the port is initialized before sending out packets by adding an array that tracks which ports are initialized.

Usage: Run manager with ./go.sh 0,1,2,3 0 0xf0 -s stdout -a 0x7f000000000 and basic_monitor with ./go.sh 1 -p 1 then run speed_tester with ./go.sh 2 -d 1 It should not crash anymore.

This PR includes
Resolves issues
Breaking API changes
Internal API changes
Usability improvements
Bug fixes X
New functionality
New NF/onvm_mgr args
Changes to starting NFs
Dependency updates
Web stats updates

Merging notes:

TODO before merging :

Test Plan:

Follow the usage guide, verify that manager does not crash anymore.

Review:

@koolzz @kevindweb

onvm commented 5 years ago

In response to PR creation

CI Message

Your results will arrive shortly

koolzz commented 5 years ago

What If you have 1 port enabled and you try to send out of 2? Looks like this doesn't fix that.

koolzz commented 5 years ago

Also slightly concerned about adding a check to the critical path, as this would occur every pkt send (out of the NIC). What does the pktgen perf difference look like on cloudlab nodes?

dennisafa commented 5 years ago

What If you have 1 port enabled and you try to send out of 2? Looks like this doesn't fix that.

hmm, yea good point. what if we initialize all port ID's to NULL (or -1) before we assign them for a quick check? will check out cloudlab perf. edit: might be a bit more complicated for ID's, ill think of a fix.

dennisafa commented 5 years ago

@onvm give me a run

pktgen is acting up for whatever reason, ill get the # in a bit

onvm commented 5 years ago

@onvm give me a run

pktgen is acting up for whatever reason, ill get the # in a bit

CI Message

Your results will arrive shortly

dennisafa commented 5 years ago

some pktgen tests from cloudlab without the extra check:

Screen Shot 2019-08-21 at 4 22 44 PM

with the check:

Screen Shot 2019-08-21 at 4 23 48 PM

no notable performance regressions.

kevindweb commented 5 years ago

@dennisafa this works, tested with your solution, and saw that develop does fail with this edge case. Pktgen is fine. My only thought is that it's pretty redundant to keep checking for this port. If an NF has been initialized, the port ID it is trying to access will not change through its life cycle. This means that if the NF tries to get 1 packets, or 1000000 packets, they will all fail. The manager has now wasted the time to check for that port an extra 999,999 times. I hope that makes sense, but there has to be a way to check at initialization of NF if the port is valid (tell it to change if invalid or something), rather than let the NF attempt to enqueue packets forever, consuming resources from manager

dennisafa commented 5 years ago

@dennisafa this works, tested with your solution, and saw that develop does fail with this edge case. Pktgen is fine. My only thought is that it's pretty redundant to keep checking for this port. If an NF has been initialized, the port ID it is trying to access will not change through its life cycle. This means that if the NF tries to get 1 packets, or 1000000 packets, they will all fail. The manager has now wasted the time to check for that port an extra 999,999 times. I hope that makes sense, but there has to be a way to check at initialization of NF if the port is valid (tell it to change if invalid or something), rather than let the NF attempt to enqueue packets forever, consuming resources from manager

the port to be sent out of is defined by the meta->destination value as seen here in basic_monitor i can imagine that NF's may want flexibility with the port that packets are bursted out of.

koolzz commented 5 years ago

@dennisafa Code looks good, although as @kevindweb pointed out it is a bunch of extra checks. Can we test with flow lookup disabled macro(I believe its still enabled by default) that should show our best perf ratio comparison. If that number is the same then I think we're clear to merge

dennisafa commented 5 years ago

@dennisafa Code looks good, although as @kevindweb pointed out it is a bunch of extra checks. Can we test with flow lookup disabled macro(I believe its still enabled by default) that should show our best perf ratio comparison. If that number is the same then I think we're clear to merge

Here's the perf with the develop branch, flow table disabled:

develop branch

And with this branch:

port_crash branch
koolzz commented 5 years ago

@onvm

onvm commented 5 years ago

@onvm

CI Message

Your results will arrive shortly