sdnfv / openNetVM

A high performance container-based NFV platform from GW and UCR.
http://sdnfv.github.io/onvm/
Other
263 stars 136 forks source link

Invalid Port Prints when running with no ports enabled #139

Closed koolzz closed 4 years ago

koolzz commented 5 years ago

This issue describes a simple bug/improvement for the OpenNetVM code, and is a great way for YOU to get involved with improving this open source project! If you want to fix it, just create a pull request with the fix as detailed in the Contributing document. If you have questions, post here!


Bug Report

Current Behavior Currently if onvm_mgr runs with 0 ports and the Scaling or Speed Tester NF is used we will get an annoying print:

Invalid port_id=0
Invalid port_id=0
Invalid port_id=0
Invalid port_id=0
Invalid port_id=0
Invalid port_id=0
Invalid port_id=0
Invalid port_id=0
Invalid port_id=0
Invalid port_id=0
Invalid port_id=0
Invalid port_id=0
Invalid port_id=0
Invalid port_id=0

Expected behavior/code There is no reason for us to even parse the actual port, we should just have a fake mac and stuff.

Steps to reproduce onvm_mgr -> ./go.sh 0,1,2,3 0 0xf0 -s stdout -a 0x7f000000000 scaling_example_nf -> ./go.sh 1 -d 1 -n 3

Possible Solution Fake the mac if no ports enabled, or even fake the mac always

dennisafa commented 5 years ago

I'm having some trouble replicating this bug. Using latest version of dev branch, running manager with and speed_tester with the commands used above, I am getting no prints on either the manager or NF side. Where's the port parsing happening? Having trouble finding where that print statement is.

koolzz commented 5 years ago

@dennisafa If I recall correctly this print happens because when we run speed tester it tries to create packets using some mac addr from the ports array. The print would happen in the setup portion of speed tester launch.

koolzz commented 5 years ago

Let me know if you were able to replicate @dennisafa. Probably don't solve this though, it's reserved for first time contributors

dennisafa commented 5 years ago

Let me know if you were able to replicate @dennisafa. Probably don't solve this though, it's reserved for first time contributors

wasn't able to : ( not getting any invalid prints on either the mgr or NF. @kevindweb can you sanity check me?

kevindweb commented 5 years ago

I am instantiating a new cloudlab instance to get this again, but I have received this a lot actually, both in cloudlab and nimbus. I never looked into why, but I'll try to replicate.

kevindweb commented 5 years ago

I mean I know why this is happening. Not too difficult to replicate, basically you just run normal manager + speed_tester with no dpdk devices bound (easily produced with a new cloudlab node). I got it on my first try, then bound a device, it went away. I unbound that device, the error came back. This is just a dpdk issue. Inside /dpdk/lib I tried grep -nr -I . -e "Invalid port id %d" (using -I to ignore binary files, because they're not helpful)

./librte_kni/rte_kni.c:447:             RTE_LOG(ERR, KNI, "Invalid port id %d\n", port_id);
./librte_kni/rte_kni.c:467:             RTE_LOG(ERR, KNI, "Invalid port id %d\n", port_id);

Here's what we need to do, if the user doesn't have a DPDK bound interface, and they try to set a port, we should stop them, or at least show a warning. I tried digging into the dpdk code, but this solution would be easier than changing any actual logic.

rohit-mp commented 4 years ago

Hi,

Could I have a try at this?

Would adding extra checks in openNetVM/onvm/go.sh before starting the manager be a suitable place for this? I was thinking of adding a check to verify if the environment variable ONVM_NIC_PCI has been set first, and if yes, check if the kernel driver in use is one of the supported DPDK drivers as mentioned in the dpdk-devbind.py script.

Is there any simpler/better way of doing this?

kevindweb commented 4 years ago

Hi,

Could I have a try at this?

Would adding extra checks in openNetVM/onvm/go.sh before starting the manager be a suitable place for this? I was thinking of adding a check to verify if the environment variable ONVM_NIC_PCI has been set first, and if yes, check if the kernel driver in use is one of the supported DPDK drivers as mentioned in the dpdk-devbind.py script.

Is there any simpler/better way of doing this?

Sorry I didn't see this. That sounds like a great idea. We don't want to stop the user from running without a NIC everytime. We only want it to stop when the user tries to use a specific number of ports without them being available. If the user runs go.sh with 2 ports, both NICs should be bound for example.

rohit-mp commented 4 years ago

This is just a dpdk issue. Inside /dpdk/lib I tried grep -nr -I . -e "Invalid port id %d" (using -I to ignore binary files, because they're not helpful)

./librte_kni/rte_kni.c:447:             RTE_LOG(ERR, KNI, "Invalid port id %d\n", port_id);
./librte_kni/rte_kni.c:467:             RTE_LOG(ERR, KNI, "Invalid port id %d\n", port_id);

You seem to have missed an '=' in the grep statement. The actual output we get is:

Invalid port_id=0

The issue seems to be the call to rte_eth_macaddr_get while creating pkts (nf_setup) in scaling and speed_tester NFs.

https://github.com/DPDK/dpdk/blob/0da7f445df445630c794897347ee360d6fe6348b/lib/librte_ethdev/rte_ethdev.c#L2567-L2574

https://github.com/DPDK/dpdk/blob/acec04c4b2f5c75d244319e1d0ca17ea7d4da72d/lib/librte_ethdev/rte_ethdev.h#L1395-L1400

I think removing that call in the NFs and just faking a MAC address would be the best thing to do here. Would that be fine?

kevindweb commented 4 years ago

The tough thing is that we only want to do your suggestion if there is no nic bound. So we really want to do something like "try" to rte_eth_macaddr_get and if it fails, set a fake MAC address. This is because if it would have failed, it means there is no NIC bound. If there is a NIC bound, we might want to use -m <dest_mac_address> to send out, and therefore cannot use a fake MAC as the source.

We could do something in the example go.sh to stop speed_tester from using a destination MAC if there is no NIC bound? Not sure the best way to go about that though.

rohit-mp commented 4 years ago

Ah sorry didn't notice the -m option. It is actually possible to 'try' to get the mac address. Based on how the dpdk lib throws the error, we could check if a NIC is bound or not by:

if (rte_eth_dev_is_valid_port(0)) {
        rte_eth_macaddr_get(0, &ehdr->s_addr);
}

I tested this and I no longer get the Invalid port error output. We'd also need an else condition here to fake the MAC address if a NIC is not bound. I hope I'm not missing anything.

kevindweb commented 4 years ago

Yup, sounds like a great idea. The others may correct me if I'm wrong, but if we look at the rte_eth_macaddr_get uses in the repo, multiple examples use it, not just speed_tester/scaling. We might want to make a macro called onvm_get_macaddr for example that checks beforehand and if it fails, assigns the fake address? Instead of just changing it in speed_tester, we change all the occurrences in the repo.

The else statement should assign a dummy address, which can come from this article. I think any MAC with the prefix 02 is unassigned in the vendor prefix list.

rohit-mp commented 4 years ago

So, onvm_macaddr_get would check if a NIC is bound and if yes, return the MAC address, else return a fake safe MAC address? What would be a good place to add this?

kevindweb commented 4 years ago

Maybe onvm_nflib/onvm_common? We want it to be really quick because it's called in pkt_handler in load_balancer.c, which is why I suggested the macro.