sdnfv / openNetVM

A high performance container-based NFV platform from GW and UCR.
http://sdnfv.github.io/onvm/

Error while setting up envt. #186

Closed: bdevierno1 closed this issue 4 years ago

bdevierno1 commented 4 years ago

Bug Report

Current Behavior

After following the steps in the README (https://github.com/sdnfv/openNetVM/blob/master/docs/Install.md#3-set-up-environment), I'm getting an error: cannot mmap memory. I'm then prompted to specify a base virtual address.

Expected behavior/code

The NF should display how many packets it is sending to itself.

Steps to reproduce

cd examples/speed_tester
./go.sh 1 -d 1 -c 16000

Additional context/Screenshots

[Screenshot: SpeedTestError]

dennisafa commented 4 years ago

If you run the manager with -a 0x7f000000000, it should work. I'm wondering if it might be worth making this base virtual address the default, since every server I've run ONVM on has required it, and letting the user know they can instead let DPDK decide or enter an address manually. The change would have to be made here: https://github.com/sdnfv/openNetVM/blob/master/onvm/go.sh#L49
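A minimal sketch of the proposed default, assuming the manager's -a value is ultimately forwarded to DPDK's EAL `--base-virtaddr` option. The real change would live in the shell script linked above; the helper `build_base_virtaddr_arg` and the constant `ONVM_DEFAULT_BASE_VIRTADDR` are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical default, taken from the address suggested in this thread. */
#define ONVM_DEFAULT_BASE_VIRTADDR "0x7f000000000"

/*
 * Write the EAL option "--base-virtaddr=<addr>" into buf.
 *   user_addr == NULL -> no -a given: fall back to the default address
 *   user_addr == ""   -> user chose to let DPDK decide: write nothing
 * Returns 1 if an option was written, 0 if none is needed,
 * and -1 if buf is too small.
 */
int build_base_virtaddr_arg(const char *user_addr, char *buf, size_t len)
{
    if (len == 0)
        return -1;

    const char *addr = user_addr ? user_addr : ONVM_DEFAULT_BASE_VIRTADDR;
    if (addr[0] == '\0') {
        buf[0] = '\0';          /* leave address selection to DPDK */
        return 0;
    }

    int n = snprintf(buf, len, "--base-virtaddr=%s", addr);
    return (n < 0 || (size_t)n >= len) ? -1 : 1;
}
```

The empty-string case keeps the "let DPDK decide" choice available while still making the known-working address the default.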

kevindweb commented 4 years ago

Yeah, this should be the default. @bdevierno1, that would be a good first task for you to work on. I've had to add it every time I've run the manager for almost a year now.

twood02 commented 4 years ago

Is the -a argument necessary for running all NFs now? This problem used to appear only for more complicated NFs like the Snort intrusion detection system.

Have the rest of you been manually passing in this flag?

dennisafa commented 4 years ago

It's only necessary when the NF process maps its memory into the huge-page region after the rte_eal_init call. This seems to be OS-specific; I'm not sure how the NF process decides to map its memory such that it conflicts with the huge pages. From what I've seen, it may only happen some of the time, since the NF could map its memory into a non-huge-page region instead.
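One way to check whether a candidate base address is already occupied inside an NF-like process is to scan Linux's `/proc/self/maps`, where each line starts with a hex `start-end` range. This is a Linux-only sketch; `range_is_mapped` is a hypothetical helper, not part of ONVM or DPDK:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Linux-only sketch: report whether [start, start + len) overlaps any region
 * already mapped into this process, according to /proc/self/maps.
 * Each line of that file starts with "lo-hi" in hex; hi is exclusive.
 * Returns 1 on overlap, 0 if the range is free, -1 if the file can't be read.
 */
int range_is_mapped(uintptr_t start, size_t len)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    uintptr_t end = start + len;
    char line[512];
    int at_line_start = 1;   /* long lines may arrive in several chunks */
    int overlap = 0;

    while (!overlap && fgets(line, sizeof line, f) != NULL) {
        int is_new_line = at_line_start;
        at_line_start = strchr(line, '\n') != NULL;
        if (!is_new_line)
            continue;        /* tail of a long line: no address range here */

        uintmax_t lo, hi;
        if (sscanf(line, "%jx-%jx", &lo, &hi) == 2 &&
            start < hi && lo < end)
            overlap = 1;
    }
    fclose(f);
    return overlap;
}
```

Because of address space layout randomization (ASLR), a range that is free in one run is not guaranteed to be free in the next, which is consistent with the conflict only showing up some of the time.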

twood02 commented 4 years ago

@bdevierno1, can you provide more info on your setup? Were you using the ONVM CloudLab image with everything preinstalled, or a clean image you set up yourself? Does this always happen to you, or just sometimes? (It's also possible to get this kind of unhelpful or misleading error message because of other problems, like the manager crashing or not starting correctly.)

Forcing a default -a argument would work, but it isn't a great solution because there is no way for us to know in advance which address should be used. Here's a bit more explanation of what is happening:

Normally, when you run a program, it has a virtual address space like the one on the left side of this diagram:

[Diagram: process virtual address space (mem)]

The top of your process is the text area, which stores the code for your program. Below that are the heap (marked "data" in the diagram) and the stack. Note that this means that if your program has a lot of code, your data/heap segment may need to start at a higher address (i.e., lower down in the diagram).
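To see this layout on your own machine, the sketch below records one address from each region (code, data, heap, stack). Exact values depend on the OS, ASLR, and whether the binary is position-independent; `sample_layout` and `print_layout` are hypothetical helpers:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static int data_marker = 1;          /* initialized global: data segment */

static void text_marker(void) {}     /* its address lies in the text (code) segment */

struct layout {
    uintptr_t text, data, heap, stack;
};

/* Sample one address from each region of this process's address space. */
struct layout sample_layout(void)
{
    int stack_marker = 0;              /* local variable: stack */
    void *heap_marker = malloc(16);    /* small allocation: usually the heap */
    struct layout l = {
        .text  = (uintptr_t)&text_marker,
        .data  = (uintptr_t)&data_marker,
        .heap  = (uintptr_t)heap_marker,
        .stack = (uintptr_t)&stack_marker,
    };
    free(heap_marker);
    return l;
}

/* Print the sampled addresses, lowest region first on typical systems. */
void print_layout(void)
{
    struct layout l = sample_layout();
    printf("text  %#" PRIxPTR "\n", l.text);
    printf("data  %#" PRIxPTR "\n", l.data);
    printf("heap  %#" PRIxPTR "\n", l.heap);
    printf("stack %#" PRIxPTR "\n", l.stack);
}
```

Reading the printed addresses from smallest to largest corresponds to reading the diagram from top to bottom.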

I'm not actually 100% sure where the shared memory regions are mapped in this picture (they aren't part of the stack or heap). I think they get loaded into the address space between the code and the heap (data). As a result, if you have a network function with a lot of code (or a lot of dependent libraries that it loads into memory), then the start address where you can map things in needs to be higher.

An NF always tries to map the shared memory at the same starting address the NF Manager used (keep in mind these are virtual addresses, not physical, so every process can use any virtual address it wants). We want the NF Manager and the NFs to map things at the same virtual address because that way both can treat shared data as being at the same location, so a pointer stored by one process is valid in the other. This error occurs when the NF Manager maps the shared memory at a low address, and then the NF starts but already has some memory allocated at that spot (like its text segment), so it crashes when trying to map into the same place.
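The failure can be reproduced in miniature with plain `mmap`. This is not DPDK's code: it is a sketch that assumes only `mmap`/`munmap` with the widely available `MAP_ANONYMOUS` flag, and `map_at_same_address` is a hypothetical helper standing in for an NF that must reuse the manager's virtual addresses:

```c
#define _DEFAULT_SOURCE           /* expose MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/*
 * Try to map len bytes at exactly `want`. Without MAP_FIXED the address is
 * only a hint: if something already occupies it, the kernel places the
 * mapping elsewhere. Since shared pointers are only valid at the manager's
 * address, any other placement is treated as a failure.
 * Returns the mapping on success, NULL otherwise.
 */
void *map_at_same_address(void *want, size_t len)
{
    void *got = mmap(want, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (got == MAP_FAILED)
        return NULL;
    if (got != want) {
        fprintf(stderr, "wanted %p but got %p: address already in use\n",
                want, got);
        munmap(got, len);
        return NULL;
    }
    return got;
}
```

In a test, first occupying the address with another mapping plays the role of the NF's text segment or libraries sitting where the manager put its memory, so the second attempt is pushed elsewhere and rejected; once the range is unmapped, Linux honors the hint. `MAP_FIXED` is deliberately avoided because it would silently replace whatever was already mapped there.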

In the past, we only experienced this problem with our more complex NFs (like Snort, which is far more complex than speed_tester). If this is happening consistently, then we need a fix, and either way we might want to improve the error message if we have control over it.
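If ONVM controls the message, a more actionable version could name both addresses and point at the -a workaround as well as the manager-not-running case mentioned earlier in the thread. A minimal sketch; `format_map_error` and its wording are hypothetical, and the suggested address is simply the one used in this thread, not a value known to be safe everywhere:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build an actionable message for the case where an NF
 * cannot map the manager's shared memory at the same virtual address.
 * Returns the snprintf result (the message length if it fit).
 */
int format_map_error(char *buf, size_t len, const void *wanted, const void *got)
{
    return snprintf(buf, len,
                    "cannot map shared memory at %p (kernel chose %p): "
                    "that address is already in use in this NF. "
                    "Check that the manager is running, then restart it with "
                    "a different base address, e.g. -a 0x7f000000000.",
                    (void *)wanted, (void *)got);
}
```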

pcodes commented 4 years ago

Suggestions from the meeting:

kevindweb commented 4 years ago

This should have been closed when Ben's #197 was merged.