sdnfv / openNetVM

A high performance container-based NFV platform from GW and UCR.
http://sdnfv.github.io/onvm/
Other
263 stars 135 forks source link

Memory #216

Closed bdevierno1 closed 4 years ago

bdevierno1 commented 4 years ago

Related to issues #200 and #201
Removed a few memory leaks. Replaced some libc function calls with the corresponding recommended dpdk function calls. Created new function to free allocated memory upon manager termination. Added if statements to error check after every allocation.

Summary:

Run manager as per usual. To test with valgrind, this is highlighted in issue #200 Note that it may take up to 20 mins to finish. Possibly much longer than that if you compile with line numbers.

Here are DPDK's recommendations DPDK highlights that replacing the libc functions will provide a performance benefit. In the tests that I ran, I did not see any significant changes in the packet speed. This is probably due to the function calls only being made once in the NF setup and not continuously being called in the NFs packet handler.

There are some libc allocations remaining. Noticeably in onvm config. These functions are called to parse the json config upon booting with a json. They are called before the initialization of dpdk in onvm_nflib_init() therefore using rte_malloc will not be possible nor will it have an impact on performance.

Usage:

This PR includes
Resolves issues
Breaking API changes
Internal API changes
Usability improvements
Bug fixes đź‘Ť
New functionality
New NF/onvm_mgr args
Changes to starting NFs
Dependency updates
Web stats updates

Merging notes:

TODO before merging :

Test Plan:

After: Memory definitely lost now zero.

Screen Shot 2020-05-31 at 10 50 40 AM

Before:

Screen Shot 2020-05-31 at 11 08 59 AM

Note that I dont believe valgrind is optimized very well for dpdk. Many of the existing 'errors' picked up by valgrind are caused during the Initialize the Environment Abstraction Layer (EAL) by DPDK function calls such as rte_eal_init(). I believe these can be ignored. Note that DPDK recommends using their cleanup function. However doing so caused errors in compile time: "is deprecated: Symbol is not yet part of stable ABI [-Werror=deprecated-declarations]"

Review:

@kevindweb

onvm commented 4 years ago

In response to PR creation

CI Message

Your results will arrive shortly

bdevierno1 commented 4 years ago

@onvm Test please.

onvm commented 4 years ago

@onvm Test please.

CI Message

Your results will arrive shortly

bdevierno1 commented 4 years ago

@onvm Test again please.

onvm commented 4 years ago

@onvm Test again please.

CI Message

Your results will arrive shortly

bdevierno1 commented 4 years ago

@onvm Test whether rte_calloc caused rate drop.

onvm commented 4 years ago

@onvm Test whether rte_calloc caused rate drop.

CI Message

Your results will arrive shortly

bdevierno1 commented 4 years ago

@onvm Test with first solution without rte_calloc please.

onvm commented 4 years ago

@onvm Test with first solution without rte_calloc please.

CI Message

Your results will arrive shortly

kevindweb commented 4 years ago

@bdevierno1 This is awesome! What did CI not like about your first commit? I haven't seen it that slow in a while. Also sorry I didn't see this before, but I don't think we should be changing lib/cJSON.c as we don't write anything in there. Every few months/to a year we pull from a different repository for that code and probably don't want to change memcpy->rte_memcpy as a result.

bdevierno1 commented 4 years ago

@bdevierno1 This is awesome! What did CI not like about your first commit? I haven't seen it that slow in a while. Also sorry I didn't see this before, but I don't think we should be changing lib/cJSON.c as we don't write anything in there. Every few months/to a year we pull from a different repository for that code and probably don't want to change memcpy->rte_memcpy as a result.

Cool that's fine. It took a few tries to figure out but it really does not like rte_calloc in main.c I'm going to test if there is any impact on performance when I call the function in other areas.

bdevierno1 commented 4 years ago

@onvm Test whether removing rte_calloc on libc as well impacts performance.

onvm commented 4 years ago

@onvm Test whether removing rte_calloc on libc as well impacts performance.

CI Message

Your results will arrive shortly

bdevierno1 commented 4 years ago

During the tests above I reverted all my changes to mgr/main.c and performed the speed test. Performance reverted back to 100%. Subsequently isolated problem further by only modifying lines which allocated memory and replaced them with rte_calloc. The drop in performance returned. I am not sure why the use of rte_calloc had such a drastic impact on performance when DPDK states it will have the opposite effect. As of now I have removed this from this PR but freeing of memory is still maintained.

bdevierno1 commented 4 years ago

@onvm Test

onvm commented 4 years ago

@onvm Test

CI Message

Your results will arrive shortly

kevindweb commented 4 years ago

@bdevierno1 Why did this one work? didn't this one fail CI previous rte_calloc? Great job though!

bdevierno1 commented 4 years ago

@kevindweb I tried memory padding first which on the test that I ran still resulted in the perf drop. DPDK has some examples of using the align parameter which I tried and worked. I think Dennis has some ideas of how we can improve locality further and allow us to remove the last remaining callocs in nflib.c and maybe see some performance gains.

dennisafa commented 4 years ago

A few thoughts that I have after looking into this more:

bdevierno1 commented 4 years ago

@onvm Test please.

onvm commented 4 years ago

@onvm Test please.

CI Message

Your results will arrive shortly

bdevierno1 commented 4 years ago

@onvm Test

onvm commented 4 years ago

@onvm Test

CI Message

Your results will arrive shortly

bdevierno1 commented 4 years ago

I was not getting a performance drop on my node but unfortunately this was not replicated on the benchmark node. The following are steps I have tried to update onvm_nflib.c to only use dpdk function calls.

If anyone has any ideas how we can improve this further please ping me on slack.

bdevierno1 commented 4 years ago

@onvm Test

onvm commented 4 years ago

@onvm Test

CI Message

Your results will arrive shortly

onvm commented 4 years ago

@onvm Test

CI Message

Error: ERROR: Failed to post results to GitHub

bdevierno1 commented 4 years ago

@onvm Try again please.

onvm commented 4 years ago

@onvm Try again please.

CI Message

Your results will arrive shortly

onvm commented 4 years ago

@onvm Try again please.

CI Message

Error: ERROR: Failed to post results to GitHub

bdevierno1 commented 4 years ago

@onvm Please post results.

onvm commented 4 years ago

@onvm Please post results.

CI Message

Your results will arrive shortly

onvm commented 4 years ago

@onvm Please post results.

CI Message

Error: ERROR: Failed to post results to GitHub

bdevierno1 commented 4 years ago

@onvm

onvm commented 4 years ago

@onvm

CI Message

Your results will arrive shortly

bdevierno1 commented 4 years ago

Okay looks like using rte_zmalloc made a difference. Ready for review @catherinemeadows @dennisafa

kevindweb commented 4 years ago

Can someone explain the difference between zmalloc and calloc? The dpdk doc entries looked quite similar so idk why zmalloc wouldn’t cause the problems calloc did

dennisafa commented 4 years ago

Can someone explain the difference between zmalloc and calloc? The dpdk doc entries looked quite similar so idk why zmalloc wouldn’t cause the problems calloc did

They're equivalent, but rte_calloc allows you to specify a num_items to allocate variable while rte_zmalloc does not. I don't think they should cause performance fluctuations either, but we've been having those with speed tester lately

twood02 commented 4 years ago

@catherinemeadows will mark this for approval since she reviewed it previously