sdnfv / openNetVM

A high performance container-based NFV platform from GW and UCR.
http://sdnfv.github.io/onvm/
Other
263 stars 135 forks source link

Memory Leaks #200

Closed kevindweb closed 3 years ago

kevindweb commented 4 years ago

Bug Report

This is long overdue, but we need to publish some fixes to memory leaks in the repo. I will outline 2 tools and how to use them for finding these leaks. We need someone to take the task of fixing these. I will be a secondary contributor, helping whoever steps up if they need it along the way. This is a great way of diving into the C codebase with a goal, albeit challenging. Please comment on this if you want to take on the task.

Current Behavior The two tools are cppcheck and valgrind. Cppcheck is the easier tool but is less informative. We can run this static analysis tool on c files. Run as such

cppcheck *.c

If we for example run this on onvm/onvm_nflib/onvm_config_common.c we get

$ cppcheck onvm_config_common.c
Checking onvm_config_common.c ...
[onvm_config_common.c:109]: (error) Memory leak: local_list
[onvm_config_common.c:290]: (error) Memory leak: new_argv
[onvm_config_common.c:413]: (error) Memory leak: service_id_string
[onvm_config_common.c:431]: (error) Memory leak: instance_id_string
[onvm_config_common.c:473]: (error) Memory leak: arg_size
[onvm_config_common.c:481]: (error) Memory leak: core_string
[onvm_config_common.c:481]: (error) Memory leak: arg_size
[onvm_config_common.c:488]: (error) Memory leak: core_string
[onvm_config_common.c:488]: (error) Memory leak: arg_size
[onvm_config_common.c:504]: (error) Memory leak: mem_channels_string

To help with testing, here is a script I made to run cppcheck on all the relevant C files in the repo.

#!/bin/bash

echo "Testing OpenNetVM *.c"

cd ~/openNetVM || exit 1

# grab all files
files=$(find . -type d \( -path ./dpdk -o -path ./tools/Pktgen -o -path ./onvm/lib \) -prune -o -type f \( -iname \*.c -o -iname \*.h \) -print)

for f in $files
do
        cppcheck $f
done

Since cppcheck only does static analysis, it cannot check how much memory is lost through this, and certainly cannot see more than malloc. DPDK Valgrind is an adaptation on Valgrind that goes through the DPDK library and can also check Huge pages, which Valgrind cannot do. Valgrind is more complex in that we need to run commands a specific way. First, compile everything with USER_FLAGS=-g -O0. This will make sure Valgrind can see the line numbers of the files. We cannot run inside ./go.sh, so we need to write echo before the command on line 86 of go.sh. This will allow us to see the entire command of the manager instead of running it. It should be something like this:

sudo /home/<Your Username>/openNetVM/onvm/onvm_mgr/x86_64-native-linuxapp-gcc/onvm_mgr -l 0,1,2,3 -n 4 --proc-type=primary --base-virtaddr=0x7f000000000 -- -p 0 -n 0xF0 -s stdout -v 1

Then we can run Valgrind-DPDK like this:

sudo valgrind --leak-check=full --show-leak-kinds=all --verbose --log-file=valgrind-out.txt /home/<Your Username>/openNetVM/onvm/onvm_mgr/x86_64-native-linuxapp-gcc/onvm_mgr -l 0,1,2,3 -n 4 --proc-type=primary --base-virtaddr=0x7f000000000 -- -p 3 -n 0xF0 -s stdout -v 1 -t 12

The previous command runs onvm with -t 12, which tells ONVM to stop running after 12 seconds. Valgrind can only check leaks if the program exits on its own, so this is a good way of constraining it. Keep in mind that there are flaws to this, as Valgrind cannot handle multi-processor applications. Under the hood, it moves all threads to a single core to watch the execution path easier. This can make it up to 100 times slower (from what I've read) than normal.

Expected behavior/code We expect to have no memory leaks, as our cleanup processes should be able to free all necessary data structures and threads allocated.

Steps to reproduce I've outlined how to use Valgrind and cppcheck via cli.

Environment

Possible Solution There are many memory leaks. Getting any of them fixed would be great. To start, we have a function called onvm_ft_free in onvm/onvm_nflib/onvm_flow_table.c. Valgrind shows us that onvm_ft_create is called, which creates a hash table and other variables. The cleanup method is never called anywhere in the code. This was probably an oversight but needs to be called somewhere so the data can be freed on the exit of the manager. As for calls to DPDK memory allocation, we should have a rte_free for every rte_malloc. This is certainly an oversimplification, but a start.

Additional context/Screenshots Please reach reference me on this issue or reach out if there are issues getting this setup working. There is a reason we have leaks, and that's because they're difficult to find and fix! This needs to be done, and there are many ways to get started, even if we don't fix all the leaks in one go.

Use this command to find specific words in all C files recursively. grep -nrw . --include \*.c -e "malloc"

bdevierno1 commented 4 years ago

Could you assign this to me @kevindweb ?