sdnfv / openNetVM

A high performance container-based NFV platform from GW and UCR.
http://sdnfv.github.io/onvm/

Pktgen for CI #148

Status: Closed (kevindweb closed this 5 years ago)

kevindweb commented 5 years ago

Finally have Pktgen running for our Continuous Integration in the nimbus cluster!

Summary:

Long awaited, and recently tested in another PR: the default run mode is now Pktgen for our base performance testing. Here are the basic steps right now: CI sends our worker node to reboot, then fires the worker script. Depending on the worker-config, this script runs a certain mode (right now just pktgen or speed_test). In pktgen mode (MODE="0"), we run a script that calls the Pktgen openNetVM-Scripts/run-pktgen.sh script on the other node through paramiko's SSHClient. That uses our new Lua script, which sends packets for 30 seconds. We retrieve the data from the basic monitor and send it back to CI for analysis. The new worker script also allows multiple run modes (MODE="0 1", for example). This way, we can potentially run pktgen and then speed_tester, and get all the results back to back after the reboot.
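A minimal Python sketch of the multi-mode dispatch described above, for illustration only. The real worker is a shell script (worker.sh), so this is not its implementation. It assumes MODE is a space-separated string of mode IDs and that "1" maps to speed_tester (inferred from the MODE="0 1" example); all function names here are hypothetical placeholders.

```python
"""Hypothetical sketch of MODE-based dispatch for a CI worker.

Assumes MODE is a space-separated list of mode IDs, e.g. "0" or "0 1".
"0" -> Pktgen run, "1" -> speed_tester run (the "1" mapping is assumed).
"""
from typing import Callable, Dict, List


def run_pktgen() -> str:
    # Placeholder: the described flow SSHes to the Pktgen node (paramiko's
    # SSHClient), starts openNetVM-Scripts/run-pktgen.sh, and then reads
    # the basic monitor stats. Here we only return a label.
    return "pktgen"


def run_speed_test() -> str:
    # Placeholder for launching the speed_tester NF.
    return "speed_test"


# Hypothetical mode table: mode ID -> function that runs that mode.
MODES: Dict[str, Callable[[], str]] = {
    "0": run_pktgen,
    "1": run_speed_test,
}


def parse_modes(mode_str: str) -> List[str]:
    """Split MODE into IDs, rejecting any ID not in the mode table."""
    modes = mode_str.split()
    unknown = [m for m in modes if m not in MODES]
    if unknown:
        raise ValueError(f"unknown mode(s): {' '.join(unknown)}")
    return modes


def run_all(mode_str: str) -> List[str]:
    """Run each requested mode back to back, in order, collecting results."""
    return [MODES[m]() for m in parse_modes(mode_str)]


if __name__ == "__main__":
    print(run_all("0 1"))  # ['pktgen', 'speed_test']
```

Validating every ID before running anything means a typo in worker-config fails fast, instead of after a Pktgen run has already consumed the reboot.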

Usage:

This PR includes
Resolves issues
Breaking API changes
Internal API changes
Usability improvements
Bug fixes
New functionality 👍
New NF/onvm_mgr args
Changes to starting NFs
Dependency updates
Web stats updates

Merging notes:

TODO before merging:

Test Plan:

We have to figure out how we want to select different modes, perhaps by parsing GitHub comments. That way, we can stress test that this works. We should also test that this works on nn30 with nn33 (Pktgen) so we know it scales to new nodes we want to use.

Review:

@koolzz @dennisafa

kevindweb commented 5 years ago

@onvm you there now?

onvm commented 5 years ago

@onvm you there now?

CI Message

Your results will arrive shortly

onvm commented 5 years ago

@onvm you there now?

CI Message

Error: ERROR: Script failed on nimbnode17

kevindweb commented 5 years ago

@onvm you know that was develop, right?

onvm commented 5 years ago

@onvm you know that was develop, right?

CI Message

Your results will arrive shortly

kevindweb commented 5 years ago

To be clear, the first "failure" was not on the Pktgen branch of nn44 CI. Nimbnode17 hung while setting up the environment. CI was running on the develop branch both times it posted here. See the latest post on #147 for real Pktgen results.

kevindweb commented 5 years ago

@onvm check pktgen please

onvm commented 5 years ago

@onvm check pktgen please

CI Message

Your results will arrive shortly

kevindweb commented 5 years ago

CI's got it! We have to figure out a good median for Pktgen (maybe after merging the flow table macros...)

koolzz commented 5 years ago

@kevindweb I've merged #143, so update this PR to bring it up to date. Also add the CI changes (disabling flow table lookup here) and update our speed tester benchmarks, just so we're floating at roughly 100%, as it's currently always at 109%.
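To make "floating at roughly 100%" concrete, here is a hedged Python sketch that reports a run as a percentage of a stored benchmark and re-baselines the benchmark to the median of recent runs (the median idea comes from the earlier comment). The function names and inputs are hypothetical, in arbitrary units, and are not real CI results or the CI's actual code.

```python
from statistics import median
from typing import Sequence


def percent_of_benchmark(result: float, benchmark: float) -> float:
    """Return a run's throughput as a percentage of the stored benchmark."""
    if benchmark <= 0:
        raise ValueError("benchmark must be positive")
    # Multiply first so exact inputs like 109/100 give exactly 109.0.
    return result * 100.0 / benchmark


def rebaseline(runs: Sequence[float]) -> float:
    """Pick a new benchmark as the median of recent runs.

    The median is less sensitive than the mean to one unusually fast or
    slow run on a noisy node.
    """
    if not runs:
        raise ValueError("need at least one run")
    return float(median(runs))


if __name__ == "__main__":
    # Hypothetical numbers in arbitrary units, not real CI measurements.
    old_benchmark = 100.0
    runs = [109.0, 108.0, 110.0]
    print(percent_of_benchmark(109.0, old_benchmark))  # 109.0
    new_benchmark = rebaseline(runs)                   # 109.0
    print(percent_of_benchmark(109.0, new_benchmark))  # 100.0
```

If a stored benchmark consistently reads about 109%, re-baselining to the median of recent runs brings a typical run back to about 100%.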

kevindweb commented 5 years ago

@koolzz thanks for the comments! I'll work on these updates this weekend. As for the file restructuring, I mentioned that I need to change the helper-functions because, for example, manager.sh only really needs the run_linter, fetch_files, and print_header functions. Worker.sh, however, doesn't need run_linter or fetch_files, but needs print_header, build_onvm, and install_env. The only commonality is printing. For this reason, I'll make the file more concise and possibly rename it. It just made no sense to have a global helper file with no common functions. So I made a worker folder, which helps with scp (it's cleaner with all these new files) and also organizes a cluttered ci folder.
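The argument above reduces to a set intersection over each script's helper dependencies. This small Python sketch uses only the function lists named in the comment. The mapping is written out by hand for illustration and is not parsed from manager.sh or worker.sh.

```python
from typing import Dict, Set

# Helper functions each CI script needs, as listed in the comment above.
# Hand-written for illustration; not read from the actual scripts.
NEEDS: Dict[str, Set[str]] = {
    "manager.sh": {"run_linter", "fetch_files", "print_header"},
    "worker.sh": {"print_header", "build_onvm", "install_env"},
}


def shared_helpers(needs: Dict[str, Set[str]]) -> Set[str]:
    """Functions every script uses: candidates for a common helper file."""
    sets = list(needs.values())
    return set.intersection(*sets) if sets else set()


def script_only(needs: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Functions each script needs that are not shared by all scripts."""
    common = shared_helpers(needs)
    return {script: funcs - common for script, funcs in needs.items()}


if __name__ == "__main__":
    print(sorted(shared_helpers(NEEDS)))  # ['print_header']
    for script, funcs in script_only(NEEDS).items():
        print(script, sorted(funcs))
```

A single shared function (print_header) is thin justification for a global helper file, which is the case for splitting the helpers per script.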

kevindweb commented 5 years ago

*Edit: check_exit_code is also shared, but the worker version should be different, because a worker node does not have access to post-msg.py, which occasionally causes errors.

kevindweb commented 5 years ago

@onvm test the changes, please

onvm commented 5 years ago

@onvm test the changes, please

CI Message

Your results will arrive shortly

kevindweb commented 5 years ago

I have to get a handle on how sporadic these nimbnode results are. @koolzz, any suggestions? I just received when tested 40mil+; how could there be such a big difference?

koolzz commented 5 years ago

Are you using a different node to get these speed tester results? Our speed tester results were rather stable before.

kevindweb commented 5 years ago

These results are both from nimbnode17. I figured it made more sense to run pktgen and speed_test on the same node during a single CI run; is that not a good plan? I will look into why some runs are faster than others, though. It's just weird, because some runs are so much better than the normal ~41 mil I've seen.

onvm commented 5 years ago

Testing

CI Message

Your results will arrive shortly

onvm commented 5 years ago

Testing

CI Message

Your results will arrive shortly

onvm commented 5 years ago

Testing

CI Message

Your results will arrive shortly

onvm commented 5 years ago

Testing

CI Message

Your results will arrive shortly

kevindweb commented 5 years ago

@onvm let's see develop completely merged

onvm commented 5 years ago

@onvm let's see develop completely merged

CI Message

Your results will arrive shortly

kevindweb commented 5 years ago

@onvm it's difficult to run Pktgen when the link is down, right?

onvm commented 5 years ago

@onvm it's difficult to run Pktgen when the link is down, right?

CI Message

Your results will arrive shortly

kevindweb commented 5 years ago

@onvm can we try nn30?

onvm commented 5 years ago

@onvm can we try nn30?

CI Message

Your results will arrive shortly

onvm commented 5 years ago

@onvm can we try nn30?

CI Message

Error: ERROR: Failed to copy ONVM files to nimbnode30

kevindweb commented 5 years ago

@onvm why are permissions a problem?

onvm commented 5 years ago

@onvm why are permissions a problem?

CI Message

Your results will arrive shortly

onvm commented 5 years ago

@onvm why are permissions a problem?

CI Message

Error: ERROR: Failed to copy ONVM files to nimbnode30

onvm commented 5 years ago

Testing

CI Message

Your results will arrive shortly

onvm commented 5 years ago

Testing

CI Message

Error: ERROR: Failed to copy ONVM files to nimbnode30

kevindweb commented 5 years ago

@onvm with the new updates?

onvm commented 5 years ago

@onvm with the new updates?

CI Message

Your results will arrive shortly

onvm commented 5 years ago

@onvm with the new updates?

CI Message

Error: ERROR: Failed to fetch results from nimbnode17

kevindweb commented 5 years ago

@onvm that tiny error? thanks

onvm commented 5 years ago

@onvm that tiny error? thanks

CI Message

Your results will arrive shortly

onvm commented 5 years ago

@onvm that tiny error? thanks

CI Message

Error: ERROR: Failed to fetch results from nimbnode17

kevindweb commented 5 years ago

@onvm this shouldn't hang

onvm commented 5 years ago

@onvm this shouldn't hang

CI Message

Your results will arrive shortly

koolzz commented 5 years ago

@kevindweb Were the small things we discussed in the meeting updated? Ping me when it's ready to merge (I think you're already working on mTCP, so this is ready, right?)

onvm commented 5 years ago

Testing

CI Message

Your results will arrive shortly

koolzz commented 5 years ago

@onvm Updated to latest version

kevindweb commented 5 years ago

This might or might not work @onvm

onvm commented 5 years ago

This might or might not work @onvm

CI Message

Your results will arrive shortly

kevindweb commented 5 years ago

@koolzz I added information about benchmarks to the README and changed the name of the helper script symlink.

koolzz commented 5 years ago

@onvm Hello

onvm commented 5 years ago

@onvm Hello

CI Message

Your results will arrive shortly

kevindweb commented 5 years ago

@onvm did the changes mess things up?