openbaton / openstack-plugin

An implementation of an Open Baton plugin for Vim-Drivers interface
Apache License 2.0

Use Existing Networks? #8

Closed ftcjeff closed 7 years ago

ftcjeff commented 7 years ago

I'm trying the iperf demo on an OpenStack VIM. When I specify an existing network in the NSD the orchestration doesn't go anywhere and it's not clear from the logs what went wrong. When I specify a "new" network in the NSD it creates the network, but no instances are created. When I look at the graph, HUNDREDS of clients and servers are rendered. Again, nothing in the logs to tell me what's going on. In our environment, the networking relies on DPDK and vlan tagging, so we have to configure that carefully. I suspect what is happening is the instances fail immediately with the "No valid hosts found" message from Nova because the network created by OpenBaton is vxlan and doesn't support DPDK, vlan tagging and a valid physical network... I need the NSD to be able to use that specific network, pull floating IPs from that pool, and use a specific flavor. Is there a way to do this? The flavor selection seems to work okay, so what I'm really curious about is the networking side.

mpauls commented 7 years ago

Please provide us with the log files of the NFVO (usually /var/log/openbaton/openbaton.log) and the openstack-plugin log (usually in the NFVO folder /opt/openbaton/nfvo/plugin-logs). If not yet done, you should set log level of the NFVO to DEBUG.

gc4rella commented 7 years ago

Hi @ftcjeff, using an existing network should work, and we have not seen this issue before. If you could provide the logs of the plugin and the orchestrator, that would be very helpful. As for the second case: yes, network management has been very basic from the very beginning, partly because the initial version of the ETSI MANO specification did not provide a large set of attributes for passing down more requirements. Anyway, yesterday we had an intensive meeting on release 4 (planned for April 2017), and we decided that networking will be the first thing to improve in this upcoming release. In addition, as we need to support such requirements a bit earlier for the ETSI NFV Plugtest, we are analysing the possibility of a minor upgrade to Release 3 in order to support multiple network types (in addition to vxlan). It would be helpful if you could send us the request bodies of the neutron requests you use to create such networks (using neutron --debug), so that we can extract those parameters and see where to put them in the descriptors. A temporary "workaround" would be to modify the plugin for your specific needs, maybe giving it a different type so that you could still use it in parallel with the one we provide by default. Not the best solution, but it may work for the moment.
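
For context on the parameters being asked for: a VLAN network on Neutron is normally described with the provider extension attributes. The sketch below builds such a request body. The network name, `physnet1`, and segmentation ID 100 are placeholders, not values from this thread, and the real body for a given cloud should still be captured with neutron --debug.

```python
import json


def vlan_network_body(name, physical_network, segmentation_id):
    """Build a JSON body for POST /v2.0/networks using Neutron's provider attributes."""
    return {
        "network": {
            "name": name,
            "provider:network_type": "vlan",
            "provider:physical_network": physical_network,
            "provider:segmentation_id": segmentation_id,
        }
    }


# Placeholder values, for illustration only.
body = vlan_network_body("dpdk-net", "physnet1", 100)
print(json.dumps(body, indent=2))
```

These three `provider:` attributes are the ones a descriptor would need to carry for the plugin to create something other than a vxlan network.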

ftcjeff commented 7 years ago

Thanks @mpauls and @gc4rella. I will spin the VM back up and get those logs for you as well as the neutron info. I'll see what info I can get for you with neutron without having to re-build the network since others are using that cluster as well. We don't need a special build, but I might play around with the nightly packages as you add support for more network types.

ftcjeff commented 7 years ago

Hey good news! When I was looking at logs previously I was only looking in the openstack plugin log, not the openbaton log. In the openbaton log I saw this:

Infinite quota are not allowed. Please set nfvo.vim.drivers.allowInfiniteQuota to true or change the quota in your VIM installation

I updated the project on my VIM to remove the "-1" quotas and now it looks like I have an iperf client and server running. The instances are up and active anyway! So it looks like the problem was quota-related and not networking related.
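
A quick way to spot this condition before deploying is to look for quota entries set to -1, which OpenStack treats as unlimited. This is a minimal sketch, assuming the quotas have already been fetched into a plain dict. The key names are illustrative and this is not the NFVO's own check. The other option named in the error message is setting nfvo.vim.drivers.allowInfiniteQuota to true.

```python
def infinite_quotas(quotas):
    """Return the names of quota entries set to -1 (unlimited in OpenStack)."""
    return sorted(name for name, limit in quotas.items() if limit == -1)


# Illustrative values only; real quotas come from the VIM.
project_quotas = {"instances": -1, "cores": 20, "ram": -1, "floating_ips": 10}
print(infinite_quotas(project_quotas))  # ['instances', 'ram']
```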

Would be nice to see messages like that bubble up to the UI, but at least I'm past this problem.

Thanks!

ftcjeff commented 7 years ago

It finally failed with: ERROR:java.lang.RuntimeException: no ems for hostame: iperf-server-975 (and same for client). I'll try to track this down. I'll still get you the logs from my system. Just wanted to play around with it for a bit first and then will sanitize the logs.

gc4rella commented 7 years ago

You need to check the IP you set as the rabbitmq endpoint on the NFVO and make sure that IP is reachable from the VMs you are deploying. The EMS tries to reach that IP. I'm on mobile right now and can't give you more hints.
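
The reachability check described here can be run from inside one of the deployed VMs. This is a minimal sketch, assuming RabbitMQ listens on its default AMQP port 5672. The address in the usage comment is a documentation placeholder and should be replaced with the endpoint configured on the NFVO.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable addresses.
        return False


# Example, run from a deployed VM (192.0.2.10 is a placeholder address):
#   can_reach("192.0.2.10", 5672)
```

A successful TCP connect only shows that the port is open. It does not verify credentials or that the broker is healthy.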

ftcjeff commented 7 years ago

Here are my logs

openbaton.zip

ftcjeff commented 7 years ago

@gc4rella okay, I'll give that a shot.

ftcjeff commented 7 years ago

After setting the rabbit ip, it got a LOT further! However, I ultimately got this:

2016-12-01 22:39:12.544 INFO 11258 --- [SimpleAsyncTaskExecutor-1] org.openbaton.nfvo.vnfm_reg.VnfmManager : Executing Task errorTask for vnfr iperf-server. Cyclic=false
2016-12-01 22:39:12.547 ERROR 11258 --- [OpenbatonTask-5] o.o.n.v.tasks.abstracts.AbstractTask : ERROR from VNFM: java.lang.RuntimeException: no ems for hostame: iperf-server-748
2016-12-01 22:39:12.582 ERROR 11258 --- [OpenbatonTask-5] o.o.n.v.tasks.abstracts.AbstractTask : ERROR for VNFR: iperf-server
2016-12-01 22:39:24.061 INFO 11258 --- [SimpleAsyncTaskExecutor-2] org.openbaton.nfvo.vnfm_reg.VnfmManager : Executing Task errorTask for vnfr iperf-client. Cyclic=false
2016-12-01 22:39:24.065 ERROR 11258 --- [OpenbatonTask-6] o.o.n.v.tasks.abstracts.AbstractTask : ERROR from VNFM: java.lang.RuntimeException: no ems for hostame: iperf-client-552
2016-12-01 22:39:24.089 ERROR 11258 --- [OpenbatonTask-6] o.o.n.v.tasks.abstracts.AbstractTask : ERROR for VNFR: iperf-client

ftcjeff commented 7 years ago

A-ha. The DNS wasn't being set on my openstack instances. Once I got that set up through neutron, it looks like this issue is solved. I think if EMS isn't on the image, it tries to download one from the net using hostnames. This wasn't working for me because DNS wasn't set up. The instances have been running for about 10 minutes now and are still showing ACTIVE. Good news!

ftcjeff commented 7 years ago

Just to close on this, I logged into the client instance and ran iperf by hand and here's what I got.

ubuntu@iperf-client-279:~$ iperf -c 10.10.10.17

Client connecting to 10.10.10.17, TCP port 5001
TCP window size: 45.0 KByte (default)

[ 3] local 10.10.10.16 port 38113 connected with 10.10.10.17 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 7.34 GBytes 6.31 Gbits/sec

gc4rella commented 7 years ago

That's great! Yes, after booting, the VM tries to download the EMS from our apt repo. We are going to provide an alternative solution for release 4, but if you install the EMS in the image, you won't have such problems anymore. We have tested this setup with completely offline VMs and did not have any issues.

There should be a screen running also on client side showing that it managed to connect to the server side automatically after deployment.

Btw, I remember you had some problems with the plugin in the past. Would it be possible to share some feedback about what the problem was and how you solved it?

ftcjeff commented 7 years ago

Hi @gc4rella - thanks for the response. There were problems that I could solve and problems I couldn't. However, most of them took a LOT of work to figure out. It would be nice if better error messages could bubble up to the UI. Most times, the information was "somewhere", but what shows up on the screen is just an error code with a message that something didn't work. The other problem is that the "somewhere" wasn't always obvious. For example, when there was a problem with OpenStack quotas, the error message was in openbaton.log, not over in the /plugins logs.

This came up a few times. For example, when I tried to connect to an existing network (as described above in this thread), the problem was that the quotas were set to -1. But it wasn't obvious at all from the UI or openstack logs. So I spent 1/2 day trying to figure out what was wrong with my networking, using new networking, etc. Once I realized it was the quotas, I was up and running in about 30 seconds.
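
Until errors surface in the UI, one workaround is to scan both log locations in one pass. The sketch below uses the paths mentioned earlier in this thread. The `ERROR` marker matches the NFVO log format shown above, while the layout of files inside plugin-logs is an assumption.

```python
from pathlib import Path


def error_lines(log_paths, marker="ERROR"):
    """Yield (path, line_number, text) for every line containing the marker."""
    for path in log_paths:
        p = Path(path)
        if not p.is_file():
            continue  # skip logs that do not exist on this host
        with p.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if marker in line:
                    yield str(p), number, line.rstrip()


if __name__ == "__main__":
    # Paths as given earlier in the thread; adjust for your installation.
    logs = [Path("/var/log/openbaton/openbaton.log")]
    logs += sorted(Path("/opt/openbaton/nfvo/plugin-logs").glob("*"))
    for path, number, text in error_lines(logs):
        print(f"{path}:{number}: {text}")
```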

The issue about DNS wasn't obvious at ALL until I looked through the log file and searched for ems. I found where it was trying to do the wget and knew immediately what the problem was because I hadn't bothered to set up DNS on this cluster. Once I got that going, again, it was working within seconds.

Both of these situations could be solved very simply by promoting error messages up to the UI or by checking name resolution pretty early in the process. This may not be an issue for production clouds, but for lab clouds who knows... it might save someone some time.
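
An early name-resolution check of the kind suggested here could be as small as the sketch below. The hostname in the usage comment is a placeholder, since the repository host the EMS is downloaded from is not named in this thread.

```python
import socket


def resolves(hostname):
    """Return True if the hostname resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        return False


# Example, run inside a freshly booted instance (placeholder hostname):
#   resolves("example.org")
```

Running this on the instance before any download step would turn a silent wget failure into an explicit "DNS is not configured" message.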

I was never able to connect using an https:// entrypoint to keystone. I never got around that, so I had my admin re-build my cloud without SSL support. This is just a lab cloud, so it's not a huge deal.

I like the changes you made to the VIM creation page. I think it's the right direction (helping the user fill in the blanks instead of editing json). However, I still think the choices of "create" vs. "upload" should be split. It's kind of weird to see them on the same page and it makes you think you're doing something wrong.

It wasn't difficult editing the NSD's json, but there's a lot of duplication in there. It's easy enough to do a global search and replace, but when you do that you always feel like you're going to squash something important. So maybe having references might be another way to solve that. Define the network to use in one place and then refer to it from other places. Same with the flavor. I guess you could use different flavors for different things, but then the user would know that and just wouldn't use a reference.
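
To make the reference idea concrete, here is a rough sketch of how "define once, refer elsewhere" could be resolved before a descriptor is submitted. This is not an Open Baton feature: the `definitions` block, the `@name` syntax, and the field names are all invented for illustration and do not follow the real NSD schema.

```python
import json


def resolve_refs(node, definitions):
    """Recursively replace strings of the form '@name' with definitions[name]."""
    if isinstance(node, dict):
        return {key: resolve_refs(value, definitions) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, definitions) for item in node]
    if isinstance(node, str) and node.startswith("@"):
        return definitions[node[1:]]  # raises KeyError on an undefined reference
    return node


# Hypothetical descriptor: the network and flavour are defined once and reused.
descriptor = {
    "definitions": {"net": "private", "flavour": "m1.small"},
    "vnfd": [
        {"name": "iperf-server", "network": "@net", "flavour": "@flavour"},
        {"name": "iperf-client", "network": "@net", "flavour": "@flavour"},
    ],
}
resolved = resolve_refs(descriptor["vnfd"], descriptor["definitions"])
print(json.dumps(resolved, indent=2))
```

An entry that needs a different flavour would simply use a literal value instead of a reference.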

Anyway, I really appreciated your help during this process!

gc4rella commented 7 years ago

Hi @ftcjeff, thanks a lot for all your feedback. Indeed, those are aspects we are going to work on for the next release.