ovn-org / ovn-scale-test

Apache License 2.0
18 stars 22 forks source link

Add Openshift-like 'switch per node' bring-up scenario. #189

Closed dceara closed 4 years ago

dceara commented 4 years ago

Add new OVN scenario OvnNorthboundFakeMultinode which implements a test case that brings up and configures ovn-fake-multinode nodes (logical_switch, connect to logical_router, configure mgmt port, ping from mgmt port to default gateway).

Also:

Co-authored-by: Mark Michelson mmichels@redhat.com Signed-off-by: Mark Michelson mmichels@redhat.com Co-authored-by: Lorenzo Bianconi lorenzo.bianconi@redhat.com Signed-off-by: Lorenzo Bianconi lorenzo.bianconi@redhat.com Signed-off-by: Dumitru Ceara dceara@redhat.com

dceara commented 4 years ago

CC: @LorenzoBianconi @putnopvut

dceara commented 4 years ago

CC: @hzhou8, it would be great if you could have a look. Thanks!

hzhou8 commented 4 years ago

CC: @hzhou8, it would be great if you could have a look. Thanks!

Sorry for the slow reviewing. I was confused by the ssh closing. Please check @l8huang 's comment. Were you observing any ssh connection leak? How did it happen?

In addition, in the future (and in the past as well) I'd prefer one commit for one change instead of squashing many changes together. I know some repo's prefer one commit for each PR, but there can be multiple PRs in that case. From my perspective multiple commits in one PR is ok, but I am open to this. In either case, each change should be in a separated commit.

dceara commented 4 years ago

@l8huang @hzhou8, thanks for reviewing this!

CC: @hzhou8, it would be great if you could have a look. Thanks!

Sorry for the slow reviewing. I was confused by the ssh closing. Please check @l8huang 's comment. Were you observing any ssh connection leak? How did it happen?

We've been using ovn-scale-test lately to test large scale deployments and with one open connection per farm we ended up with lots of ssh sessions that live until the process exits. At some point we couldn't create more ssh sessions. However, we are testing with ovn-fake-multinode as "physical" deployment which essentially runs virtual nodes inside physical machines. I will look into reusing SSH sessions to only have one per physical machine.

In addition, in the future (and in the past as well) I'd prefer one commit for one change instead of squashing many changes together. I know some repo's prefer one commit for each PR, but there can be multiple PRs in that case. From my perspective multiple commits in one PR is ok, but I am open to this. In either case, each change should be in a separated commit.

Sure, will do.

Thanks, Dumitru

l8huang commented 4 years ago

At some point we couldn't create more ssh sessions.

what's the root cause of ssh session creation failure? Is it running out memory or file descriptor?

dceara commented 4 years ago

At some point we couldn't create more ssh sessions.

what's the root cause of ssh session creation failure? Is it running out memory or file descriptor?

Running out of file descriptors.

Error SSHError: Exception <class 'paramiko.ssh_exception.SSHException'> was raised during connect to <host>:22. Exception value is: SSHException('No existing session')

We could increase the file descriptor limit but then we also hit an issue in rally itself because it uses select internally and doesn't work with more than 1K FDs.

I think, however, I can address all this by reusing the SSH connection for all containers that are spawned on a physical machine. I'll try it out and update the PR when I have it working.

Thanks, Dumitru

dceara commented 4 years ago

@hzhou8 @l8huang As suggested, I split the PR in multiple commits (one per logical change). For now I left out the SSH connection changes.

I'll send a follow up PR once I have the code ready to reuse SSH connections for all containers spawned on the same physical machine. Until then we'll work around it on our setups by increasing the file descriptor limit when needed.

Thanks, Dumitru

dceara commented 4 years ago

@hzhou8 @l8huang Sorry, I had to force push the PR because I realized that in commit Add wait_sync=ping option for binding logical ports. I had forgotten to add an import time. This is the only change I made in the new version.

dceara commented 4 years ago

Sorry for the slow response. I finished the first 8 commit reviewing so far, so send the comments for now, and I will continue the rest 2 commits reviewing tomorrow.

@hzhou8, thanks for the review! I pushed a new version with your suggested changes, except for _get_gw_ip() for now. There was no other change to the rest of the commits.