ovn-org / ovn-heater

Mega script to deploy/configure/run OVN scale tests.

Openstack cloud tests #179

Closed: mkalcok closed this issue 11 months ago

mkalcok commented 1 year ago

Please review commit by commit. This branch is based on #175, so the first 7 commits by @fnordahl can be skipped in review.

This change builds on the separation of CMS implementations in #175 and adds an Openstack CMS. More info is in the commit messages, but at the current stage it can simulate provisioning of:

mkalcok commented 12 months ago

Thanks for the reviews @fnordahl @rjarry.

> I don't see anything about the metadata agent here nor any southbound access to simulate what the ML2 driver does. Is there any plan to do so?

That's a good point. I focused on the NB because I assumed that would be the primary database Neutron communicates with. I'll take a look at the SB as well. Do you have any pointers to relevant code/docs about this?

rjarry commented 12 months ago

There are two parts to consider:

1) the neutron server access to the southbound db: https://opendev.org/openstack/neutron/src/tag/21.1.2/neutron/scheduler/l3_ovn_scheduler.py

2) the metadata agent that runs on all compute nodes: https://opendev.org/openstack/neutron/src/tag/21.1.2/neutron/agent/ovn/metadata/agent.py

I am far from being an expert in this code, but I know it has a significant impact on scale because it doubles the number of southbound db clients (1 ovn-controller + 1 neutron metadata agent per compute node).

For performance reasons, OpenStack deployments usually recommend enabling ovn-controller monitor_all=true to reduce cpu pressure on the southbound ovsdb-server. The same logic also applies to the neutron metadata agent. To do proper scale testing, it would be good for ovn-heater to simulate these connections and subscriptions.
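(For context: monitor_all is a per-chassis ovn-controller option stored in the local Open_vSwitch database; it is typically enabled with `ovs-vsctl set open . external_ids:ovn-monitor-all=true` on each compute node.)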

dceara commented 12 months ago

> There are two parts to consider:
>
> 1. the neutron server access to the southbound db: https://opendev.org/openstack/neutron/src/tag/21.1.2/neutron/scheduler/l3_ovn_scheduler.py
>
> 2. the metadata agent that runs on all compute nodes: https://opendev.org/openstack/neutron/src/tag/21.1.2/neutron/agent/ovn/metadata/agent.py
>
> I am far from being an expert in this code, but I know it has a significant impact on scale because it doubles the number of southbound db clients (1 ovn-controller + 1 neutron metadata agent per compute node).
>
> For performance reasons, OpenStack deployments usually recommend enabling ovn-controller monitor_all=true to reduce cpu pressure on the southbound ovsdb-server. The same logic also applies to the neutron metadata agent. To do proper scale testing, it would be good for ovn-heater to simulate these connections and subscriptions.

It seems to me we're only interested in how these extra read-only clients increase load on the Southbound. If that's the case, we can simulate DB clients by running ovsdb-server relay instances. Today we already start real SB relays if configured (e.g., cluster.n_relays > 0). We can programmatically start more of these (without connecting ovn-controllers to them) and achieve the same type of load on the SB as with neutron agents. IIRC @felixhuettner had also suggested this during one of the first ovn-heater community meetings.
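For illustration, programmatically starting such extra relays could look roughly like this (an untested sketch; the SB address, socket/pidfile paths, and function name are made up, and it assumes an ovsdb-server new enough to support relay mode):

```python
# Spawn extra ovsdb-server relay processes pointed at the real SB database,
# without connecting any ovn-controller to them, so they only add SB client
# load.
import subprocess

SB_REMOTE = "tcp:127.0.0.1:6642"  # assumed address of the real SB database

def start_fake_sb_clients(count):
    """Start `count` SB relays that nothing ever connects to."""
    procs = []
    for i in range(count):
        cmd = [
            "ovsdb-server",
            f"--remote=punix:/tmp/sb-relay-{i}.sock",
            f"--pidfile=/tmp/sb-relay-{i}.pid",
            f"relay:OVN_Southbound:{SB_REMOTE}",
        ]
        procs.append(subprocess.Popen(cmd))
    return procs
```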

felixhuettner commented 12 months ago

> It seems to me we're only interested in how these extra read-only clients increase load on the Southbound. If that's the case, we can simulate DB clients by running ovsdb-server relay instances. Today we already start real SB relays if configured (e.g., cluster.n_relays > 0). We can programmatically start more of these (without connecting ovn-controllers to them) and achieve the same type of load on the SB as with neutron agents. IIRC @felixhuettner had also suggested this during one of the first ovn-heater community meetings.

I think that should give us a fairly accurate measurement of the neutron-ovn-metadata-agent. It currently does not set monitor conditions, so we get a full db dump anyway. The only thing we are losing is the performance difference between a relay and the neutron agent when receiving updates from the ovsdb, but I don't think that should matter too much.
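For reference, the "full db dump" subscription pattern described above looks roughly like this with the python-ovs IDL (a sketch, not the neutron agent's actual code; the schema path and SB address are assumptions):

```python
# Read-only SB client that registers every table with no monitor conditions,
# so the server sends the entire database plus every subsequent update.
import ovs.db.idl
import ovs.poller

helper = ovs.db.idl.SchemaHelper("/usr/share/ovn/ovn-sb.ovsschema")
helper.register_all()  # no monitor conditions -> full dump
idl = ovs.db.idl.Idl("tcp:127.0.0.1:6642", helper)

while True:  # passively track the whole SB database
    idl.run()
    poller = ovs.poller.Poller()
    idl.wait(poller)
    poller.block()
```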

rjarry commented 12 months ago

@dceara: I think the neutron metadata agent is not read-only; it also writes to the southbound db:

https://opendev.org/openstack/neutron/src/tag/21.1.2/neutron/agent/ovn/metadata/agent.py#L193-L196

https://opendev.org/openstack/neutron/src/tag/21.1.2/neutron/agent/ovn/metadata/agent.py#L292-L293

felixhuettner commented 12 months ago

> @dceara: I think the neutron metadata agent is not read-only; it also writes to the southbound db:
>
> https://opendev.org/openstack/neutron/src/tag/21.1.2/neutron/agent/ovn/metadata/agent.py#L193-L196
>
> https://opendev.org/openstack/neutron/src/tag/21.1.2/neutron/agent/ovn/metadata/agent.py#L292-L293

Yep, there is also some regular heartbeat, but most people configure it

> It seems to me we're only interested in how these extra read-only clients increase load on the Southbound. If that's the case, we can simulate DB clients by running ovsdb-server relay instances. Today we already start real SB relays if configured (e.g., cluster.n_relays > 0). We can programmatically start more of these (without connecting ovn-controllers to them) and achieve the same type of load on the SB as with neutron agents. IIRC @felixhuettner had also suggested this during one of the first ovn-heater community meetings.
>
> I think that should give us a fairly accurate measurement of the neutron-ovn-metadata-agent. It currently does not set monitor conditions, so we get a full db dump anyway. The only thing we are losing is the performance difference between a relay and the neutron agent when receiving updates from the ovsdb, but I don't think that should matter too much.

Although, it does seem like there are now some monitor conditions in play here, but only for chassis: https://opendev.org/openstack/neutron/src/tag/21.1.2/neutron/agent/ovn/metadata/ovsdb.py#L50-L53
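For context, the general mechanism behind such a condition looks roughly like this with the python-ovs IDL (a sketch with an assumed schema path, SB address, and chassis name; the linked neutron code uses its own IDL wrapper):

```python
# Monitor only the Chassis table, and only the local chassis row, instead of
# the whole database.
import ovs.db.idl

helper = ovs.db.idl.SchemaHelper("/usr/share/ovn/ovn-sb.ovsschema")
helper.register_table("Chassis")
idl = ovs.db.idl.Idl("tcp:127.0.0.1:6642", helper)

# Restrict the monitor to this host's Chassis record only.
idl.cond_change("Chassis", [["name", "==", "compute-0"]])
```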

mkalcok commented 12 months ago

@felixhuettner I decoupled external network creation from project creation and now the BaseOpenstack test creates only one ext_net and connects all projects to it.

I'm creating these suggested changes as separate commits for easier review. Please let me know if that's OK or if you'd like me to squash them into 53339f9 in the end.

felixhuettner commented 12 months ago

> @felixhuettner I decoupled external network creation from project creation and now the BaseOpenstack test creates only one ext_net and connects all projects to it.
>
> I'm creating these suggested changes as separate commits for easier review. Please let me know if that's OK or if you'd like me to squash them into 53339f9 in the end.

Thanks. Looks good

mkalcok commented 12 months ago

I'd say that this change is now almost ready for full review. Two things that are still missing, and that I'd like your opinion on, are:

* Floating IPs and external connectivity: I'd be happy to do this in a separate PR to not blow up the scope of this one even more.

* ovn-metadata-agents: Is the consensus that we can simulate metadata agents with ovn-relay nodes to create "read" load on the SB DB, since as far as I can tell the only write operations of metadata agents are keepalive messages and updates to chassis external IDs on registration?

fnordahl commented 11 months ago

> I'd say that this change is now almost ready for full review. Two things that are still missing, and that I'd like your opinion on, are:
>
> * Floating IPs and external connectivity: I'd be happy to do this in a separate PR to not blow up the scope of this one even more.
>
> * ovn-metadata-agents: Is the consensus that we can simulate metadata agents with ovn-relay nodes to create "read" load on the SB DB, since as far as I can tell the only write operations of metadata agents are keepalive messages and updates to chassis external IDs on registration?

From my perspective I think it would be valuable to finalize this PR as-is now and mark it as ready for review once you feel comfortable with it.

The items discussed above can be worked on independently, and the work you have done so far would be a dependency.

felixhuettner commented 11 months ago

> @felixhuettner I decoupled external network creation from project creation and now the BaseOpenstack test creates only one ext_net and connects all projects to it.
>
> I'm creating these suggested changes as separate commits for easier review. Please let me know if that's OK or if you'd like me to squash them into 53339f9 in the end.

I'm not sure what has happened, but we seem to create multiple external networks again (even though they now seem to have no external connectivity)?

mkalcok commented 11 months ago

:facepalm: thank you for noticing @felixhuettner. I lost the commit that decoupled project/ext net creation when I was rebasing/force-pushing my local copy on a different machine.

mkalcok commented 11 months ago

The lost change was recovered (thanks @fnordahl), and I think this PR is ready for review.

mkalcok commented 11 months ago

I pushed two small additions:

fnordahl commented 11 months ago

I'm working on a rebase after #180 merged.

fnordahl commented 11 months ago

Rebased and ready for review now @dceara @felixhuettner @rjarry

fnordahl commented 11 months ago

@booxter @dceara for when you have a spare moment, I believe all review feedback has been addressed.

dceara commented 11 months ago

@mkalcok @fnordahl There are a few conflicts due to merging the initial ovn-ic work; I'll try to help with the rebase.

dceara commented 11 months ago

I tried a rebase here (~~completely untested yet~~ barely tested): https://github.com/dceara/ovn-heater-1/commits/openstack-cloud-setup-rebase-tmp

fnordahl commented 11 months ago

> I tried a rebase here (~~completely untested yet~~ barely tested): https://github.com/dceara/ovn-heater-1/commits/openstack-cloud-setup-rebase-tmp

Thanks, I'll have a look

fnordahl commented 11 months ago

@dceara thanks a lot for the help with the rebase. I put your proposals for adaptation into the correct commits and attributed them accordingly.

I redid the parts of the commits that mainly move code around to ensure no changes were lost.

I also arrived at a slightly different solution to calling the prepare_test method, hope you agree.

Looking forward to your review!

fnordahl commented 11 months ago

I just noticed that the rebase is one commit shorter than the previous iteration, and the reason is that I have unintentionally squashed "CMS Openstack: Add ability to simulate VMs" and "OS Test: Spawn VMs and run ping in base scenario".

However, the result does look fine to me as both commits kind of addressed the same thing in iterations, so perhaps we should just keep it this way? Thoughts @mkalcok ?

mkalcok commented 11 months ago

Yeah, I'm fine with them being squashed. I separated them for easier review, as one added logic to the CMS Openstack implementation and the other used that implementation in a test scenario, but they are closely related.