theopenlab / openlab

Used for general work tracking, centralized reporting, easier third-party integration for metrics and data, and more.
Apache License 2.0

Nova NUMA CI #200

Closed notartom closed 5 years ago

notartom commented 5 years ago

If you are interested in testing and improving support for the cloud-related SDKs/Tools as well as platforms in OpenLab, please fill out the details below. You can always find more information about OpenLab at https://openlabtesting.org

What is your focus?

If this is for an open source project what is it?

OpenStack Nova

Brief project description

OpenStack Nova provides a cloud computing fabric controller.

Is project code 100% open source? If so, what is the URL or URLs where it is located?

Yes, https://github.com/openstack/nova

What kind of machines (VMs or Baremetal) and how many do you expect to use?

We would need either 2 VMs with nested virtualisation or 2 physical machines. Each would need multiple NUMA nodes, and one needs more NUMA nodes than the other. Hugepages need to be enabled on both machines, though their size isn't important (1G or 2M is fine). For reference, the current environment that I use to develop and test the feature has:

Controller/allinone VM:

[artom@allinone ~]$ numactl -H
available: 3 nodes (0-2)
node 0 cpus: 0 1
node 0 size: 5816 MB
node 0 free: 1149 MB
node 1 cpus: 2 3
node 1 size: 5905 MB
node 1 free: 798 MB
node 2 cpus: 4 5
node 2 size: 3936 MB
node 2 free: 56 MB
node distances:
node   0   1   2 
  0:  10  20  20 
  1:  20  10  20 
  2:  20  20  10 

Compute VM:

[artom@compute ~]$ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1
node 0 size: 7808 MB
node 0 free: 943 MB
node 1 cpus: 2 3
node 1 size: 7850 MB
node 1 free: 4594 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 
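As an illustration of the requirements above (multiple NUMA nodes per machine, a different node count on each, hugepages enabled), here is a minimal Python sketch that checks them from captured `numactl -H` and `/proc/meminfo` text. The function names and the trimmed sample strings are hypothetical and not part of Nova or OpenLab; the sketch assumes the `node N cpus:` line format shown in the output above.

```python
import re
from typing import Dict, List


def parse_numactl(output: str) -> Dict[int, List[int]]:
    """Map NUMA node id -> CPU ids, read from `numactl -H` text."""
    nodes: Dict[int, List[int]] = {}
    for line in output.splitlines():
        # Only the "node N cpus: ..." lines carry the CPU lists.
        match = re.match(r"\s*node (\d+) cpus:\s*(.*)$", line)
        if match:
            nodes[int(match.group(1))] = [int(c) for c in match.group(2).split()]
    return nodes


def meets_requirements(host_a: Dict[int, List[int]],
                       host_b: Dict[int, List[int]]) -> bool:
    """Both hosts have several NUMA nodes and their node counts differ."""
    return len(host_a) > 1 and len(host_b) > 1 and len(host_a) != len(host_b)


def hugepages_enabled(meminfo: str) -> bool:
    """True if /proc/meminfo text reports a non-zero HugePages_Total."""
    match = re.search(r"^HugePages_Total:\s*(\d+)", meminfo, re.MULTILINE)
    return bool(match) and int(match.group(1)) > 0


# Trimmed copies of the two outputs quoted in this request.
ALLINONE = """available: 3 nodes (0-2)
node 0 cpus: 0 1
node 1 cpus: 2 3
node 2 cpus: 4 5
"""
COMPUTE = """available: 2 nodes (0-1)
node 0 cpus: 0 1
node 1 cpus: 2 3
"""

if __name__ == "__main__":
    allinone = parse_numactl(ALLINONE)
    compute = parse_numactl(COMPUTE)
    print(len(allinone), len(compute), meets_requirements(allinone, compute))
```

On a real host the inputs would come from running `numactl -H` and reading `/proc/meminfo`; the sketch works on strings so it can be tried without NUMA hardware.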

What OS are you planning to use?

Fedora 29? Honestly it doesn't really matter as long as it can run devstack reliably. If nested virt ends up being used, the baremetal host will need good nested virt support, in which case F29 is probably the best choice, as Ubuntu seems to have issues. Vexxhost have reported reliable nested virt with CentOS 7 as well.

Any special network configuration you expect or anticipate implementing?

N/A

Any architecture or other specifications/requirements (CPU, RAM, GPU, etc)?

See section 4.

What testing are you planning to implement or need assistance implementing?

NUMA live migration testing, and more generally re-introduce a NUMA CI to fill the hole left by the de facto abandonment of the Intel NFV CI.

To that end, I intend to use a tempest plugin (that I contribute heavily to) that will allow me to assert things that are outside of Tempest's scope. The actual test run will look like a normal Tempest test run.
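To give a rough idea of the kind of assertion that falls outside Tempest's API-level scope, the sketch below inspects a libvirt domain XML and checks that all pinned vCPUs land on a single host NUMA node. This is not the plugin's actual code: the function names, the sample XML, and the host topology are hypothetical, and only the `<cputune>`/`<vcpupin>` elements are taken from libvirt's domain XML format.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set


def parse_cpuset(spec: str) -> Set[int]:
    """Expand a cpuset such as '0-1,4' into {0, 1, 4}.

    Assumption: '^' exclusions are not handled in this sketch.
    """
    cpus: Set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def vcpu_pins(domain_xml: str) -> Dict[int, Set[int]]:
    """Map guest vCPU id -> host CPUs it is pinned to."""
    root = ET.fromstring(domain_xml)
    return {
        int(pin.get("vcpu")): parse_cpuset(pin.get("cpuset"))
        for pin in root.findall("./cputune/vcpupin")
    }


def pinned_within_one_node(pins: Dict[int, Set[int]],
                           host_nodes: Dict[int, List[int]]) -> bool:
    """True if every pinned host CPU belongs to the same host NUMA node."""
    used = set().union(*pins.values())
    return bool(used) and any(used <= set(cpus) for cpus in host_nodes.values())


# Hypothetical guest definition, trimmed to the relevant elements.
SAMPLE_XML = """
<domain type='kvm'>
  <name>instance-00000001</name>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
  </cputune>
</domain>
"""

# Host topology matching the compute VM described in this request.
COMPUTE_NODES = {0: [0, 1], 1: [2, 3]}

if __name__ == "__main__":
    pins = vcpu_pins(SAMPLE_XML)
    print(pins, pinned_within_one_node(pins, COMPUTE_NODES))
```

A live migration test could run a check like this against the destination host after the migration completes, which is information the Nova API alone does not expose.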

How will this testing advance application and/or tooling built on-top of open infrastructure?

This will allow Nova to proceed more confidently with any NUMA-related feature and/or bug.

Will you publish blog or paper from your testing?

The intent is for this to eventually become a voting job in Nova.

Any other relevant details we should know about while preparing the infrastructure?

N/A

dims commented 5 years ago

LGTM

notartom commented 5 years ago

I said 2 machines, and for the initial PoC that's all I need, but I didn't consider the scale implications of eventually running a voting Nova CI job on this. I'm guessing 2 machines won't be enough, but I don't currently know how many would actually be needed. It's something that I'll need to research.

A balance will also have to be struck between good coverage and hardware requirements - but we can cross that bridge when we get there.

kiwik commented 5 years ago

+1, it looks like standalone physical machines are better for this case, as a third-party CI for OpenStack infra. @mrhillsman, would you please allocate two physical machines for @notartom from your resource pool?

mriedem commented 5 years ago

We (nova) likely don't need a 3rd party CI job that runs on every proposed nova change like we have in the check queue today. Starting with a periodic job would be good enough IMO, or something that we can run on demand like the "experimental" queue in OpenStack upstream CI would be a great start.

stmcginnis commented 5 years ago

LGTM

mrhillsman commented 5 years ago

After discussing with @mriedem, it makes sense to have OpenLab trigger on the comment "check openlab" rather than on each PR or even periodically, for now.
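A comment trigger of this kind usually comes down to matching the review comment against a pattern. The Python sketch below shows one way that match could work; it is an illustration only, not OpenLab's or Zuul's actual configuration. The function name is hypothetical, and the choices to ignore case and to require the phrase on a line of its own are assumptions.

```python
import re

# Assumption: the phrase must sit alone on a line, in any letter case.
TRIGGER = re.compile(r"^\s*check openlab\s*$", re.IGNORECASE | re.MULTILINE)


def should_trigger(comment: str) -> bool:
    """Return True if a review comment asks for an OpenLab run."""
    return TRIGGER.search(comment) is not None


if __name__ == "__main__":
    for text in ("check openlab",
                 "Looks good to me.\ncheck openlab\n",
                 "please do not check openlab yet",
                 "recheck"):
        print(repr(text), should_trigger(text))
```

Requiring the phrase on its own line keeps ordinary discussion that merely mentions "check openlab" from starting a run.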

notartom commented 5 years ago

Actually, after talking with @SeanMooney, we might want to expand the scope of this slightly, and make it more of an NFV CI than just a NUMA CI. For that the servers would need SRIOV-capable network cards, and they'd have to be baremetal.

mrhillsman commented 5 years ago

ack @notartom

notartom commented 5 years ago

One thing I'd like to avoid is having to become a sysadmin for an OpenStack cloud, so I'm wondering how the hardware (assuming we get it) is going to be presented. Ideally it would be in the shape of an OpenStack cloud that we could just point Nodepool towards. If they're going to be just machines with SSH access I'm not sure we have the human resources to run and maintain a cloud on them.

mriedem commented 5 years ago

But @notartom the boxes are already en route to your house?!

notartom commented 5 years ago

> But @notartom the boxes are already en route to your house?!

Couple of things:

  1. For some reason I first read your name as 'dsrinivas' and went asking dims on IRC if he was messing with me. I now understand his lack of replies.
  2. How do you know my house address?
  3. Thanks for the boxes! Is bitcoin mining still a thing?

dims commented 5 years ago

@notartom Matt is kidding with you! :)

dims commented 5 years ago

/unassign

kiwik commented 5 years ago

No activity, feel free to reopen it.