metal3-io / baremetal-operator

Bare metal host provisioning integration for Kubernetes
Apache License 2.0

Support RAID setup for baremetal server #206

Closed furukawa3 closed 2 years ago

furukawa3 commented 5 years ago

We'd like to deploy bare metal servers with RAID configuration. To set up or unset RAID, a vendor driver is necessary. This issue proposes a new YAML attribute for setting up RAID on Fujitsu PRIMERGY servers using the iRMC driver.

dhellmann commented 5 years ago

I wonder if it makes sense to fold RAID setup into the hardware profile? Or would we potentially want hosts that match the same profile to have different RAID configurations?

furukawa3 commented 5 years ago

Hi. I think we should have one RAID configuration per bare metal server, so I agree with your first proposal. Here is my draft idea for the hardware profile, just adding "targetRAID":

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: bm1
spec:
  online: true
  bmc:
    address: libvirt://192.168.122.1:6230/
    credentialsName: bm1-bmc-secret
  bootMACAddress: 52:54:00:c3:40:2e
  targetRAID: 5
dhellmann commented 5 years ago

@dtantsur and @juliakreger do you have any thoughts on this?

dtantsur commented 5 years ago

Should this tie into the deploy templates work done for Train? We could just use a deploy template name here, that would cover RAID, BIOS and any other customization from deploy templates.

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: bm1
spec:
  online: true
  bmc:
    address: libvirt://192.168.122.1:6230/
    credentialsName: bm1-bmc-secret
  bootMACAddress: 52:54:00:c3:40:2e
  deployTemplates:
     - CUSTOM_RAID5
     - CUSTOM_BIOS_VMX

Of course, we'd need something to create deploy templates themselves from their JSON definition.

Reference: https://docs.openstack.org/ironic/latest/admin/node-deployment.html#deploy-templates
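For context, an Ironic deploy template is created from a JSON definition that pairs a trait name with a list of steps. A CUSTOM_RAID5 template along the lines of the linked docs might look like this (a sketch, not a tested definition; the priority value is arbitrary):

```json
{
    "name": "CUSTOM_RAID5",
    "steps": [
        {
            "interface": "raid",
            "step": "apply_configuration",
            "args": {
                "raid_config": {
                    "logical_disks": [
                        {
                            "size_gb": "MAX",
                            "raid_level": "5",
                            "is_root_volume": true
                        }
                    ]
                }
            },
            "priority": 100
        }
    ]
}
```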

juliakreger commented 5 years ago

@dhellmann @furukawa3 I guess a target + a hardware profile would work; the potential issue that comes to mind is that RAID configuration traditionally takes some details to achieve. See https://docs.openstack.org/ironic/latest/admin/drivers/irmc.html#raid-configuration-support

furukawa3 commented 5 years ago

I'm so sorry for the late reply!!

@dtantsur @juliakreger Long time no see since Denver PTG :)

Should this tie into the deploy templates work done for Train?

Yes, I hope so, but I don't know whether the Ironic image in metal3 will be updated to Train. I think deployTemplates would be helpful for defining and applying this configuration, and taking BIOS and other customization into consideration, it's reasonable to use it. However, I'm still weighing whether deployTemplates or the hardware profile is the better fit. I've updated the YAML definition with all the RAID parameters (thanks, Julia). As Dmitry said, should we use deployTemplates to support other configuration too (BIOS, other vendors' drivers, ...)?

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: bm1
spec:
  online: true
  bmc:
    address: 192.168.122.1
    credentialsName: bm1-bmc-secret
  bootMACAddress: 52:54:00:c3:40:2e
  hardwareProfile:
    logical_disks:
      - size_gb: MAX
        raid_level: "0"
        controller: "PRAID EP420i (0)"
        physical_disks:
          - "0"
          - "1"
          - "2"
longkb commented 5 years ago

Hi folks,

@dtantsur : Thanks for your suggestion about using deployTemplates. However, I have investigated RAID and BIOS configuration [1][2], and these configurations are executed via clean steps in manual cleaning [3], not deploy steps. So I think we cannot use deployTemplates for them.

@juliakreger , @furukawa3 : I think target + hardware profile would make the BareMetalHost YAML bulky.

So I would like to propose a new property named cleanSteps for this configuration, which points to a ConfigMap object that contains it. Our YAML file would look like:

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: bm0
spec:
  online: true
  bmc:
    address: libvirt://192.168.122.1:6230/
    credentialsName: bm0-bmc-secret
  bootMACAddress: 52:54:00:b7:b2:6f
  cleanSteps: cm-bm0

The ConfigMap object should look like:

apiVersion: v1
data:
  configSteps: |
    [
       {
          "interface":"bios",
          "step":"apply_configuration",
          "args":{
             "settings":[
                {
                   "name":"hyper_threading_enabled",
                   "value":"false"
                }
             ]
          }
       },
       {
          "interface":"raid",
          "step":"create_configuration",
          "args":{
             "logical_disks":[
                {
                   "size_gb":"MAX",
                   "raid_level":"5",
                   "is_root_volume":true
                }
             ]
          }
       }
    ]
kind: ConfigMap
metadata:
  name: cm-bm0
  namespace: metal3
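Under this proposal the operator would need to parse and sanity-check the configSteps payload before handing it to Ironic's manual cleaning API. A minimal sketch of that validation (function name and error messages are hypothetical, not baremetal-operator code):

```python
import json

# Keys Ironic requires for each manual clean step; "args" is optional
# for steps that take no arguments.
REQUIRED_KEYS = {"interface", "step"}


def parse_clean_steps(config_steps: str) -> list:
    """Parse the ConfigMap's configSteps JSON and validate its shape."""
    steps = json.loads(config_steps)
    if not isinstance(steps, list):
        raise ValueError("configSteps must be a JSON list of steps")
    for i, step in enumerate(steps):
        missing = REQUIRED_KEYS - step.keys()
        if missing:
            raise ValueError(f"step {i} is missing keys: {sorted(missing)}")
    return steps


# The example payload from the ConfigMap above, condensed.
example = '''
[
  {"interface": "bios", "step": "apply_configuration",
   "args": {"settings": [{"name": "hyper_threading_enabled", "value": "false"}]}},
  {"interface": "raid", "step": "create_configuration",
   "args": {"logical_disks": [{"size_gb": "MAX", "raid_level": "5", "is_root_volume": true}]}}
]
'''
steps = parse_clean_steps(example)
```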

On the baremetal-operator state machine side, I would like to introduce a new state named cleaning that triggers manual cleaning with the referenced clean steps before we enter the inspecting state. The state machine for BareMetalHost creation would look like:

Registering --> Cleaning (if CleanStep is specified) --> Inspecting --> Ready

@dhellmann : As per our discussion in the Slack channel, there are some changes to the Metal3 API here. Could you help me review my proposal? If anything needs to be changed, please let me know :)

Thank you very much!

[1] https://docs.openstack.org/ironic/latest/admin/bios.html#configure-bios-settings
[2] https://docs.openstack.org/ironic/latest/admin/raid.html#workflow
[3] https://docs.openstack.org/ironic/pike/admin/cleaning.html#manual-cleaning

dhellmann commented 5 years ago

@longkb I'm sorry for taking so long to get to this -- we have a big internal deadline coming up and I'm out of the office next week so I've been pretty busy.

My initial reaction is that I do want to ensure that all of the instructions go into the CRD, rather than using an external ConfigMap. Using fields in the CRD ensures that we fully describe the API, don't have surprise behaviors (for example, if the settings are in a ConfigMap, do we need to update the host when that changes?), and that we can take advantage of kubernetes API type and value validation features.

I also want to be careful that we don't directly expose ironic concepts like "steps" or deploy templates, even if we end up using those under the hood. Ironic is an implementation detail, and should not dictate the API. The host CRD should describe the desired end state, and then the controller code should work out the changes needed to reconcile the host's configuration to match.

In your example, you have a BIOS setting change. We might describe that by adding a "bios" section to the spec portion of the host, and giving it a field with a name like hyperThreading that is a pointer to a boolean (assuming that option only has 2 values, using a pointer allows us to easily tell the difference between a value that is missing and a value that is explicitly set to false). So that would give us something like this:

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: bm0
spec:
  online: true
  bmc:
    address: libvirt://192.168.122.1:6230/
    credentialsName: bm0-bmc-secret
  bootMACAddress: 52:54:00:b7:b2:6f
  bios:
    hyperThreading: false

As we support changing other BIOS settings, we would add explicit fields for them and provide validation for the inputs.

For the RAID configuration we want to describe the end state in a similarly declarative way. I'm not sure off the top of my head what that would look like, but I hope you get the idea of the general direction I think we want to go.

longkb commented 5 years ago

@dhellmann Thanks you for the YAML sample. I would like to propose the new YAML file for both BIOS and RAID as your suggestion.

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: bm0
spec:
  online: true
  bmc:
    address: libvirt://192.168.122.1:6230/
    credentialsName: bm0-bmc-secret
  bootMACAddress: 52:54:00:b7:b2:6f
  bios:
    action: apply_configuration
    settings:
      - name: hyper_threading_enabled
        value: "false"
  raid:
    action: create_configuration
    logical_disks:
      - size_gb: 100
        raid_level: "1"
        is_root_volume: true
      - size_gb: 50
        raid_level: "0+1"

In the above BaremetalHost YAML:

The action values above may not make sense, since they expose Ironic concepts as you said. Should I modify them?

Please help me to review my YAML sample.

Thank you very much!

[1] https://docs.openstack.org/ironic/rocky/admin/bios.html#configure-bios-settings
[2] https://docs.openstack.org/ironic/latest/admin/raid.html#workflow

dtantsur commented 5 years ago

I'd remove action bits, they're internal details of ironic.

  bios:
      - name: hyper_threading_enabled
        value: "false"
  raid:
      - size_gb: 100
        raid_level: "1"
        is_root_volume: true
      - size_gb: 50
        raid_level: "0+1"
longkb commented 5 years ago

@dtantsur: Thank you for your review. By removing action, you mean apply_configuration and create_configuration would be executed by default when the BareMetalHost is created, and factory_reset / delete_configuration would be triggered when the BareMetalHost object is deleted. Am I right?

dhellmann commented 5 years ago

I'd remove action bits, they're internal details of ironic.

Right, the point of metal3 is to wrap the Ironic API in one that is easier for kubernetes users to use and understand. That is going to mean doing more work to create abstractions for the features of Ironic.

  bios:
      - name: hyper_threading_enabled
        value: "false"

I would prefer to have the BIOS settings explicitly supported in the API of the host object. Otherwise, we have to do something to look at every incoming name and figure out whether it is recognized and deal with the type validation of the values. The user also has to know what parameters are accepted and will not have the benefit of API documentation to tell them.

If we build the API around a set of commonly supported parameters, then we can use simple structures like:

bios:
  - hyperThreading: false

and based on those settings the operator can pass values to Ironic that may be hardware-platform specific (either by pulling information from the hardware profile or from the existing access drivers we have already for working with different BMCs).

  raid:
      - size_gb: 100
        raid_level: "1"
        is_root_volume: true
      - size_gb: 50
        raid_level: "0+1"

When I looked at the RAID setup documentation for ironic I noticed that only one volume can have is_root_volume set to true (which makes sense). It would be nice if we could organize the API inputs to enforce that without having to check it ourselves. For example:

raid:
  rootVolume:
    sizeGb: 100
    level: "1"
  volumes:
    - sizeGb: 50
      level: "0+1"

where both "rootVolume" and "volumes" are optional and the value of is_root_volume is implied by using one or the other.
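One way to read that proposal: the controller flattens rootVolume/volumes back into Ironic's logical_disks, and is_root_volume is implied by which field a volume came from, so at most one root volume can ever exist. A sketch (key names taken from the YAML above; this is not real metal3 code):

```python
def to_logical_disks(raid: dict) -> list:
    """Flatten the proposed raid spec into Ironic-style logical_disks.

    is_root_volume is never user-supplied; it is implied by placing a
    volume under rootVolume, which enforces "at most one root volume"
    structurally instead of via a validation check.
    """
    disks = []
    root = raid.get("rootVolume")
    if root is not None:
        disks.append({"size_gb": root["sizeGb"],
                      "raid_level": root["level"],
                      "is_root_volume": True})
    for vol in raid.get("volumes", []):
        disks.append({"size_gb": vol["sizeGb"],
                      "raid_level": vol["level"]})
    return disks


# The example spec from the YAML above.
spec = {
    "rootVolume": {"sizeGb": 100, "level": "1"},
    "volumes": [{"sizeGb": 50, "level": "0+1"}],
}
disks = to_logical_disks(spec)
```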

longkb commented 5 years ago

I'd remove action bits, they're internal details of ironic.

Right, the point of metal3 is to wrap the Ironic API in one that is easier for kubernetes users to use and understand. That is going to mean doing more work to create abstractions for the features of Ironic.

Ok. I got it. So I will remove action as your comment :)

I would prefer to have the BIOS settings explicitly supported in the API of the host object. Otherwise, we have to do something to look at every incoming name and figure out whether it is recognized and deal with the type validation of the values. The user also has to know what parameters are accepted and will not have the benefit of API documentation to tell them.

If we build the API around a set of commonly supported parameters, then we can use simple structures like:

bios:
  - hyperThreading: false

and based on those settings the operator can pass values to Ironic that may be hardware-platform specific (either by pulling information from the hardware profile or from the existing access drivers we have already for working with different BMCs).

Thanks for your comment. I have an idea: I plan to define a GetBIOSConfigKeyMapping() map[string]string method on the AccessDetails interface [1]. This method would report which BIOS settings the implementing driver supports, and we could validate the incoming user input at that point. Finally, we could use a simple YAML structure as you said:

bios:
    hyperThreadingEnabled: false
    cpuActiveProcessorCores: 0
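The idea, sketched in Python terms: each driver exposes a mapping from the generic CRD field names to its vendor-specific BIOS setting names, and anything outside that mapping is rejected before reaching Ironic. The driver mapping and setting names below are hypothetical illustrations, not real driver data:

```python
# Hypothetical per-driver mapping, analogous to the proposed
# GetBIOSConfigKeyMapping() on the AccessDetails interface.
IRMC_BIOS_KEYS = {
    "hyperThreadingEnabled": "hyper_threading_enabled",
    "cpuActiveProcessorCores": "cpu_active_processor_cores",
}


def translate_bios_settings(spec: dict, key_map: dict) -> list:
    """Validate generic BIOS fields and translate to driver setting names."""
    unknown = set(spec) - set(key_map)
    if unknown:
        raise ValueError(f"unsupported BIOS settings: {sorted(unknown)}")
    # Ironic expects name/value pairs with string values.
    return [{"name": key_map[k], "value": str(v).lower()}
            for k, v in sorted(spec.items())]


settings = translate_bios_settings(
    {"hyperThreadingEnabled": False, "cpuActiveProcessorCores": 0},
    IRMC_BIOS_KEYS,
)
```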

When I looked at the RAID setup documentation for ironic I noticed that only one volume can have is_root_volume set to true (which makes sense). It would be nice if we could organize the API inputs to enforce that without having to check it ourselves. For example:

raid:
  rootVolume:
    sizeGb: 100
    level: "1"
  volumes:
    - sizeGb: 50
      level: "0+1"

Thanks. I got your point 👍 I will define API inputs to get rootVolume from YAML as you said :)

After all, I would like to write down the final YAML template for BaremetalHost. Please help me to review it :)


---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: bm0
spec:
  online: true
  bmc:
    address: irmc://192.168.122.1:6230/
    credentialsName: bm0-bmc-secret
  bootMACAddress: 52:54:00:b7:b2:6f
  bios:
    hyperThreadingEnabled: false
  raid:
    rootVolume:
      sizeGB: 100
      raidLevel: "1"
      physicalDisks:
        - "disk1"
        - "disk2"
        - "disk3"
    volumes:
      - sizeGB: 100
        raidLevel: "1"
      - sizeGB: 50
        raidLevel: "0+1"

[1] https://github.com/metal3-io/baremetal-operator/blob/master/pkg/bmc/access.go#L15

metal3-io-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

metal3-io-bot commented 4 years ago

Stale issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle stale.

/close

metal3-io-bot commented 4 years ago

@metal3-io-bot: Closing this issue.

furukawa3 commented 4 years ago

/reopen

metal3-io-bot commented 4 years ago

@furukawa3: Reopened this issue.

furukawa3 commented 4 years ago

We're making progress on this issue in #292.

metal3-io-bot commented 4 years ago

Stale issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle stale.

/close

metal3-io-bot commented 4 years ago

@metal3-io-bot: Closing this issue.

zaneb commented 4 years ago

/reopen
/remove-lifecycle stale

metal3-io-bot commented 4 years ago

@zaneb: Reopened this issue.

metal3-io-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

zaneb commented 4 years ago

Still in progress (and close to merging) in #292. /remove-lifecycle stale

metal3-io-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

zaneb commented 3 years ago

Stop judging us, @metal3-io-bot. /remove-lifecycle stale

dhellmann commented 3 years ago

/lifecycle frozen

demonCoder95 commented 2 years ago

@zaneb @andfasano I believe all the work related to this issue has been merged. We can close it now.

#1062 was the last item related to this.

zaneb commented 2 years ago

FYI, anyone can close an issue. Like this: /close

metal3-io-bot commented 2 years ago

@zaneb: Closing this issue.
