furukawa3 closed this issue 2 years ago
I wonder if it makes sense to fold RAID setup into the hardware profile? Or would we potentially want hosts that match the same profile to have different RAID configurations?
Hi. I think we should have one RAID configuration per bare metal server, so I agree with your first proposal. Here is my draft idea for the hardware profile, just adding "targetRAID":
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: bm1
spec:
  online: true
  bmc:
    address: libvirt://192.168.122.1:6230/
    credentialsName: bm1-bmc-secret
  bootMACAddress: 52:54:00:c3:40:2e
  targetRAID: 5
@dtantsur and @juliakreger do you have any thoughts on this?
Should this tie into the deploy templates work done for Train? We could just use a deploy template name here, that would cover RAID, BIOS and any other customization from deploy templates.
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: bm1
spec:
  online: true
  bmc:
    address: libvirt://192.168.122.1:6230/
    credentialsName: bm1-bmc-secret
  bootMACAddress: 52:54:00:c3:40:2e
  deployTemplates:
    - CUSTOM_RAID5
    - CUSTOM_BIOS_VMX
Of course, we'd need something to create deploy templates themselves from their JSON definition.
Reference: https://docs.openstack.org/ironic/latest/admin/node-deployment.html#deploy-templates
@dhellmann @furukawa3 I guess a target plus a hardware profile would work; the potential issue that comes to mind is that RAID configuration traditionally requires quite a few details to achieve. See https://docs.openstack.org/ironic/latest/admin/drivers/irmc.html#raid-configuration-support
I'm so sorry for the late reply!!
@dtantsur @juliakreger Long time no see since Denver PTG :)
Should this tie into the deploy templates work done for Train?
Yes, I hope so, but I don't know whether the ironic image in metal3 will be updated to Train. I think deployTemplates would be helpful for defining this configuration, and taking BIOS and other customizations into consideration, it's reasonable to use it. However, I'm still weighing whether deployTemplates or the hardware profile is the better place. I've updated the YAML definition with all of the RAID parameters (thanks, Julia). As Dmitry said, should we use deployTemplates to support other configuration (BIOS, other vendors' drivers, ...)?
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: bm1
spec:
  online: true
  bmc:
    address: 192.168.122.1
    credentialsName: bm1-bmc-secret
  bootMACAddress: 52:54:00:c3:40:2e
  hardwareProfile:
    logical_disks:
      - size_gb: MAX
        raid_level: "0"
        controller: "PRAID EP420i (0)"
        physical_disks:
          - "0"
          - "1"
          - "2"
Hi folks,
@dtantsur : Thanks for your suggestion about using deployTemplates. However, I have investigated RAID and BIOS configuration [1][2], and these configurations are executed as clean steps during manual cleaning [3], not as deploy steps. So I think we cannot use deployTemplates for this configuration.
@juliakreger , @furukawa3 : I think a target plus a hardware profile would make our YAML for BareMetalHost bulky.
So I would like to propose a new property named cleanSteps for this configuration, which points to a ConfigMap object containing it. Our YAML file would look like:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: bm0
spec:
  online: true
  bmc:
    address: libvirt://192.168.122.1:6230/
    credentialsName: bm0-bmc-secret
  bootMACAddress: 52:54:00:b7:b2:6f
  cleanSteps: cm-bm0
The ConfigMap object should look like:
apiVersion: v1
data:
  configSteps: |
    [
      {
        "interface": "bios",
        "step": "apply_configuration",
        "args": {
          "settings": [
            {
              "name": "hyper_threading_enabled",
              "value": "false"
            }
          ]
        }
      },
      {
        "interface": "raid",
        "step": "create_configuration",
        "args": {
          "logical_disks": [
            {
              "size_gb": "MAX",
              "raid_level": "5",
              "is_root_volume": true
            }
          ]
        }
      }
    ]
kind: ConfigMap
metadata:
  name: cm-bm0
  namespace: metal3
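For illustration, a controller consuming this ConfigMap could decode the configSteps document with a struct mirroring ironic's clean-step shape (interface, step, args). This is only a sketch under the proposal above; the CleanStep type and parseCleanSteps helper are assumptions, not actual baremetal-operator code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CleanStep mirrors one entry of the "configSteps" JSON document.
// The field names follow ironic's manual-cleaning API; the struct
// itself is hypothetical.
type CleanStep struct {
	Interface string                 `json:"interface"`
	Step      string                 `json:"step"`
	Args      map[string]interface{} `json:"args"`
}

// parseCleanSteps decodes the JSON payload stored under the
// "configSteps" key of the referenced ConfigMap.
func parseCleanSteps(payload string) ([]CleanStep, error) {
	var steps []CleanStep
	if err := json.Unmarshal([]byte(payload), &steps); err != nil {
		return nil, fmt.Errorf("invalid configSteps: %w", err)
	}
	return steps, nil
}

func main() {
	payload := `[
	  {"interface":"bios","step":"apply_configuration",
	   "args":{"settings":[{"name":"hyper_threading_enabled","value":"false"}]}},
	  {"interface":"raid","step":"create_configuration",
	   "args":{"logical_disks":[{"size_gb":"MAX","raid_level":"5","is_root_volume":true}]}}
	]`
	steps, err := parseCleanSteps(payload)
	if err != nil {
		panic(err)
	}
	for _, s := range steps {
		fmt.Printf("%s.%s\n", s.Interface, s.Step)
	}
}
```

The decoded steps could then be passed to ironic's manual cleaning verb as-is, which is what makes the ConfigMap approach thin but also what exposes ironic's step format directly to users.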
In the baremetal-operator state machine, I would like to introduce a new state named cleaning that triggers manual cleaning with the referenced clean steps before we jump into the inspecting state. The state machine for BareMetalHost creation would look like:
Registering --> Cleaning (if cleanSteps is specified) --> Inspecting --> Ready
@dhellmann : Per our discussion in the Slack channel, there are some changes to the Metal3 API. Could you help me review my proposal? If anything needs to be changed, please let me know :)
Thank you very much!
[1] https://docs.openstack.org/ironic/latest/admin/bios.html#configure-bios-settings
[2] https://docs.openstack.org/ironic/latest/admin/raid.html#workflow
[3] https://docs.openstack.org/ironic/pike/admin/cleaning.html#manual-cleaning
@longkb I'm sorry for taking so long to get to this -- we have a big internal deadline coming up and I'm out of the office next week so I've been pretty busy.
My initial reaction is that I do want to ensure that all of the instructions go into the CRD, rather than using an external ConfigMap. Using fields in the CRD ensures that we fully describe the API, don't have surprise behaviors (for example, if the settings are in a ConfigMap, do we need to update the host when that changes?), and that we can take advantage of kubernetes API type and value validation features.
I also want to be careful that we don't directly expose ironic concepts like "steps" or deploy templates, even if we end up using those under the hood. Ironic is an implementation detail, and should not dictate the API. The host CRD should describe the desired end state, and then the controller code should work out the changes needed to reconcile the host's configuration to match.
In your example, you have a BIOS setting change. We might describe that by adding a "bios" section to the spec portion of the host, and giving it a field with a name like hyperThreading that is a pointer to a boolean (assuming that option only has 2 values, using a pointer allows us to easily tell the difference between a value that is missing and a value that is explicitly set to false). So that would give us something like this:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: bm0
spec:
  online: true
  bmc:
    address: libvirt://192.168.122.1:6230/
    credentialsName: bm0-bmc-secret
  bootMACAddress: 52:54:00:b7:b2:6f
  bios:
    hyperThreading: false
As we support changing other BIOS settings, we would add explicit fields for them and provide validation for the inputs.
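The pointer-to-boolean idea above can be sketched in Go. This is a minimal illustration, not the actual BareMetalHost type: a nil pointer means "not specified, leave the firmware setting alone", while a non-nil pointer to false means "explicitly disable".

```go
package main

import "fmt"

// BIOSSpec sketches the explicit-fields approach. Using *bool lets
// the controller distinguish "not specified" (nil) from an explicit
// false. The field name is illustrative.
type BIOSSpec struct {
	HyperThreading *bool `json:"hyperThreading,omitempty"`
}

// desiredHyperThreading reports whether a change is requested and,
// if so, the desired value.
func desiredHyperThreading(spec BIOSSpec) (set bool, value bool) {
	if spec.HyperThreading == nil {
		return false, false // field absent: nothing to reconcile
	}
	return true, *spec.HyperThreading
}

func main() {
	off := false
	for _, spec := range []BIOSSpec{{}, {HyperThreading: &off}} {
		set, val := desiredHyperThreading(spec)
		fmt.Println(set, val)
	}
}
```

This mirrors the common Kubernetes API convention of using optional pointer fields so that "unset" and "zero value" remain distinguishable after deserialization.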
For the RAID configuration we want to describe the end state in a similarly declarative way. I'm not sure off the top of my head what that would look like, but I hope you get the idea of the general direction I think we want to go.
@dhellmann Thank you for the YAML sample. I would like to propose a new YAML file for both BIOS and RAID per your suggestion.
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: bm0
spec:
  online: true
  bmc:
    address: libvirt://192.168.122.1:6230/
    credentialsName: bm0-bmc-secret
  bootMACAddress: 52:54:00:b7:b2:6f
  bios:
    action: apply_configuration
    settings:
      - name: hyper_threading_enabled
        value: "false"
  raid:
    action: create_configuration
    logical_disks:
      - size_gb: 100
        raid_level: "1"
        is_root_volume: true
      - size_gb: 50
        raid_level: "0+1"
In the above BareMetalHost YAML, the action field may be a problem because, as you said, it exposes Ironic concepts; should I modify it?
Please help me to review my YAML sample.
Thank you very much!
[1] https://docs.openstack.org/ironic/rocky/admin/bios.html#configure-bios-settings
[2] https://docs.openstack.org/ironic/latest/admin/raid.html#workflow
I'd remove the action bits, they're internal details of ironic.
bios:
  - name: hyper_threading_enabled
    value: "false"
raid:
  - size_gb: 100
    raid_level: "1"
    is_root_volume: true
  - size_gb: 50
    raid_level: "0+1"
@dtantsur: Thank you for your review.
By removing action, you mean that apply_configuration and create_configuration should be executed by default when a BareMetalHost is created, and that factory_reset and delete_configuration will be triggered when the BareMetalHost object is deleted. Am I right?
I'd remove the action bits, they're internal details of ironic.
Right, the point of metal3 is to wrap the Ironic API in one that is easier for kubernetes users to use and understand. That is going to mean doing more work to create abstractions for the features of Ironic.
bios:
  - name: hyper_threading_enabled
    value: "false"
I would prefer to have the BIOS settings explicitly supported in the API of the host object. Otherwise, we have to do something to look at every incoming name and figure out whether it is recognized and deal with the type validation of the values. The user also has to know what parameters are accepted and will not have the benefit of API documentation to tell them.
If we build the API around a set of commonly supported parameters, then we can use simple structures like:
bios:
  - hyperThreading: false
and based on those settings the operator can pass values to Ironic that may be hardware-platform specific (either by pulling information from the hardware profile or from the existing access drivers we have already for working with different BMCs).
raid:
  - size_gb: 100
    raid_level: "1"
    is_root_volume: true
  - size_gb: 50
    raid_level: "0+1"
When I looked at the RAID setup documentation for ironic I noticed that only one volume can have is_root_volume set to true (which makes sense). It would be nice if we could organize the API inputs to enforce that without having to check it ourselves. For example:
raid:
  rootVolume:
    sizeGb: 100
    level: "1"
  volumes:
    - sizeGb: 50
      level: "0+1"
where both "rootVolume" and "volumes" are optional and the value of is_root_volume is implied by using one or the other.
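The appeal of this shape is that the single-root-volume rule is enforced by construction rather than by validation. A sketch of how a controller might flatten it back into ironic's logical_disks list (type and function names here are assumptions for illustration, not the merged metal3 API):

```go
package main

import "fmt"

// Volume and RAIDSpec sketch the proposed API shape: at most one
// root volume exists by construction, so no cross-field validation
// is needed.
type Volume struct {
	SizeGB int
	Level  string
}

type RAIDSpec struct {
	RootVolume *Volume  // optional; at most one by type design
	Volumes    []Volume // optional additional volumes
}

// LogicalDisk matches the entries ironic expects in target_raid_config.
type LogicalDisk struct {
	SizeGB       int    `json:"size_gb"`
	RAIDLevel    string `json:"raid_level"`
	IsRootVolume bool   `json:"is_root_volume,omitempty"`
}

// buildLogicalDisks flattens the spec into ironic's logical_disks
// list; is_root_volume is implied by which field a volume came from.
func buildLogicalDisks(spec RAIDSpec) []LogicalDisk {
	var disks []LogicalDisk
	if spec.RootVolume != nil {
		disks = append(disks, LogicalDisk{
			SizeGB:       spec.RootVolume.SizeGB,
			RAIDLevel:    spec.RootVolume.Level,
			IsRootVolume: true,
		})
	}
	for _, v := range spec.Volumes {
		disks = append(disks, LogicalDisk{SizeGB: v.SizeGB, RAIDLevel: v.Level})
	}
	return disks
}

func main() {
	spec := RAIDSpec{
		RootVolume: &Volume{SizeGB: 100, Level: "1"},
		Volumes:    []Volume{{SizeGB: 50, Level: "0+1"}},
	}
	for _, d := range buildLogicalDisks(spec) {
		fmt.Printf("%d %s root=%v\n", d.SizeGB, d.RAIDLevel, d.IsRootVolume)
	}
}
```

Because a `*Volume` field can hold at most one value, the invariant "only one volume may have is_root_volume set" cannot be violated by any input that passes schema validation.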
I'd remove the action bits, they're internal details of ironic.
Right, the point of metal3 is to wrap the Ironic API in one that is easier for kubernetes users to use and understand. That is going to mean doing more work to create abstractions for the features of Ironic.
Ok. I got it. So I will remove action as you suggested :)
I would prefer to have the BIOS settings explicitly supported in the API of the host object. Otherwise, we have to do something to look at every incoming name and figure out whether it is recognized and deal with the type validation of the values. The user also has to know what parameters are accepted and will not have the benefit of API documentation to tell them.
If we build the API around a set of commonly supported parameters, then we can use simple structures like:
bios:
  - hyperThreading: false
and based on those settings the operator can pass values to Ironic that may be hardware-platform specific (either by pulling information from the hardware profile or from the existing access drivers we have already for working with different BMCs).
Thanks for your comment. I got the idea. I plan to define a GetBIOSConfigKeyMapping() map[string]string method in the AccessDetails interface [1]. This method will query the implementing driver for the BIOS settings it supports, and we could then validate the incoming input from the end user at that point. Finally, we could use a simple YAML structure as you said:
bios:
  hyperThreadingEnabled: false
  cpuActiveProcessorCores: 0
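A rough sketch of the validation this enables: the driver exposes a mapping from CRD field names to its own setting names, and anything outside that mapping is rejected before it reaches ironic. The mapping entries and function names below are illustrative assumptions, not taken from a real driver or from the AccessDetails interface as merged.

```go
package main

import (
	"fmt"
	"sort"
)

// biosKeyMapping sketches what a GetBIOSConfigKeyMapping()-style
// method on a BMC access driver could return: CRD field name ->
// driver-specific ironic setting name. Entries are hypothetical.
var biosKeyMapping = map[string]string{
	"hyperThreadingEnabled":   "hyper_threading_enabled",
	"cpuActiveProcessorCores": "cpu_active_processor_cores",
}

// validateBIOSSettings rejects any key the driver does not support
// and translates the rest into ironic setting names.
func validateBIOSSettings(in map[string]string) (map[string]string, error) {
	out := make(map[string]string, len(in))
	for k, v := range in {
		ironicName, ok := biosKeyMapping[k]
		if !ok {
			return nil, fmt.Errorf("unsupported BIOS setting %q", k)
		}
		out[ironicName] = v
	}
	return out, nil
}

func main() {
	settings, err := validateBIOSSettings(map[string]string{
		"hyperThreadingEnabled":   "false",
		"cpuActiveProcessorCores": "0",
	})
	if err != nil {
		panic(err)
	}
	keys := make([]string, 0, len(settings))
	for k := range settings {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output order for printing
	for _, k := range keys {
		fmt.Println(k, "=", settings[k])
	}
}
```

This keeps ironic's vendor-specific setting names out of the user-facing API while still letting each driver declare what it supports.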
When I looked at the RAID setup documentation for ironic I noticed that only one volume can have is_root_volume set to true (which makes sense). It would be nice if we could organize the API inputs to enforce that without having to check it ourselves. For example:
raid:
  rootVolume:
    sizeGb: 100
    level: "1"
  volumes:
    - sizeGb: 50
      level: "0+1"
Thanks. I got your point 👍 I will define the API inputs to take rootVolume from the YAML as you said :)
Finally, here is the resulting YAML template for BareMetalHost. Please help me review it :)
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: bm0
spec:
  online: true
  bmc:
    address: irmc://192.168.122.1:6230/
    credentialsName: bm0-bmc-secret
  bootMACAddress: 52:54:00:b7:b2:6f
  bios:
    hyperThreadingEnabled: false
  raid:
    rootVolume:
      sizeGB: 100
      raidLevel: "1"
      physicalDisks:
        - "disk1"
        - "disk2"
        - "disk3"
    volumes:
      - sizeGB: 100
        raidLevel: "1"
      - sizeGB: 50
        raidLevel: "0+1"
[1] https://github.com/metal3-io/baremetal-operator/blob/master/pkg/bmc/access.go#L15
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle stale.
/close
@metal3-io-bot: Closing this issue.
/reopen
@furukawa3: Reopened this issue.
This issue is in progress in #292.
Stale issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle stale.
/close
@metal3-io-bot: Closing this issue.
/reopen /remove-lifecycle stale
@zaneb: Reopened this issue.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Still in progress (and close to merging) in #292. /remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stop judging us, @metal3-io-bot. /remove-lifecycle stale
/lifecycle frozen
@zaneb @andfasano I believe all the work related to this issue has been merged. We can close it now.
FYI, anyone can close an issue. Like this: /close
@zaneb: Closing this issue.
We'd like to deploy bare metal servers with a RAID configuration. In order to set up or unset RAID, a vendor driver is necessary. This issue proposes a new YAML attribute to set up RAID on Fujitsu PRIMERGY servers using the iRMC driver.