irishgordo opened 9 months ago
As a workaround, I also tried disabling out-of-band access at the host and disabling the harvester-seeder addon, then re-enabling the harvester-seeder addon and re-enabling out-of-band access at the host (re-using the already created secret), but I still hit the issue of:
```
failed to open connection to BMC: 5 errors occurred:
	* provider: gofish: Get "https://192.168.9.118/redfish/v1/": x509: certificate has expired or is not yet valid: current time 2023-10-18T19:52:58Z is after 2021-06-04T23:54:19Z
	* provider: ipmitool: Error: Unable to establish IPMI v2 / RMCP+ session: exit status 1
	* provider: *asrockrack.ASRockRack: Error logging in: Post "https://192.168.9.118/api/session": x509: certificate has expired or is not yet valid: current time 2023-10-18T19:53:51Z is after 2021-06-04T23:54:19Z
	* provider: IntelAMT: Unable to perform digest auth with http://192.168.9.118:443/wsman: Post "http://192.168.9.118:443/wsman": EOF
	* no Opener implementations found
```
To offer more context on this: :thread:

With Harvester v1.2.2, it is noticeable that IPMI-based Alerts/Events do come across on a Dell PowerEdge R720 with the settings configured correctly in iDRAC 7.

However, Redfish fails entirely, even when the service is up and the "insecure TLS" box is selected on the front-end. Redfish uses port 443 by default with iDRAC (at least with iDRAC 7, from what I can tell; mileage may vary on that). What seems to happen is indeed the same thing: an x509 error from the gofish library.

It's apparent that a GET to https://192.168.11.118/redfish/v1 does yield data back, x-ref:
```json
{
  "@odata.context": "/redfish/v1/$metadata#ServiceRoot.ServiceRoot",
  "@odata.id": "/redfish/v1",
  "@odata.type": "#ServiceRoot.v1_3_0.ServiceRoot",
  "AccountService": {
    "@odata.id": "/redfish/v1/Managers/iDRAC.Embedded.1/AccountService"
  },
  "Chassis": {
    "@odata.id": "/redfish/v1/Chassis"
  },
  "Description": "Root Service",
  "EventService": {
    "@odata.id": "/redfish/v1/EventService"
  },
  "Fabrics": {
    "@odata.id": "/redfish/v1/Fabrics"
  },
  "Id": "RootService",
  "JsonSchemas": {
    "@odata.id": "/redfish/v1/JSONSchemas"
  },
  "Links": {
    "Sessions": {
      "@odata.id": "/redfish/v1/Sessions"
    }
  },
  "Managers": {
    "@odata.id": "/redfish/v1/Managers"
  },
  "Name": "Root Service",
  "Oem": {
    "Dell": {
      "@odata.type": "#DellServiceRoot.v1_0_0.ServiceRootSummary",
      "IsBranded": 1,
      "ManagerMACAddress": "C8:1F:66:B7:B2:12",
      "ServiceTag": "6PGZDZ1"
    }
  },
  "Product": "Integrated Remote Access Controller",
  "ProtocolFeaturesSupported": {
    "ExpandQuery": {
      "ExpandAll": true,
      "Levels": true,
      "Links": true,
      "MaxLevels": 1,
      "NoLinks": true
    },
    "FilterQuery": true,
    "SelectQuery": true
  },
  "RedfishVersion": "1.4.0",
  "Registries": {
    "@odata.id": "/redfish/v1/Registries"
  },
  "SessionService": {
    "@odata.id": "/redfish/v1/SessionService"
  },
  "Systems": {
    "@odata.id": "/redfish/v1/Systems"
  },
  "Tasks": {
    "@odata.id": "/redfish/v1/TaskService"
  },
  "UpdateService": {
    "@odata.id": "/redfish/v1/UpdateService"
  }
}
```
Additionally, checking our event service with a GET to https://192.168.11.118/redfish/v1/EventService yields:
```json
{
  "@odata.context": "/redfish/v1/$metadata#EventService.EventService",
  "@odata.id": "/redfish/v1/EventService",
  "@odata.type": "#EventService.v1_0_6.EventService",
  "Actions": {
    "#EventService.SubmitTestEvent": {
      "EventType@Redfish.AllowableValues": [
        "StatusChange",
        "ResourceUpdated",
        "ResourceAdded",
        "ResourceRemoved",
        "Alert"
      ],
      "target": "/redfish/v1/EventService/Actions/EventService.SubmitTestEvent"
    }
  },
  "DeliveryRetryAttempts": 5,
  "DeliveryRetryIntervalSeconds": 30,
  "Description": "Event Service represents the properties for the service",
  "EventTypesForSubscription": [
    "StatusChange",
    "ResourceUpdated",
    "ResourceAdded",
    "ResourceRemoved",
    "Alert"
  ],
  "EventTypesForSubscription@odata.count": 5,
  "Id": "EventService",
  "Name": "Event Service",
  "ServiceEnabled": true,
  "Status": {
    "Health": "OK",
    "HealthRollup": "OK",
    "State": "Enabled"
  },
  "Subscriptions": {
    "@odata.id": "/redfish/v1/EventService/Subscriptions"
  }
}
```
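Since the EventService advertises a SubmitTestEvent action, one hypothetical way to exercise eventing independently of seeder is to POST to the action target listed above. A sketch of building that request (the helper name and the `EventType` choice are illustrative; the target path and allowable values come from the payload above):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// newTestEventRequest builds a POST to the SubmitTestEvent action target
// advertised by the EventService payload above. Illustrative helper;
// authentication headers and TLS handling are omitted.
func newTestEventRequest(bmcHost string) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{
		// Must be one of EventTypesForSubscription from the payload above.
		"EventType": "Alert",
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost,
		"https://"+bmcHost+"/redfish/v1/EventService/Actions/EventService.SubmitTestEvent",
		bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newTestEventRequest("192.168.11.118")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```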
But! To note, IPMI over port 623 with Dell iDRAC does not pose any issues. The problem seems isolated to the gofish library, which is not respecting the "ignore certs" / "insecure TLS" checkbox on the front-end; if it were, it wouldn't return x509 cert errors (yes, in this case the cert is self-signed).

The self-signed cert is even noticeable in the output of racadm:
```
╭─mike at suse-workstation-team-harvester in ~/Projects/moritz-baremetal
╰─○ sudo racadm -r 192.168.11.118 -u root -p root get iDRAC.RedfishEventing.IgnoreCertificateErrors
Security Alert: Certificate is invalid - self-signed certificate
Continuing execution. Use -S option for racadm to stop execution on certificate-related errors.
[Key=iDRAC.Embedded.1#RedfishEventing.1]
IgnoreCertificateErrors=Yes
╭─mike at suse-workstation-team-harvester in ~/Projects/moritz-baremetal
╰─○ sudo racadm -r 192.168.11.118 -u root -p root get iDRAC.RedfishEventing
Security Alert: Certificate is invalid - self-signed certificate
Continuing execution. Use -S option for racadm to stop execution on certificate-related errors.
[Key=iDRAC.Embedded.1#RedfishEventing.1]
DeliveryRetryAttempts=5
DeliveryRetryIntervalInSeconds=30
#IgnoreCertificateErrors=Yes
```
Also to note, the Inventory (on 443, of course) does complain. From `kubectl get inventories -A`, editing the relevant Inventory shows:
```yaml
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: metal.harvesterhci.io/v1alpha1
kind: Inventory
metadata:
  annotations:
    metal.harvesterhci.io/local-inventory: "true"
    metal.harvesterhci.io/local-node-name: dell-r720-node
  creationTimestamp: "2024-06-03T19:49:43Z"
  finalizers:
  - finalizer.inventory.metal.harvesterhci.io
  generation: 6
  name: dell-r720-node
  namespace: harvester-system
  resourceVersion: "73832"
  uid: 7dd4e2d9-4a2d-4ee5-8cf5-e4f6e9eec7a1
spec:
  baseboardSpec:
    connection:
      authSecretRef:
        name: idrac
        namespace: default
      host: 192.168.11.118
      insecureTLS: true
      port: 443
  events:
    enabled: true
    pollingInterval: 1h
  managementInterfaceMacAddress: ""
  primaryDisk: ""
status:
  conditions:
  - lastUpdateTime: "2024-06-03T20:52:19Z"
    status: "True"
    type: bmcObjectCreated
  - lastUpdateTime: "2024-06-03T19:49:44Z"
    status: "False"
    type: bmcJobSubmitted
  - lastUpdateTime: "2024-06-03T19:49:44Z"
    status: "False"
    type: bmcJobCompleted
  - lastUpdateTime: "2024-06-03T19:49:50Z"
    status: "True"
    type: inventoryAllocatedToCluster
  - lastUpdateTime: "2024-06-03T20:53:12Z"
    message: "failed to open connection to BMC: 5 errors occurred:\n\t* provider:
      gofish: Get \"https://192.168.11.118/redfish/v1/\": tls: failed to verify certificate:
      x509: cannot validate certificate for 192.168.11.118 because it doesn't contain
      any IP SANs\n\t* provider: ipmitool: Error: Unable to establish IPMI v2 / RMCP+
      session: exit status 1\n\t* provider: *asrockrack.ASRockRack: Error logging
      in: Post \"https://192.168.11.118/api/session\": tls: failed to verify certificate:
      x509: cannot validate certificate for 192.168.11.118 because it doesn't contain
      any IP SANs\n\t* provider: IntelAMT: Unable to perform digest auth with http://192.168.11.118:443/wsman:
      Post \"http://192.168.11.118:443/wsman\": EOF\n\t* no Opener implementations
      found\n\n"
    reason: Error
    status: "True"
    type: machineNotContactable
  hardwareID: 2b37edae-21eb-11ef-9f92-ead8a4811903
  ownerCluster:
    name: ""
    namespace: ""
  powerAction: {}
  pxeBootConfig:
    address: 192.168.104.169
```
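The "doesn't contain any IP SANs" failure in that status is reproducible with plain crypto/x509: when a host is addressed by bare IP (as seeder does with `host: 192.168.11.118`), Go only matches the certificate's IP SAN list, never its CN or DNS SANs. A self-contained sketch, assuming a typical self-signed BMC cert that carries only a DNS SAN (the helper name is illustrative):

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// selfSignedNoIPSAN generates a throwaway self-signed cert with a DNS
// SAN but no IPAddresses entry -- the shape of cert that triggers the
// "doesn't contain any IP SANs" error when dialed by IP.
func selfSignedNoIPSAN() (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "idrac"},
		DNSNames:     []string{"idrac.local"}, // DNS SAN only, no IP SANs
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := selfSignedNoIPSAN()
	if err != nil {
		panic(err)
	}
	// Verifying against a bare IP fails exactly as in the Inventory status...
	fmt.Println(cert.VerifyHostname("192.168.11.118"))
	// ...while the DNS SAN verifies fine (nil error):
	fmt.Println(cert.VerifyHostname("idrac.local"))
}
```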
Wondering if maybe something in seeder's gofish/bmclib client setup is the issue: https://github.com/harvester/seeder/blob/6f07c186f9fe0732c2d3d921ab0a78a3dd6ce907/pkg/controllers/setup.go#L192-L210
The default transport for bmclib's HTTP client does seem to enable insecure skip verify: https://github.com/bmc-toolbox/bmclib/blob/77eee83ecf866d895be464887ca76598a9645a87/internal/httpclient/httpclient.go#L34-L45
cc: @ibrokethecloud
Hi, did you get this working in the end? We also have a Dell server, and when enabling the BMC we get the same red error box you show above.
**Describe the bug**
This seems to occur when the user has underlying bare-metal infrastructure with an expired SSL cert. The user enables the `harvester-seeder` addon and configures a host with credentials, yielding the BMC connection error shown above. The user then updates / rolls the cert on the bare-metal device (the node that provides Harvester) and disables and re-enables `out-of-band` access on the host. While the cert may no longer be expired on the bare metal, the Harvester host still thinks it is. Adjusting the polling interval and enabling or disabling `out-of-band` access on the host seem to have no effect.

**To Reproduce**
Pre-Reqs:
- enable/disable `out-of-band` access on the host once the cert on the bare-metal machine has been updated

**Expected behavior**
Somehow have a "refresh" or a dumping of any saved certs for the server/host when `out-of-band` access is enabled/disabled on the given host.

**Support bundle**
supportbundle_09c663e9-9569-4362-9a3e-cc3a2d703f01_2023-10-17T21-37-08Z.zip

**Environment**

**Additional context**
![Screenshot from 2023-10-17 11-40-09](https://github.com/harvester/harvester/assets/5370752/7422c9fb-ab7b-4ddf-92c4-bb86bd99edb7)