tinkerbell / roadmap

Official Tinkerbell Roadmap
Apache License 2.0

Hardware Monitoring and Alerting #26

Status: Open. shan100github opened this issue 1 year ago.

shan100github commented 1 year ago

Instrument monitoring and alerting of hardware managed by Tinkerbell.

Redfish may provide the APIs needed to achieve this.

In DMTF's own words: "DMTF’s Redfish® is a standard designed to deliver simple and secure management for converged, hybrid IT and the Software Defined Data Center (SDDC). Both human readable and machine capable, Redfish leverages common Internet and web services standards to expose information directly to the modern tool chain."

https://www.dmtf.org/standards/redfish

chrisdoherty4 commented 1 year ago

What kinds of operations do you think we'd need? Are they in addition to what Tinkerbell already does (power and boot device manipulation)?

Tinkerbell does already use Redfish for its BMC operations. The components that integrate with baseboard management controllers leverage a library called bmclib. It provides Redfish integration for us and we contribute where we can to keep it up-to-date.
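For context, the power and boot operations mentioned above map onto two standard Redfish requests: a `ComputerSystem.Reset` action and a PATCH of the system's `Boot` override properties. The sketch below only builds those requests so their shape is visible; it is not bmclib's API. The BMC base URL and the system ID `"1"` are hypothetical (real IDs vary by vendor and should be discovered from the `/redfish/v1/Systems` collection), and `reset_request` / `pxe_once_request` are illustrative names.

```python
import json
import urllib.request


def reset_request(bmc: str, system_id: str, reset_type: str = "On") -> urllib.request.Request:
    """Build (but do not send) a Redfish ComputerSystem.Reset action request."""
    url = f"{bmc}/redfish/v1/Systems/{system_id}/Actions/ComputerSystem.Reset"
    # ResetType values include "On", "ForceOff", "GracefulShutdown", "ForceRestart".
    body = json.dumps({"ResetType": reset_type}).encode()
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "application/json"}
    )


def pxe_once_request(bmc: str, system_id: str) -> urllib.request.Request:
    """Build (but do not send) a request to PXE boot the system on its next boot only."""
    url = f"{bmc}/redfish/v1/Systems/{system_id}"
    body = json.dumps(
        {"Boot": {"BootSourceOverrideTarget": "Pxe", "BootSourceOverrideEnabled": "Once"}}
    ).encode()
    return urllib.request.Request(
        url, data=body, method="PATCH", headers={"Content-Type": "application/json"}
    )
```

Sending either request would additionally need BMC credentials and, in most environments, TLS settings for the BMC's certificate.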

shan100github commented 1 year ago

Yes. In addition to the current operations (power and boot device manipulation), it would be good, if possible, to use it for monitoring, alerting, and notifications on the underlying hardware.

I am not sure whether this falls within Tinkerbell's goals.

chrisdoherty4 commented 1 year ago

monitoring, alerts, and notifications of underlying hardware.

What properties do you envisage monitoring and alerting on? Is there a Redfish API that lets you 'watch' hardware for events?
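On the "watch" question: Redfish does define an `EventService`. A client registers an event destination by POSTing to the service's `Subscriptions` collection, and the BMC then POSTs event payloads to that destination. Newer services may also offer a Server-Sent Events stream via `EventService.ServerSentEventUri`. The sketch below only builds a subscription request. The BMC URL, destination URL and `subscription_request` name are hypothetical, and which filter properties a BMC accepts varies by vendor and schema version (`EventTypes` is deprecated in newer schemas, though some BMCs still expect it).

```python
import json
import urllib.request
from typing import Optional


def subscription_request(
    bmc: str, destination: str, context: str, event_types: Optional[list[str]] = None
) -> urllib.request.Request:
    """Build (but do not send) a Redfish EventService subscription request."""
    body = {
        "Destination": destination,  # URL the BMC will POST events to
        "Protocol": "Redfish",
        "Context": context,  # echoed back in each event payload
    }
    if event_types:
        # Deprecated in newer EventDestination schemas; included only if requested.
        body["EventTypes"] = event_types
    return urllib.request.Request(
        f"{bmc}/redfish/v1/EventService/Subscriptions",
        data=json.dumps(body).encode(),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

A successful POST typically returns `201 Created` with the new subscription's URI in the `Location` header, which is what a monitoring component would later DELETE to unsubscribe.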

I am not sure whether this falls within Tinkerbell's goals.

A monitoring capability would certainly be a scope expansion for Tinkerbell. It wouldn't fit within the current goals of the project per se, but goals can be expanded.


I support keeping this in the roadmap repository as an 'on ice' issue, meaning we hold off on adding it as a project item because of the scope increase it would bring.

I think it's also important to manage expectations: I suspect the current maintainers won't be able to prioritize it for some time because of the backlog of existing work. Additional contributors are welcome to champion it further.

shan100github commented 1 year ago

It would be helpful to monitor the status of hardware components such as network interfaces (LAN) and RAM, alongside provisioning the OS and environment through Tinkerbell.
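Polling is the simpler alternative to events for this. Redfish `Memory` and `EthernetInterface` resources (members of `/redfish/v1/Systems/{id}/Memory` and `/redfish/v1/Systems/{id}/EthernetInterfaces`) each carry a `Status` object with `State` and `Health`, and Ethernet interfaces also report `LinkStatus`. The sketch assumes those member resources have already been fetched as JSON; the HTTP fetching, the `health_issues` name and the sample data are hypothetical. Treating every `LinkDown` as an issue is also an assumption, since an unused port may legitimately be down.

```python
from typing import Any


def health_issues(resources: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Return (resource id, reason) pairs for Redfish resources needing attention."""
    issues = []
    for res in resources:
        rid = res.get("@odata.id", res.get("Id", "unknown"))
        status = res.get("Status") or {}
        if status.get("State") == "Absent":
            continue  # e.g. an empty DIMM slot: nothing to monitor
        health = status.get("Health")
        if health not in (None, "OK"):  # Redfish Health is "OK", "Warning" or "Critical"
            issues.append((rid, f"Health={health}"))
        # EthernetInterface resources additionally expose link state.
        if res.get("LinkStatus") == "LinkDown":
            issues.append((rid, "LinkDown"))
    return issues


# Hypothetical sample resources, shaped like Redfish Memory/EthernetInterface members.
sample = [
    {"@odata.id": "/redfish/v1/Systems/1/Memory/DIMM1", "Status": {"State": "Enabled", "Health": "OK"}},
    {"@odata.id": "/redfish/v1/Systems/1/Memory/DIMM2", "Status": {"State": "Enabled", "Health": "Critical"}},
    {"@odata.id": "/redfish/v1/Systems/1/Memory/DIMM3", "Status": {"State": "Absent"}},
    {
        "@odata.id": "/redfish/v1/Systems/1/EthernetInterfaces/NIC1",
        "Status": {"State": "Enabled", "Health": "OK"},
        "LinkStatus": "LinkDown",
    },
]
```

For `sample`, `health_issues` flags `DIMM2` (`Health=Critical`) and `NIC1` (`LinkDown`). A cheaper first check is the `ComputerSystem` resource's rolled-up health (for example `Status.HealthRollup`), drilling into individual members only when it is not `OK`.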

Some details about monitoring and events can be found in the Redfish School events presentation: https://www.dmtf.org/sites/default/files/Redfish%20School%20-%20Events.pdf
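On the receiving side of an event subscription, the BMC POSTs a Redfish `Event` payload whose `Events` array holds individual records with fields such as `MessageId`, `Message`, `OriginOfCondition` and a severity (`MessageSeverity` in newer schemas, `Severity` in older ones). The sketch below extracts Warning and Critical records and wires that into a minimal `http.server` handler. `alerts_from_event`, `EventHandler` and the choice to print alerts are hypothetical; a real receiver would need TLS, authentication and forwarding to an actual alerting system.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Any

ALERT_LEVELS = {"Warning", "Critical"}


def alerts_from_event(payload: dict[str, Any]) -> list[dict[str, str]]:
    """Extract Warning/Critical records from a Redfish Event payload."""
    alerts = []
    for record in payload.get("Events", []):
        # Newer Event schemas use MessageSeverity; older ones use Severity.
        severity = record.get("MessageSeverity") or record.get("Severity", "")
        if severity not in ALERT_LEVELS:
            continue
        alerts.append({
            "severity": severity,
            "message_id": record.get("MessageId", ""),
            "message": record.get("Message", ""),
            "origin": (record.get("OriginOfCondition") or {}).get("@odata.id", ""),
        })
    return alerts


class EventHandler(BaseHTTPRequestHandler):
    """Hypothetical receiver for the subscription's Destination URL."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        for alert in alerts_from_event(payload):
            # Placeholder: forward to a real alerting system instead of printing.
            print(f"[{alert['severity']}] {alert['origin']}: {alert['message']}")
        self.send_response(204)  # acknowledge receipt with no body
        self.end_headers()
```

To try it locally, `http.server.HTTPServer(("0.0.0.0", 8080), EventHandler).serve_forever()` starts the receiver; many BMCs require an HTTPS destination, so production use would wrap the socket with TLS.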