siderolabs / sidero

Sidero Metal is a bare metal provisioning system with support for Kubernetes Cluster API.
https://www.sidero.dev
Mozilla Public License 2.0

Feature Request: Support Intel AMT as alternative to IPMI #289

Open ammmze opened 3 years ago

ammmze commented 3 years ago

There is a feature found in many Intel-based computers known as vPro, which can include a feature Intel calls AMT (Active Management Technology). Intel AMT is similar to IPMI (from what I understand... TBH I've never used either one yet, just researched online). It provides the ability to do things like set the next boot source, power the machine on/off, use remote KVM, etc.

Intel provides an API for using this. Documentation can be found here. Though their documentation seems to be more about how to use their SDK than about the actual protocol.

The zip with their SDK appears to be the CS source code, so we could use that for reference when putting together a client. I have also found that someone has put together some form of an integration in Go that we could maybe use for reference.

It appears to be HTTP-based, so hopefully it's not too difficult to interface with.

ammmze commented 3 years ago

Also FWIW, it looks like there is a CLI tool for it, but we'll probably have a better experience if we can use the API itself.

ammmze commented 2 years ago

FYI... I'm looking at implementing this. So far I've just been experimenting with wsman (a "standardized" SOAP-based protocol for managing devices). I've been successful powering on/off/cycling my test machine, which really just leaves changing the next boot device to PXE. Hopefully I'll be able to get something wrapped up during my PTO over the holidays in the next couple of weeks.

https://github.com/talos-systems/sidero/compare/master...ammmze:amt

ammmze commented 2 years ago

Update: I've technically got something working. I've been able to successfully run my fork of sidero and have it control my Intel-based machine. I'm still learning and getting a deeper understanding of the underlying management system. There are actually more standards to this than I initially thought. I hadn't realized that Intel's AMT is really stuff above and beyond DASH (which I had seen references to from the AMD side of things). I was doing some testing with a "dashcli" and it appears to work. It's all still technically using the wsman stuff, though the default port for AMT appears to be different from what is typically used for DASH, I guess, but changing ports is no big deal. My current stuff uses some names that are specific to Intel's implementation, so I'd like to see if I can normalize that out and dig into the DASH standard more. The end result should be that we'd be able to control both Intel vPro and AMD Pro systems using the DASH standard.

Another thing I'm still looking into is supporting graceful shutdown/power cycling. Currently, when you request a power off or power cycle, the machine immediately shuts down/cycles. Presumably we want graceful when supported. However, in order to support this, we'll need to enable the kernel module for Intel MEI (not sure if there is an equivalent for AMD). This gets the MEI driver installed, which by itself does NOT enable the graceful power options. Once the MEI driver is set up, we need the Intel LMS (Local Manageability Service) (or see if we can replicate it somehow), which communicates with the MEI device (typically /dev/mei0 once the MEI driver is installed) to enable the graceful options. I started doing some tests with an Ubuntu live disk and was able to get the graceful options enabled so that they came back from the wsman API. However, when I tried to perform the graceful reboot, it didn't work... though the cause appeared to be AppArmor in the Ubuntu version I was running. AFAIK Talos isn't running AppArmor, so I didn't dig too much into it from there.
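To illustrate the port difference, here is a rough sketch of how the endpoint could be picked per platform. The port numbers are assumptions on my part: 16992/16993 are the AMT defaults as far as I know, and 623/664 are the ports I believe are registered for DMTF out-of-band web services and commonly used by DASH. The `profile` type, the `endpoint` helper, and the shared `/wsman` path are hypothetical, not taken from the fork.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// profile describes where a platform's wsman listener is expected to be.
// All values here are assumptions, see the note above.
type profile struct {
	name      string
	httpPort  int
	httpsPort int
}

var (
	intelAMT = profile{name: "intel-amt", httpPort: 16992, httpsPort: 16993}
	dash     = profile{name: "dash", httpPort: 623, httpsPort: 664}
)

// endpoint returns the wsman URL for a host, choosing the scheme and port
// from the profile. net.JoinHostPort also handles IPv6 literals correctly.
func endpoint(host string, p profile, useTLS bool) string {
	scheme, port := "http", p.httpPort
	if useTLS {
		scheme, port = "https", p.httpsPort
	}
	return scheme + "://" + net.JoinHostPort(host, strconv.Itoa(port)) + "/wsman"
}

func main() {
	fmt.Println(endpoint("192.0.2.10", intelAMT, false))
	fmt.Println(endpoint("192.0.2.11", dash, true))
}
```

Keeping the port in a profile rather than hard-coding it means the same client code could be pointed at either kind of machine.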
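The "graceful when supported" idea could be sketched roughly like this: ask the device which power states it currently supports, prefer the graceful one, and fall back to the hard one otherwise. The state numbers (8 hard off, 10 hard reset, 12 graceful off, 14 graceful reset) are assumptions based on my understanding of Intel's AMT documentation, and `pickPowerState` is a hypothetical helper; how the supported list is fetched over wsman is not shown.

```go
package main

import "fmt"

// Power state values, assumed from Intel's AMT documentation.
const (
	hardOff   = 8
	hardReset = 10
	softOff   = 12 // only offered when the OS-side MEI/LMS pieces are running
	softReset = 14 // same requirement as softOff
)

// pickPowerState returns the graceful state for the action if the device
// reports it as supported, and the hard equivalent otherwise.
func pickPowerState(action string, supported []int) (int, error) {
	var graceful, hard int
	switch action {
	case "off":
		graceful, hard = softOff, hardOff
	case "reset":
		graceful, hard = softReset, hardReset
	default:
		return 0, fmt.Errorf("unsupported action %q", action)
	}
	for _, s := range supported {
		if s == graceful {
			return graceful, nil
		}
	}
	return hard, nil
}

func main() {
	// Without LMS the device would only report the hard options.
	withoutLMS := []int{2, 5, 8, 10}
	withLMS := []int{2, 5, 8, 10, 12, 14}

	a, _ := pickPowerState("off", withoutLMS)
	b, _ := pickPowerState("off", withLMS)
	fmt.Println(a, b)
}
```

Falling back silently is a design choice; a real integration might want to log when a graceful request was downgraded to a hard one.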

Nosmoht commented 1 year ago

Hi @ammmze ,

It's been a while since your last post. Are you still working on an implementation? If not, is there a repo I could use to continue your work?

I plan to use a lot of Intel NUCs with vPro, so I need this feature.

rgl commented 1 year ago

FWIW, https://github.com/bmc-toolbox/bmclib already has support for Intel AMT.

ammmze commented 1 year ago

I stopped working on it a while back. I decided sidero was overkill in the home lab, so I kind of lost interest in sidero. But here's the stuff I had:

I have no clue what state I've left things in... there are probably a lot of things that are done terribly. I'm not super familiar with golang. But IIRC, I had it more or less working for Intel's AMT, then realized that AMT is built on the same underlying platform that AMD's DASH uses, so I was starting to refactor things to traverse the resources more generically instead of assuming resources were named a specific way (the way Intel has them named may be different from AMD, but 🤷🏻‍♂️). I had picked up an older AMD 1L-sized computer for cheap, but never really figured out how to get DASH set up well enough to make progress.

But I'd be curious to see if we could just use that bmc-toolbox that was posted above: pull that into sidero and use it instead of the go-amt I had started.

ammmze commented 1 year ago

Ha... fancy that... I just looked at that bmclib, and the AMT library they are using is a fork of my original work.

timblaktu commented 1 year ago

@ammmze @rgl @Nosmoht has there been any progress or update on this? Consider me the fourth horseman to help ride this thing through. Are any of you sidero devs, or do you know who to ping to get some :eyes: on this?

I'm new to sidero metal and talos but am preparing to use them to set up a bare metal k8s cluster in my homelab. (At work we're using MaaS/Ubuntu/RKE2 for this exact same thing, but I feel Sidero's solutions are... better.)

I didn't even know what Intel AMT was until 30 minutes ago, when I googled it after browsing through the BIOS of one of the 3 used Dell Optiplex 7050 SFFs I just bought to use for the Talos control plane + etcd. The next thing I googled was "Sidero Metal Intel AMT" and, well, here I am.

My 3 worker nodes are going to be r730xd with iDRACs, and I thought it would be rad to make the whole bare metal shebang discoverable and sidero-metal-bootstrappable.

(Aside: My dream, the icing on this cake, is to keep all configuration/state in git and s3, so I can maintain full e2e testability, from zero to fully provisioned, blazing talos k8s bad-assery, in a single fell swoop initiated by Sidero Metal pxe booting blank machines, and ending with Sidero Metal wiping their disks and powering them off.)

So, let me know where this body of work is, whether there is any plan yet, and how I can help. In addition to being well-positioned to do some testing, I'm an experienced embedded SW dev who has shifted into DevOps and the full stack web space in the last decade.

aarnaud commented 6 months ago

I opened a discussion here about alternatives for when a BMC can't be used:

rgl commented 6 months ago

I'm still sporadically playing with Intel AMT. Besides https://github.com/bmc-toolbox/bmclib, there is https://github.com/open-amt-cloud-toolkit/open-amt-cloud-toolkit. It provides a UI SDK and a reference implementation of backend services which expose an API to control the servers (aka AMT devices). Each AMT device is configured to make an outbound connection to an open-amt-cloud-toolkit daemon. Once that connection is up, we can use that open-amt-cloud-toolkit daemon to control any connected AMT device. Having some kind of integration between Sidero and that open-amt-cloud-toolkit daemon API seems possible.

aarnaud commented 6 months ago

AMT and DASH are still useful as a replacement for a BMC, but not every machine has them.

timblaktu commented 2 weeks ago

FWIW, I've recently discovered that Omni can be self-hosted, and for now I have decided to bail on using sidero-metal and the BMC/PXE approach. Since I don't have that many machines to bootstrap, it's not an inconvenience to manually boot a custom install image for each node, and I get all that Omni offers on top, so I'm happy.