purpleidea / mgmt

Next generation distributed, event-driven, parallel config management!
https://purpleidea.com/tags/mgmtconfig/
GNU General Public License v3.0
3.47k stars 308 forks source link

WIP: lang: core: os: Start virtualization detection #756

Open bearrito opened 3 months ago

bearrito commented 3 months ago
  1. Interfaces are still in flux. Do to the large number of platforms, I want to minimize the amount of platform specific code.
  2. Mainly focusing on detection through file contents or network calls. Detection requiring reading bios, magic bytes or other hard to replicate approaches will be last.
  3. I'll add testing support for running servers that implement the network calls GCE and AWS metadata

It would be helpful if people could comment with file contents that contain platform information. I'll list these on this PR

Closes: https://github.com/purpleidea/mgmt/issues/737

purpleidea commented 3 months ago

Neat! Thanks for your patch. Looks like a good base. There are a few stylistic things to review, and some structural ones.

Firstly, please have a quick look at the https://github.com/purpleidea/mgmt/blob/master/docs/style-guide.md which has some recommendations. Briefly you'll want to use (obj *Whatever) as the method receiver name with a pointer as done elsewhere in the project.

Structurally, now that we know that some of these lookups are expensive, do we want to cache them? Would they ever return a different result?

Similarly if this contains expensive operations like network calls, we might need to wrap them with the more complex API that can cancel them if we're shutting down...

Lastly, is the big resources/ folder just for tests? Can we simplify it so the wall of mount info is much more compact? What with trying to avoid any semblance of obfuscated data in there please?

purpleidea commented 3 months ago

Feel free to ping if you have more questions about structuring all this...

bearrito commented 3 months ago

Structurally, now that we know that some of these lookups are expensive, do we want to cache them? Would they ever return a different result?

Sure. If there are examples in the codebase, I'll have a look at them.

Similarly if this contains expensive operations like network calls, we might need to wrap them with the more complex API that can cancel them if we're shutting down...

Sure, is that done anywhere else in the code base, namely passing in a context? Only cloud based virtualization is going to need this. I don't anticipate supporting anything more than GET. Since for most cloud providers this http endpoint must exist and be highly-available it should never (lol) timeout or be unavailable, as your instance will basically be broken.

Lastly, is the big resources/ folder just for tests? Can we simplify it so the wall of mount info is much more compact? What with trying to avoid any semblance of obfuscated data in there please?

Correct. I'll trim them down, but want to leave some of the extraneous lines. Right now, I'm only checking if certain strings exists, but I anticipate I might need to parse some lines so I'd like some negative examples as well.

purpleidea commented 3 months ago

Sure, is that done anywhere else in the code base, namely passing in a context?

Indeed, one example is: https://github.com/purpleidea/mgmt/blob/master/lang/core/os/readfile_func.go#L112 for example.

It's more complicated because you have to handle all the events yourself...

Is it completely necessary to make a network call or is there an alternate way to do that?

For caching, I think this is as simple as a mutex and a stored global var to read and write from it.

purpleidea commented 3 months ago

One last thing: Let's briefly discuss what API we want... This does more than is_virtual... Do we want this all in one function? In two separate?

bearrito commented 3 months ago

Sure let me know what you'd like to see. I'm mainly interested in learning how to detect these on all these platforms, so I'm completely open, I can take any approach on api. I was going to try to include as much relevant information as possible in the result.

I think a network call is ultimately a nice fallback - https://cloud.google.com/compute/docs/instances/detect-compute-engine On AWS at least I know that certain custom images won't have dmi codes that indicate they are AWS. In that case though, if you can reach the AWS Metadata server then that's pretty good evidence. If we don't want network calls made though I can rip it out.

purpleidea commented 3 months ago

If we don't want network calls made though I can rip it out.

The 1st priority is correct data. The 2nd is good API The 3rd is speed.

So if we need network, then we should definitely do it (and cache it of course) but if there's an alternate way, then that's preferred only if it's as reliable/etc...

The digging into all of this is the really fun engineering stuff, I hope you're enjoying it!

bearrito commented 3 months ago

@purpleidea What privileges or capabilities will this function execute with?

purpleidea commented 3 months ago

What privileges or capabilities will this function execute with?

Whatever the user runs mgmt as. Classically root, but it turns out you can get pretty far with Linux capabilities and policykit. It's perfectly reasonable to assume you need root, if there's no other safer way to do it.

purpleidea commented 2 months ago

FYI: I have merged a change in the simple function API that passes in a context so that it's now safe to write code that might need to be cancelled. This way you don't need to use the fancier, more complex API.

bearrito commented 2 months ago

@purpleidea Sounds good. I'll probably have a draft PR ready over this weekend.