sosreport / sos

A unified tool for collecting system logs and other debug information
http://sos.rtfd.org
GNU General Public License v2.0

Add Ansible Automation Platform as a cluster option in sos collect #3344

Open Sayalijoshi25 opened 1 year ago

Sayalijoshi25 commented 1 year ago

sos collect -l

TurboTurtle commented 1 year ago

Definitely a valid use case for a cluster profile. What we'd need to know is a standard way to identify the hosts in the cluster. Is there a command that can be run, or an easily accessible/scrape-able API we can hit to get the host list?

pmoravec commented 1 year ago

> Definitely a valid use case for a cluster profile. What we'd need to know is a standard way to identify the hosts in the cluster. Is there a command that can be run, or an easily accessible/scrape-able API we can hit to get the host list?

I got some hints from AAP colleagues. To identify whether a controller instance is part of a cluster, the command awx-manage list_instances may be used:

# awx-manage list_instances
[controlplane capacity=112 policy=100%]
    vm-10-1-230-102.obfusscateddomain0.com capacity=56 node_type=control version=4.4.1 heartbeat="2023-08-29 09:04:23"
    vm-10-1-230-103.obfusscateddomain0.com capacity=56 node_type=control version=4.4.1 heartbeat="2023-08-29 09:05:13"

[default capacity=114 policy=100%]
    vm-10-1-230-100.obfusscateddomain0.com capacity=57 node_type=execution version=ansible-runner-2.3.2 heartbeat="2023-08-29 09:04:52"
    vm-10-1-230-97.obfusscateddomain0.com capacity=57 node_type=execution version=ansible-runner-2.3.2 heartbeat="2023-08-29 09:04:36"

[workers capacity=114]
    vm-10-1-230-100.obfusscateddomain0.com capacity=57 node_type=execution version=ansible-runner-2.3.2 heartbeat="2023-08-29 09:04:52"
    vm-10-1-230-97.obfusscateddomain0.com capacity=57 node_type=execution version=ansible-runner-2.3.2 heartbeat="2023-08-29 09:04:36"

[ungrouped capacity=0]
    vm-10-1-230-213.obfusscateddomain0.com node_type=hop heartbeat="2023-08-29 09:04:48"

I guess that if the output is not empty and contains the local host's FQDN, we are on an AAP cluster, and we should then parse the output to get the list of cluster nodes.
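For illustration, here is a rough sketch of how that text output could be parsed. It is not part of sos: the function names (parse_list_instances, cluster_nodes, looks_like_aap_cluster) are hypothetical, and it assumes the output always has the shape shown above, i.e. a bracketed group header followed by lines whose first token is the hostname. The sample is a shortened copy of that output.

```python
# Hypothetical helpers, not part of sos. Assumes the layout shown in this
# thread: "[group ...]" header lines, then one line per instance whose first
# whitespace-separated token is the hostname.

SAMPLE = """\
# awx-manage list_instances
[controlplane capacity=112 policy=100%]
    vm-10-1-230-102.obfusscateddomain0.com capacity=56 node_type=control version=4.4.1 heartbeat="2023-08-29 09:04:23"
    vm-10-1-230-103.obfusscateddomain0.com capacity=56 node_type=control version=4.4.1 heartbeat="2023-08-29 09:05:13"

[default capacity=114 policy=100%]
    vm-10-1-230-100.obfusscateddomain0.com capacity=57 node_type=execution version=ansible-runner-2.3.2 heartbeat="2023-08-29 09:04:52"

[ungrouped capacity=0]
    vm-10-1-230-213.obfusscateddomain0.com node_type=hop heartbeat="2023-08-29 09:04:48"
"""


def parse_list_instances(output):
    """Return {group_name: [hostname, ...]} parsed from the raw text."""
    groups = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        # Skip blank lines and the echoed shell prompt line.
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            # Group header: the group name is the first token in the brackets.
            current = line[1:-1].split()[0]
            groups[current] = []
        elif current is not None:
            groups[current].append(line.split()[0])
    return groups


def cluster_nodes(groups, only=None):
    """Flatten to a de-duplicated host list, optionally limited to some groups."""
    nodes = []
    for name, hosts in groups.items():
        if only is not None and name not in only:
            continue
        for host in hosts:
            if host not in nodes:
                nodes.append(host)
    return nodes


def looks_like_aap_cluster(output, fqdn):
    """Apply the check suggested above: non-empty output that lists our FQDN."""
    return fqdn in cluster_nodes(parse_list_instances(output))
```

Passing only=["controlplane"] to cluster_nodes would restrict the list to a single group, which leaves the choice of default grouping open.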

TurboTurtle commented 1 year ago

I don't have anything available to me personally to poke around with this. Does awx-manage provide a way to output as JSON or YAML? We can slice and dice the raw text output if needed, but it'd be a lot easier if it could come pre-packaged in a more consumable form.

Is there a desired default grouping we want to limit the list of hosts to? I.e. should we just use controlplane, or should workers or default be the...default?

cidrbl0ck commented 1 month ago

@TurboTurtle IDK if you've been ghosted here but I'm interested in the same thing.

I don't see a built-in option in awx-manage to change the format of the returned data. The data can certainly be pulled from the AAP API via /api/v2/instance_groups/controlplane/instances, filtering for hostname, which returns a list of FQDNs. But being able to grab sos reports from any instance group, or from all instances, would be handy as well:

/api/v2/instances/ or /api/v2/instance_groups/default/instances, again filtering for hostname.
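As a rough sketch of the "filtering for hostname" step: the snippet below pulls the hostname field out of an already-parsed API response. It assumes the usual AWX list envelope, where entries sit under a results key; that shape has not been checked against a live AAP install here. The function name hostnames_from_api and the sample payload are made up for illustration, and authentication and pagination are left out.

```python
import json


def hostnames_from_api(payload):
    """Collect the hostname of every entry under "results" (assumed envelope)."""
    return [
        item["hostname"]
        for item in payload.get("results", [])
        if item.get("hostname")
    ]


# Made-up sample shaped like a response from
# /api/v2/instance_groups/controlplane/instances; real responses carry many
# more fields per instance.
SAMPLE_RESPONSE = json.loads("""
{
  "count": 2,
  "next": null,
  "previous": null,
  "results": [
    {"id": 1, "hostname": "vm-10-1-230-102.obfusscateddomain0.com", "node_type": "control"},
    {"id": 2, "hostname": "vm-10-1-230-103.obfusscateddomain0.com", "node_type": "control"}
  ]
}
""")

print(hostnames_from_api(SAMPLE_RESPONSE))
```

The same function would work unchanged on /api/v2/instances/ or any other instance group endpoint, as long as the response uses the same envelope.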

TurboTurtle commented 1 month ago

@cidrbl0ck I wouldn't say ghosted, but I no longer work for RH and so I don't have the easy ability to stand up a test AAP installation to test against for building this kind of cluster profile. If I had that, I could probably get this knocked out in a couple days.

I may be able to whip up a best-effort handling of awx-manage output if I can find some free time to dedicate to it, but anything leveraging the API I'd really need an installation I could directly query and test against over said couple days.

cidrbl0ck commented 1 month ago

@TurboTurtle Totally fair. Well I've attached 3 json files for you that might help, raw dumps of:

https://..com/api/v2/instances https://..com/api/v2/instance_groups/controlplane https://..com/api/v2/instance_groups/controlplane/instances

I'll help out any way I can.

instances.json controlplane.json controlplane_instances.json