Closed oarribas closed 1 year ago
[oc and kubectl] describe is a veneer that more or less pretty-prints data from other objects (or API calls):
https://github.com/kubernetes/kubectl/blob/master/pkg/describe/describe.go#L201 https://github.com/kubernetes/kubectl/blob/master/pkg/describe/describe.go#L3581-L3735
Thus adding oc describe node[s] to must-gather doesn't add value (all it does is duplicate the data we already save, which increases archive sizes).
/close
@sferich888: Closing this issue.
@sferich888 This is not true. Adding the output of oc describe node <nodename> will add value, because we will be able to check whether an issue the customer is experiencing is related to the node being overcommitted:
$ oc describe $(oc get node -o name | head -1) | tail -11
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 735m (49%) 0 (0%)
memory 3094Mi (46%) 2016Mi (30%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Events: <none>
We cannot get the same information from the must-gather (it would require the requests and limits of all the pods in the cluster, and that information is not collected by the must-gather).
/reopen
@gmeghnag: You can't reopen an issue/PR unless you authored it or you are a collaborator.
/reopen
@oarribas: Reopened this issue.
@sferich888 , as @gmeghnag already said, a must-gather doesn't include information about the customer's pods, only those from "core" namespaces. With the output of oc describe nodes we would have useful information about the status of the nodes.
@oarribas / @gmeghnag what I am saying is that if you read the files in a must-gather under cluster-scoped-resources/core/nodes/*, you can (and should be able to) parse them to get at each node's status:
status:
...
allocatable:
cpu: 9500m
ephemeral-storage: "203489941959"
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 31669688Ki
pods: "250"
capacity:
cpu: "10"
ephemeral-storage: 209179628Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32923064Ki
pods: "250"
These values make up exactly what you are trying to represent (and what oc describe presents).
xref: https://github.com/kubernetes/kubectl/blob/master/pkg/describe/describe.go#L4171-L4231
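As a sketch of the argument above: once the node YAML saved by must-gather is deserialized (e.g. with a YAML loader), the capacity/allocatable figures can be read straight off the object. The helper name and the trimmed node object below are illustrative, not part of must-gather itself.

```python
# Minimal sketch: recover the figures `oc describe node` prints under
# Capacity/Allocatable from a node manifest as saved in a must-gather
# (the YAML under cluster-scoped-resources/core/nodes/ deserializes to a
# dict shaped like the one below).

def node_capacity_allocatable(node):
    """Return the (capacity, allocatable) resource maps of a node manifest."""
    status = node.get("status", {})
    return status.get("capacity", {}), status.get("allocatable", {})

# A trimmed, illustrative node object, as must-gather would save it:
node = {
    "metadata": {"name": "worker-0"},
    "status": {
        "capacity": {"cpu": "10", "memory": "32923064Ki", "pods": "250"},
        "allocatable": {"cpu": "9500m", "memory": "31669688Ki", "pods": "250"},
    },
}

capacity, allocatable = node_capacity_allocatable(node)
print(capacity["cpu"], allocatable["cpu"])  # prints: 10 9500m
```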
/close
@sferich888: Closing this issue.
@sferich888 they are not the same values. The values you mentioned are the CPU and memory allocatable by the node; the following, instead, are the allocated resources, basically the sum of the CPU/memory requests and CPU/memory limits of all the pods running on the node [1]:
$ oc describe $(oc get node -o name | head -1) | tail -11
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 845m (56%) 500m (33%)
memory 3244Mi (48%) 2516Mi (37%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Events: <none>
As an example, you can see that the values for memory are different:
$ oc describe $(oc get node -o name | head -1) | egrep "Capacity|memory:|Allocatable|Allocated|Resource| memory"
Capacity:
memory: 7935220Ki
Allocatable:
memory: 6784244Ki
Allocated resources:
Resource Requests Limits
memory 3244Mi (48%) 2516Mi (37%)
I hope the difference is now clear.
[1] https://github.com/kubernetes/kubectl/blob/master/pkg/describe/describe.go#L4233-L4255
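To make the distinction concrete, here is a minimal sketch of the computation described above: "Allocated resources" is a sum over container requests/limits of the pods scheduled on the node, so it needs the pod specs, which the node object alone does not contain. Only CPU quantities are handled here, and the pod data and helper names are illustrative (real Kubernetes quantity parsing covers more suffixes).

```python
# Sketch: sum CPU requests/limits over a node's pods, as kubectl's
# describe does from pod specs (not from the node object).

def cpu_millis(q):
    """Parse a Kubernetes CPU quantity like '500m' or '1' into millicores."""
    return int(q[:-1]) if q.endswith("m") else int(float(q) * 1000)

def allocated_cpu(pods):
    """Return (requests, limits) in millicores across all containers."""
    req = lim = 0
    for pod in pods:
        for c in pod["spec"]["containers"]:
            res = c.get("resources", {})
            req += cpu_millis(res.get("requests", {}).get("cpu", "0"))
            lim += cpu_millis(res.get("limits", {}).get("cpu", "0"))
    return req, lim

# Illustrative pod specs for one node:
pods = [
    {"spec": {"containers": [
        {"resources": {"requests": {"cpu": "500m"}, "limits": {"cpu": "1"}}},
        {"resources": {"requests": {"cpu": "345m"}}},
    ]}},
]
print(allocated_cpu(pods))  # prints: (845, 1000)
```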
One typical policy/habit/call-it-as-you-wish of must-gather is to prefer machine-parsable over human-readable information (as it is more compact and can be made human-readable later) and avoid redundancy. At least, this is what I was told in the past (or I understood it that way).
So the right way to go here, if that information is desired, would be to get all the pods on the cluster and then have omc or insights or whatever parse the information the way oc describe does.
The only problem I'd see with this approach, however, is that oc adm inspect should not be used for it, because that would gather all the pod logs, and that would be horrible.
Another problem would be that we might be gathering too much information. Maybe must-gather can get the pod info, parse it to calculate the numbers, and attach the result.
But it should still output some compact JSON or YAML file in a defined format, not a human-readable describe output that is subject to change between versions and may be hell to parse for whoever wants to use that information.
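Assuming such a defined format were adopted, a compact per-node record might look like the sketch below. The field names are hypothetical, not an actual must-gather schema; the quantities echo the example output earlier in the thread.

```python
# Sketch of a compact, machine-parsable per-node record, as an alternative
# to human-readable `oc describe` text. Schema and values are illustrative.
import json

record = {
    "node": "worker-0",
    "allocatable": {"cpu": "9500m", "memory": "31669688Ki"},
    "allocated": {
        "requests": {"cpu": "845m", "memory": "3244Mi"},
        "limits": {"cpu": "500m", "memory": "2516Mi"},
    },
}

# Emit one compact JSON line per node, cheap to store and easy to parse later.
print(json.dumps(record, separators=(",", ":")))
```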
Opened #339 to collect the same information in a different way than oc describe nodes.
The oc describe nodes command provides information about the requests/limits, the pods on each node, the allocated resources, and the events related to each node. Having that information in the must-gather report could help a lot with troubleshooting. If possible, put the description of each node in a separate file named after the node.