Open sashankreddya opened 6 years ago
I think this would be quite useful, as many users need to diagnose what the offending process is when they run out of open file descriptors. We can figure out where in the API it belongs during the review of the implementation.
I don't think this is enough. The output is very different from the lsof output (much lower, even excluding socket connections). I think at the very least it should also look into the /proc/<pid>/task/<taskid>/fd directories, or it's missing file descriptors opened by threads. I'm no Unix expert, though, and the lsof source code is a lot more complex than the code included in the linked PR. As far as I can tell, instead of listing directories it links directly against kernel interfaces, so I'm not sure how to compare the two results, but I think the current code is incomplete.
Based on the information I'm reading in lsof(8) and the data I've collected on systems, I don't think container_sockets is returning the correct result. It's marked as a GAUGE but the data makes it look like a COUNTER.
The lsof manpage mentions a few more states that would clarify the metric:
-iTCP -sTCP:LISTEN Or, for example, to list network files with all UDP states except Idle, use: -iUDP -sUDP:^Idle State names vary with UNIX dialects, so it's not possible to provide a complete list. Some common TCP state names are: CLOSED, IDLE, BOUND, LISTEN, ESTABLISHED, SYN_SENT, SYN_RCDV, ESTABLISHED, CLOSE_WAIT, FIN_WAIT1, CLOSING, LAST_ACK, FIN_WAIT_2, and TIME_WAIT. Two common UDP state names are Unbound and Idle.
This makes me wonder if there's additional information, or another place to look for the actual lifecycle state of the socket. This could be incorporated into the current implementation at https://github.com/google/cadvisor/blob/master/container/libcontainer/handler.go#L281
Coming back to say that I was wrong. I am observing that container_sockets is both rising and falling, so it is not strictly a COUNTER. I need to do additional research to validate the numbers it is returning.
This can be achieved through the steps below:
1) Get all the process IDs listed in cgroup.procs for each container.
2) For each ID in that list, get the FD count from /proc/<pid>/fd and sum the counts across all processes.
3) Store the total in ContainerStats.