google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

collect and report total number of open file descriptors in a container #1994

Open sashankreddya opened 6 years ago

sashankreddya commented 6 years ago

This can be achieved with the following steps:

1) Get all the process IDs listed in cgroup.procs for each container.
2) For each ID in that list, get the FD count from /proc/<pid>/fd and sum the counts across all processes.
3) Store the total in ContainerStats.

dashpole commented 6 years ago

I think this would be quite useful, as many users need to diagnose what the offending process is when they run out of open file descriptors. We can figure out where in the API it belongs during the review of the implementation.

tshirtman commented 2 years ago

I don't think this is enough. The output differs a lot from the lsof output (it is much lower, even excluding socket connections). At the very least, I think it should also look into the /proc/<pid>/task/<taskid>/fd directories; otherwise it misses file descriptors opened by threads. I'm no Unix expert, though, and the lsof source code is a lot more complex than the code included in the linked PR. As far as I can tell, instead of listing directories, lsof links directly against kernel interfaces, so I'm not sure how to compare the two results, but I think the current code is incomplete.

thekuffs commented 2 years ago

Based on the information I'm reading in lsof(8) and the data I've collected on systems, I don't think container_sockets is returning the correct result. It's marked as a GAUGE but the data makes it look like a COUNTER.

The lsof manpage mentions a few more states that would clarify the metric:

    -iTCP -sTCP:LISTEN

Or, for example, to list network files with all UDP states except Idle, use:

    -iUDP -sUDP:^Idle

State names vary with UNIX dialects, so it's not possible to provide a complete list. Some common TCP state names are: CLOSED, IDLE, BOUND, LISTEN, ESTABLISHED, SYN_SENT, SYN_RCDV, ESTABLISHED, CLOSE_WAIT, FIN_WAIT1, CLOSING, LAST_ACK, FIN_WAIT_2, and TIME_WAIT. Two common UDP state names are Unbound and Idle.

This makes me wonder if there's additional information, or another place to look for the actual lifecycle state of the socket. This could be incorporated into the current implementation at https://github.com/google/cadvisor/blob/master/container/libcontainer/handler.go#L281

thekuffs commented 2 years ago

Coming back to say that I was wrong. I am observing that container_sockets is both rising and falling. So it is not strictly a COUNTER. I need to do additional research to validate the numbers it is returning.