weaveworks / scope

Monitoring, visualisation & management for Docker & Kubernetes
https://www.weave.works/oss/scope/
Apache License 2.0

Counting open files is expensive #3078

Open bboreham opened 6 years ago

bboreham commented 6 years ago

From a CPU profile:

Duration: 30.01s, Total samples = 4.16s (13.86%)
Showing nodes accounting for 4.16s, 100% of 4.16s total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context          
----------------------------------------------------------+-------------
                                             0.80s   100% |   github.com/weaveworks/scope/probe/process.(*CachingWalker).Tick /go/src/github.com/weaveworks/scope/probe/process/walker.go
         0     0%     0%      0.80s 19.23%                | github.com/weaveworks/scope/probe/process.(*walker).Walk /go/src/github.com/weaveworks/scope/probe/process/walker_linux.go
                                             0.41s 51.25% |   github.com/weaveworks/scope/vendor/github.com/weaveworks/common/fs.ReadDirCount /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/common/fs/fs.go
                                             0.32s 40.00% |   github.com/weaveworks/scope/probe/process.readStats /go/src/github.com/weaveworks/scope/probe/process/walker_linux.go
...

----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context          
----------------------------------------------------------+-------------
                                             0.41s   100% |   github.com/weaveworks/scope/vendor/github.com/weaveworks/common/fs.(*realFS).ReadDirCount <autogenerated>
     0.02s  0.48%  0.48%      0.41s  9.86%                | github.com/weaveworks/scope/vendor/github.com/weaveworks/common/fs.realFS.ReadDirCount /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/common/fs/readdircount_linux_amd64.go
                                             0.28s 68.29% |   syscall.ReadDirent /usr/local/go/src/syscall/syscall_linux.go
                                             0.09s 21.95% |   os.Open /usr/local/go/src/os/file.go
                                             0.01s  2.44% |   os.(*File).Close /usr/local/go/src/os/file_unix.go
                                             0.01s  2.44% |   runtime.deferreturn /usr/local/go/src/runtime/panic.go
----------------------------------------------------------+-------------

So nearly 10% of the user CPU time goes into fs.ReadDirCount, and most of that is just calling the syscall; the kernel side will have a cost too.

Open files doesn't strike me as a very interesting metric, although it's one of the three that Scope graphs for a process. I guess we could use a larger buffer, though the vast majority of processes will fit in one read. Maybe sample it less often?
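One way to "sample it less often" is sketched below: cache the count per PID and re-measure only on every Nth tick. This is an illustration, not Scope code; `sampledCounter`, `newSampledCounter` and the `measure` callback are hypothetical names, and the sketch does not evict entries for processes that have exited.

```go
package main

import "fmt"

// sampledCounter caches an expensive per-process measurement and refreshes
// it only once every `every` ticks. (Hypothetical type, not part of Scope.)
type sampledCounter struct {
	every   int
	tick    int
	measure func(pid int) (int, error)
	cache   map[int]int
}

func newSampledCounter(every int, measure func(pid int) (int, error)) *sampledCounter {
	if every < 1 {
		every = 1
	}
	return &sampledCounter{every: every, measure: measure, cache: map[int]int{}}
}

// Tick advances the sampling clock; call it once per walk of the process table.
func (s *sampledCounter) Tick() { s.tick++ }

// Get returns the cached value for pid, re-measuring only when the pid has
// not been seen before or the current tick is a refresh tick.
func (s *sampledCounter) Get(pid int) (int, error) {
	v, ok := s.cache[pid]
	if ok && s.tick%s.every != 0 {
		return v, nil
	}
	n, err := s.measure(pid)
	if err != nil {
		// Fall back to the last known value (zero if none) alongside the error.
		return v, err
	}
	s.cache[pid] = n
	return n, nil
}

func main() {
	calls := 0
	// Stand-in for the real measurement, e.g. counting /proc/<pid>/fd entries.
	fake := func(pid int) (int, error) { calls++; return 42, nil }

	c := newSampledCounter(3, fake)
	for i := 0; i < 6; i++ {
		c.Get(1)
		c.Tick()
	}
	fmt.Println("measurements taken:", calls)
}
```

The trade-off is staleness: a graph fed from this cache lags by up to `every` ticks, which seems acceptable for a metric that usually changes slowly.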

dlespiau commented 6 years ago

Note that many Prometheus middlewares already expose a metric covering this, e.g. https://github.com/prometheus/client_golang/blob/master/prometheus/process_collector.go#L61. Maybe we could just get rid of the Scope one.

rade commented 6 years ago

The count is the number of entries in /proc/<pid>/fd. This includes sockets, which makes the figure useful.

errordeveloper commented 6 years ago

> Open files doesn't strike me as a very interesting metric

I agree. It's not very useful, but only until you have too many and hit the limit.