lkarsten closed this issue 5 years ago
[10:28:00] < phk> scn, ping ? What is a "high file count" numerically ? 1000 ? 10000 ? 100000 ?
There were around 1000 directories in /var/lib/varnish/HOST/. If I remember correctly, each directory had a vgc.so file in it.
The configuration system was producing failing VCL at the time this happened (a vmod package was not installed). Either varnishd -C -f foo.vcl or the VCL load (via systemctl reload varnish) was failing every ~60s for at least 24 hours.
There is perhaps an underlying bug here, in that these files are left in place, in addition to the sudden slowness of the logging tools/VSC clients, which is incomprehensible to the user.
Failing VCL compiles should leave no trace in the filesystem; if they do, that's a bug.
Close this for now, new tickets will be opened if there are compile-failure residuals.
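To check for compile-failure residuals, count the vgc.so files under the working directory and compare that with the number of VCLs that are actually loaded. A minimal sketch follows; the function name find_vgc_dirs is hypothetical, and the layout (one vgc.so per subdirectory) is assumed from the description above:

```python
import os


def find_vgc_dirs(workdir):
    """Return the sorted list of directories under workdir that hold a vgc.so.

    Assumption: each compiled VCL lives in its own subdirectory containing
    a file named exactly "vgc.so", as described in this thread.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(workdir):
        if "vgc.so" in filenames:
            hits.append(dirpath)
    return sorted(hits)
```

Calling find_vgc_dirs("/var/lib/varnish") and taking len() of the result gives the same number as find /var/lib/varnish -name vgc.so | wc -l. A count far above the number of loaded VCLs would point to leftovers.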
Hi.
Today this problem was back on a 6.0.1 rig.
# time varnishstat -1 1>/dev/null
real 0m0.768s
user 0m0.739s
sys 0m0.028s
# find /var/lib/varnish -name vgc.so | wc -l
50
# varnishstat -V
varnishstat (varnish-6.0.1 revision 8d54bec5330c29304979ebf2c425ae14ab80493c)
Copyright (c) 2006 Verdens Gang AS
Copyright (c) 2006-2015 Varnish Software AS
I went through the commits in git master from Aug 6 and a month back, but couldn't find a fix.
This isn't a big deal right now, but anyone using Telegraf for monitoring quickly runs into https://github.com/influxdata/telegraf/blob/master/plugins/inputs/varnish/varnish.go#L81 , where a hardcoded 200ms timeout kills off all data collection from Varnish. That is of course something for the Telegraf maintainers, but as mentioned earlier in this issue, in some cases varnishstat -1 can take minutes.
There is some underlying issue here that should be tended to at some point.
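To see whether a given host would trip a collector timeout like the 200ms one mentioned above, the runtime of varnishstat -1 can be measured against a budget. This is only a sketch: timed_run and its parameters are hypothetical names, and it measures wall-clock time of the whole process, which may not be exactly how a collector counts its timeout.

```python
import subprocess
import time


def timed_run(cmd, budget_s, hard_limit_s=300.0):
    """Run cmd and return (elapsed_seconds, within_budget).

    budget_s is the time the caller is willing to wait (for example 0.2).
    hard_limit_s is a safety cap; if the command runs longer it is killed
    and reported as (hard_limit_s, False).
    """
    start = time.monotonic()
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=hard_limit_s,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return hard_limit_s, False
    elapsed = time.monotonic() - start
    return elapsed, elapsed <= budget_s
```

For example, timed_run(["varnishstat", "-1"], budget_s=0.2) on the 6.0.1 rig above would be expected to report the run as over budget, given the 0.768s measured with time.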
No, this is just user error.
I seem to have 50 VCLs loaded, so naturally there are 50 vgc.so files. I will now consult the VCL cleanup script and ask it nicely to do what it is supposed to do.
Sorry for the noise.
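For reference, a VCL cleanup step along these lines could be sketched as below. It is not the script referred to above, just an illustration. It assumes that each line of varnishadm vcl.list output starts with a status word ("active", "available", ...) and ends with the VCL name, that no VCL labels are in use, and that the list is ordered oldest first; verify all of this against the Varnish version at hand. The function name discard_candidates is hypothetical.

```python
def discard_candidates(vcl_list_text, keep=2):
    """Return names of VCLs that could be passed to vcl.discard.

    Only VCLs whose status is "available" are considered, and the last
    `keep` of them (assumed to be the newest) are retained for rollback.
    The active VCL is never returned.
    """
    available = []
    for line in vcl_list_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or unexpected lines
        status, name = fields[0], fields[-1]
        if status == "available":
            available.append(name)
    if keep <= 0:
        return available
    return available[:-keep]
```

Each returned name would then be discarded with varnishadm vcl.discard NAME. Keeping the selection logic separate from the command execution makes it easy to review the list before anything is removed.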
Issue: On 6.0.0 from packagecloud, varnishstat in ncurses mode and varnishstat -j -1 are running slowly and using a lot of CPU.
Expected: varnishstat -1 -j returns almost instantly.
This is a system that does automatic VCL reloads, and has maybe ~100 backends.
According to the strace output, it spends 150-200ms for each of the many files in /var/lib/varnish/HOST/.
Another oddity on this system is that "varnishadm vcl.list" is empty, although there is a massive number of VCLs loaded. It returns 200, so supposedly this is not a cli_buffer problem. I'll debug that and report a separate issue.