varnishcache/varnish-cache: Varnish Cache source code repository
https://www.varnish-cache.org

Missing EMFILE handling in vsmw_append_record() #3663

Open · nigoroll opened this issue 3 years ago

nigoroll commented 3 years ago

Under high load, with hundreds of thousands of requests serializing on a small number of objects for which the backend returned an error after a long response time, the following panic was observed, triggered by exhaustion of the available file descriptors. The panic output is deliberately shortened to avoid publishing identifiable information, but I am convinced it still contains enough detail to identify and fix the bug.

Error: Child (28) Panic at: Fri, 06 Aug 2021 20:33:54 GMT
Assert error in vsmw_append_record(), common/common_vsmw.c line 204:
  Condition(fd >= 0) not true.
version = varnish-trunk revision f4c8362fce819db3536763d084f0018b2f3d0e5a, vrt api = 13.0
ident = Linux,3.10.0-862.14.4.el7.x86_64,x86_64,-jnone,-smalloc,-sdefault,-hcritbit,epoll
now = 85539400.878266 (mono), 1628280667.227404 (real)
Backtrace:
  0x44b189: /opt/varnish/sbin/varnishd() [0x44b189]
  0x44b85f: /opt/varnish/sbin/varnishd() [0x44b85f]
  0x4e22b7: /opt/varnish/sbin/varnishd(VAS_Fail+0x53) [0x4e22b7]
  0x479530: /opt/varnish/sbin/varnishd() [0x479530]
  0x47983e: /opt/varnish/sbin/varnishd() [0x47983e]
  0x47ab07: /opt/varnish/sbin/varnishd(VSMW_Free+0x1c7) [0x47ab07]
  0x478f03: /opt/varnish/sbin/varnishd(VRT_VSC_Destroy+0x1a4) [0x478f03]
  0x4bd191: /opt/varnish/sbin/varnishd(VSC_vbe_Destroy+0x59) [0x4bd191]
  0x419fa3: /opt/varnish/sbin/varnishd() [0x419fa3]
  0x41a1a8: /opt/varnish/sbin/varnishd() [0x41a1a8]
errno = 24 (Too many open files)
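
To make the failure mode concrete, here is a standalone sketch (illustration only, not the actual common_vsmw.c code; the file name is a placeholder) that exhausts RLIMIT_NOFILE and then trips the same kind of assertion:

    #include <assert.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/resource.h>

    int main(void)
    {
        /* Lower the fd limit so exhaustion is easy to reach. */
        struct rlimit rl = { .rlim_cur = 16, .rlim_max = 16 };
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            return 1;

        /* Burn descriptors until open() starts failing with EMFILE. */
        while (open("/dev/null", O_RDONLY) >= 0)
            ;
        printf("open() failed: errno=%d (%s)\n", errno, strerror(errno));

        /* Per the panic above, vsmw_append_record() asserts that its
         * open() succeeded, so EMFILE turns into a panic rather than
         * an error return: */
        int fd = open("/dev/null", O_WRONLY);
        assert(fd >= 0);        /* the condition that failed at line 204 */
        return 0;
    }
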
nigoroll commented 3 years ago

The vsmw code asserts, as is common in Varnish, that there is enough memory available, but also that we have enough disk space and file descriptors. I think these assertions are reasonable, unless we were willing to add a lot of new error handling, which I think we should not do. So while I consider it reasonable to require administrators to configure and monitor their systems appropriately with respect to memory and disk space, I see an issue with the assertions on available file descriptors: the administrator cannot do anything about them, and I think the root cause of this issue is that we accept new connections until we run out of file descriptors.

So at this point I think the only way to handle this issue (and probably related ones lurking in the code) is to reserve some number of file descriptors. I am not sure what the best way to implement such a limit would be. The only idea I have so far is to ensure that the number of client connections plus the total number of backend connections stays below getrlimit(RLIMIT_NOFILE) minus a value/percentage of file descriptors configured as reserved for other uses.
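
A minimal sketch of that idea, assuming a hypothetical fd_reserve parameter (the name and the helper are mine, not existing Varnish code):

    #include <sys/resource.h>

    /* Hypothetical parameter: descriptors kept free for everything
     * that is not a client or backend connection (vsmw, vmods, ...). */
    static long fd_reserve = 100;

    /* Budget below which client + backend connections must stay. */
    static long
    connection_fd_budget(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
            return (-1);
        return ((long)rl.rlim_cur - fd_reserve);
    }

The acceptor (and the backend connection pools) would then refuse or delay new connections once the combined connection count reaches this budget.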

nigoroll commented 3 years ago

(edit: update to better parameter names, clarification)

From discussion with @Dridi and @bsdphk:

https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_14

All functions that open one or more file descriptors shall, unless specified otherwise, atomically allocate the lowest numbered available (that is, not already open in the calling process) file descriptor at the time of each allocation. Where a single function allocates two file descriptors (for example, pipe() or socketpair()), the allocations may be independent and therefore applications should not expect them to have adjacent values or depend on which has the higher value.
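
The practical consequence of that rule is easy to demonstrate with a standalone program (illustration only, not Varnish code):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int a = open("/dev/null", O_RDONLY);    /* lowest free fd, typically 3 */
        int b = open("/dev/null", O_RDONLY);    /* next lowest, typically 4 */
        close(a);
        int c = open("/dev/null", O_RDONLY);    /* reuses a's number */
        printf("a=%d b=%d c=%d\n", a, b, c);    /* typically: a=3 b=4 c=3 */
        close(b);
        close(c);
        return 0;
    }

Because allocation is lowest-first, core code that refuses to hold descriptors above a numeric limit leaves every number above that limit free for the rest of the process.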

The proposed parameter is max_fd_client: the client-side code would not keep any file descriptor above this limit; in particular, the acceptor would close accepted connections whose descriptor number exceeds it.

We could also have max_fd_backend with the same semantics when we need it, but we would start with max_fd_client only.

These parameters would ensure that any file descriptor above the maximum of both limits remains available for internal housekeeping, vmods, and any other code besides the client and backend core code.

These limits would not be a hard guarantee, though.

nigoroll commented 3 years ago

Except for the question of whether we should impose a delay before we get close to the limit, this ticket is ready for implementation. In the first iteration, the acceptor would simply treat any file descriptor above the max as an error: close it and pace as with other accept errors.
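
A sketch of that first iteration, with max_fd_client as the proposed parameter and accept_pace() as a stand-in for the existing accept-error pacing (both names are placeholders):

    #include <errno.h>
    #include <sys/socket.h>
    #include <unistd.h>

    extern int max_fd_client;           /* the proposed parameter */
    extern void accept_pace(int err);   /* stand-in for accept-error pacing */

    static int
    accept_capped(int lsock)
    {
        int fd = accept(lsock, NULL, NULL);

        if (fd < 0) {
            accept_pace(errno);
            return (-1);
        }
        if (fd > max_fd_client) {
            /* Too high: close and pace as with other accept errors,
             * leaving descriptors above the limit free for other
             * subsystems. */
            close(fd);
            accept_pace(EMFILE);
            return (-1);
        }
        return (fd);
    }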