varnishcache / varnish-cache

Varnish Cache source code repository
https://www.varnish-cache.org

Varnish 7.1.0 - Assert error in v1f_read(), http1/cache_http1_vfp.c line 79 => errno = 4 (Interrupted system call) #3801

Open trendymail opened 2 years ago

trendymail commented 2 years ago

Hello!

Varnish is running inside an OpenVZ 7 container (I know, performance is suboptimal).

A panic is triggered at random when a snapshot of the CT is created (in order to back up its data).

vzctl snapshot --name --skip_dump

Panic at: Wed, 20 Apr 2022 10:49:47 GMT
Assert error in v1f_read(), http1/cache_http1_vfp.c line 79:
  Condition(VTCP_Check(i)) not true.
...
errno = 4 (Interrupted system call)

Expected Behavior

Varnish should continue working inside the container while the snapshot is taken.

Current Behavior

The child process panics and all cached objects are lost.

Possible Solution

This is beyond my knowledge... ^^

Steps to Reproduce (for bugs)

  1. Create an OpenVZ 7 container (Debian Buster)
  2. Install the latest Varnish version inside the CT from https://packagecloud.io/varnishcache/varnish71/install#manual-deb
  3. On the host, create a snapshot of the container (vzctl --verbose snapshot --skip_dump)
  4. If Varnish (inside the CT) does not panic, delete the CT snapshot and try again
  5. Run varnishadm panic.show

Context

I need to back up the OpenVZ 7 container every day (only the CT data, not its memory, which is why "--skip_dump" is used).

Your Environment

Host: Virtuozzo 7.0.18
Container: Debian 10 (Buster)
Varnish: 7.1.0 (from packagecloud.io)

varnishadm panic.show

Panic.txt

bsdphk commented 2 years ago

Discussed at bugwash.

It is uncertain whether EINTR here means "kernel clock stepped, so you may want to reevaluate this socket's timeouts" or whether some signal was actually sent.

The handling options amount to either having VTCP_Check() ignore EINTR, which makes it fatal for the socket, or changing a lot of code to do proper, considered retries.
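
For illustration only, the two options could be sketched roughly as below. This is not Varnish code: `read_retry_eintr()` and `io_result_acceptable()` are hypothetical names, the retry bound is an arbitrary assumption, and the set of errno values treated as acceptable is a guess rather than what VTCP_Check() actually accepts.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Varnish): retry read(2) when it was
 * interrupted before any data was transferred. max_retries bounds the
 * loop so a steady stream of signals cannot stall the caller forever.
 */
static ssize_t
read_retry_eintr(int fd, void *buf, size_t len, int max_retries)
{
	ssize_t n;
	int tries = 0;

	assert(buf != NULL);
	assert(max_retries >= 0);
	do {
		n = read(fd, buf, len);
	} while (n < 0 && errno == EINTR && tries++ < max_retries);
	return (n);
}

/*
 * Hypothetical stand-in for the other option: classify an I/O result as
 * expected (1) or unexpected (0). Accepting EINTR here avoids the
 * assert, but the caller still sees a failed read and drops the socket.
 * The errno list is an assumption made for this sketch.
 */
static int
io_result_acceptable(ssize_t n, int err)
{
	if (n >= 0)
		return (1);
	return (err == EINTR || err == ECONNRESET ||
	    err == ENOTCONN || err == EPIPE);
}
```

The retry variant keeps the connection alive but does not re-check any deadline between attempts, which is part of why doing retries properly touches more code than the tolerant check.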

dridi commented 2 years ago

As far as read and write are concerned, on Linux EINTR may mean that a socket timeout triggered, in which case no data was read or written. We should check what is documented for other system calls that may return EINTR despite SA_RESTART.
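
One case that Linux signal(7) documents is a socket with SO_RCVTIMEO set: input calls on it are not restarted after a signal handler runs, even when the handler was installed with SA_RESTART. A small standalone sketch of that situation follows. The function name, the 2 s timeout and the 50 ms timer are assumptions chosen for the demo, and the result may differ on other kernels.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

static void
on_alarm(int sig)
{
	(void)sig;	/* nothing to do; the handler only needs to run */
}

/*
 * Hypothetical demo: returns the errno seen by a blocking recv() that is
 * interrupted by SIGALRM while SO_RCVTIMEO is set, 0 if recv() did not
 * fail, or -1 if the setup itself failed.
 */
static int
recv_errno_with_timeout_and_signal(void)
{
	int sp[2], err = -1;
	char c;
	struct sigaction sa, old;
	struct timeval rcvto = { 2, 0 };	/* 2 s receive timeout */
	struct itimerval it;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sp) != 0)
		return (-1);
	if (setsockopt(sp[0], SOL_SOCKET, SO_RCVTIMEO,
	    &rcvto, sizeof rcvto) != 0)
		goto out;

	memset(&sa, 0, sizeof sa);
	sa.sa_handler = on_alarm;
	sa.sa_flags = SA_RESTART;	/* ask for automatic restart */
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, &old) != 0)
		goto out;

	memset(&it, 0, sizeof it);
	it.it_value.tv_usec = 50000;	/* one shot after 50 ms */
	(void)setitimer(ITIMER_REAL, &it, NULL);

	/* Nothing is ever written to sp[1], so this blocks until the signal. */
	err = recv(sp[0], &c, 1, 0) < 0 ? errno : 0;

	memset(&it, 0, sizeof it);	/* disarm the timer */
	(void)setitimer(ITIMER_REAL, &it, NULL);
	(void)sigaction(SIGALRM, &old, NULL);
out:
	(void)close(sp[0]);
	(void)close(sp[1]);
	return (err);
}
```

On Linux this is expected to report EINTR well before the 2 s timeout would expire. A kernel that does restart the call would instead report EAGAIN once the timeout elapses.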