varnishcache / varnish-cache

Varnish Cache source code repository
https://www.varnish-cache.org
Other
3.61k stars 371 forks source link

Varnish does not start #2412

Closed 19h closed 6 years ago

19h commented 6 years ago

I just (half an hour ago) made a deployed a new version of varnish on PsychonautWiki and it took the site down.

I tried to debug the issue, but couldn't find a reason nor did the debug output help. I will update this issue as I find new information.

The issue goes away after rolling back to the previously known working version.

HEAD (before): d909232d803b23ac02d69b245483277910faf120 HEAD (after): f549dddd9177602f1e12c3929d6bb087231df055

Our environment / infrastructure overview can be found here: https://github.com/psychonautwiki/dmt-infrastructure#the-dmt-infrastructure. We are using a stack powered by Docker and always use the latest master of varnish.

The varnish Dockerfile used, the default.vcl and start script can be found here: https://github.com/psychonautwiki/dmt-infrastructure/tree/master/base/varnish.

Expected Behavior

Should just work

Current Behavior

Does not start

Possible Solution

--- still investigating ---

Steps to Reproduce (for bugs)

Simply redeployed image

Context

Varnishd did not start and exits without output1, debug did not yield meaningful output. I will setup a mirror server and try to replicate this issue.

Reverting to August 5 image (d909232d803b23ac02d69b245483277910faf120) fixed the issue.

Your Environment

1 See https://github.com/varnishcache/varnish-cache/issues/2412#issuecomment-326779511

rezan commented 6 years ago

If you can get the full command line for varnishd and run it again with -F, it should run in the foreground and possibly expose some output. Otherwise, check the syslog/messages as it could be another system killing varnishd.

19h commented 6 years ago

The start script expects varnishlog to run, it fails on the newer HEAD with:

VSM: Could not get hold of varnishd, is it running?

It subsequently dies and the container is killed.

I logged all syscalls done by varnishlog using sysdig: https://gist.githubusercontent.com/anonymous/fea4175258f7fd736138599899f2a6cc/raw/f934b7665b762548eb9355dc3f73b6a8e22e51de/out.sysdig

It seems it is trying to open _.vsm_mgt but gets a EBADF upon trying.

While I am investigating, here is a full log of all syscalls made in a test container: https://r1.apx.pub/f8b7314a7ada0dafbb9ae02f0f6707a25fdb1742a3187c0c36ae102685ffa816.log. The filename is the sha256 hash of the file, the log has a size of ~58.3MB (61160586 bytes).

Edit:

Otherwise, check the syslog/messages as it could be another system killing varnishd.

varnishd is running, as per the log, it seems

fgsch commented 6 years ago

Varnish is indeed not running:

Error: Cannot read -f file '/etc/varnish/default.vcl' (No such file or directory)

Your Dockerfile is not copying the default.vcl (nor the secret file) so they are not available in the container.

Furthermore, running from sources but installing libvarnishapi-dev (which in turns pull libvarnishapi1) is a recipe for disaster.

I'm not sure how this was working before (and I'm not clear in which version this was working from your report) but all the evidence so far points to a configuration issue on your side.

fgsch commented 6 years ago

Also, if it's indeed running and somehow default.vcl and secret are available to the container (perhaps via volumes) please check whether not installing libvarnishapi-dev fixes the problem.

This might involve running ldconfig to make sure the libraries are found.

19h commented 6 years ago

Thanks for looking into this.

The configuration files are indeed mounted as volume: https://github.com/psychonautwiki/dmt-infrastructure/blob/master/base/docker-compose.yml#L40. Varnishd is running as daemon (and not exiting), but varnishlog cannot connect to it.

I really don't mean this offensive, but I didn't fully understand please check whether not installing -- I'm not installing libvarnishapi-dev. Because I wasn't sure what you meant, I tried removing apt-get install -y libvarnish-dev, which didn't help. Neither did adding apt-get install -y libvarnishapi-dev.

Another thing: what causes the obscene amounts of close syscalls by varnish? I'm very sure it's not related, but I didn't see them in the August 5 image. Could it be an issue caused by a transitive dependency?

You can probably replicate the exact same issue that I ran into by doing a docker-compose up -d varnish in this folder: https://github.com/psychonautwiki/dmt-infrastructure/tree/master/base. Notice how the image restarts every few seconds.

Thanks again

fgsch commented 6 years ago

Sorry, I meant libvarnish-dev. Did you run ldconfig as I mentioned?

In any case you should not install libvarnish-dev when running from sources. This might cause issues if the API changes, which it did recently.

If you start the image manually you should be able to see exactly what's going on.

Regarding the close calls, this is normal.

fgsch commented 6 years ago

I'm going to close this but feel free to reopen it if it's still occurring after running ldconfig and not installing libvarnish-dev.

19h commented 6 years ago

Sorry, I had a very busy day. Yes, it is still happening. (I can't reopen the issue)

fgsch commented 6 years ago

Ok. I cannot reproduce it using your Dockerfile.

Can you please run from within the container varnishlog and share the output?

19h commented 6 years ago

I ran varnishd -a :80 -f /etc/varnish/default.vcl -S /etc/varnish/secret -s malloc,1G and it works perfectly fine (as before), but varnishlog still cannot connect (?) to varnish. Is varnish supposed to create a _.vsm_mgt file? varnishlog seems to try opening that file, fails and then dies.

19h commented 6 years ago

Output of varnishlog:

root@a9dc245f4177:/tmp/varnish-cache# varnishlog
.....
VSM: Could not get hold of varnishd, is it running?
fgsch commented 6 years ago

Also share the ps -fea output.

19h commented 6 years ago
root@a9dc245f4177:/tmp/varnish-cache# ps afux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  18384  3368 pts/0    Ss   22:12   0:00 bash
root       283  0.0  0.0  36256  3828 ?        Ss   22:17   0:00 varnishd -a :80 -f /etc/varnish/default.vcl -S /etc/varnish/secret -s malloc,1G
root       295  0.1  0.0 298060  5440 ?        Sl   22:17   0:00  \_ varnishd -a :80 -f /etc/varnish/default.vcl -S /etc/varnish/secret -s malloc,1G
root       517  0.0  0.0  36836  2936 pts/0    R+   22:18   0:00 ps afux
fgsch commented 6 years ago

Are you on irc? Might be easier to talk there.

19h commented 6 years ago

Hang on, I'll ping you there. Thanks!