Can you share relevant parts of your configuration and a varnishlog excerpt showing the issue, please?
In particular, are you using a probe and, if yes, have you considered the .initial probe attribute?
OK, I think I'm missing the .initial probe attribute. This is a sample of one of our probes:
# probe login, use /login to probe
probe login_probe {
    .url = "/login";
    .interval = 15s;
    .timeout = 3s;
    .window = 5;
    .threshold = 3;
}
Before looking into anything else I'll try setting .initial, and then, if the issue is not solved, I'll provide all the other info.
Thanks
So, even setting .initial to 5 does not solve the issue: the first load of a page gives me the service page for an unhealthy backend.
This is the relevant part of the configuration:
# director
new phpipam = dynamic.director(
    share = HOST,
    whitelist = s2,
    port = "8184",
    first_byte_timeout = 10m,
    connect_timeout = 5s,
    between_bytes_timeout = 1m,
    probe = loginslash_probe,
    ttl = 5m
);

# probe loginslash, use /login/ to probe
probe loginslash_probe {
    .url = "/login/";
    .interval = 15s;
    .timeout = 3s;
    .window = 5;
    .threshold = 3;
    .initial = 5;
}

# internal ipam service
if (req.http.host ~ "^ipam.diennea.lan$") {
    include "/etc/varnish/acl/https-301.vcl"; # this is a simple HTTP-to-HTTPS redirect
    set req.backend_hint = phpipam.backend("dockerkube-lb.dna");
}
Unfortunately I'm not finding any call to this service in the logs, and this is really weird because the service is working.
varnishlog should always tell you what's happening
So, using varnishlog I see this message:
Anyway, I've captured the relevant part:
* << BeReq >> 266160
- Begin bereq 266159 pass
- Timestamp Start: 1521185882.838929 0.000000 0.000000
- BereqMethod GET
- BereqURL /login/timeout/
- BereqProtocol HTTP/1.1
- BereqHeader Host: ipam.diennea.lan
- BereqHeader X-Nginx-Forwarded-For: 10.168.1.173
- BereqHeader X-Forwarded-Proto: https
- BereqHeader X-Server-Ip: 10.168.10.205
- BereqHeader user-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:58.0) Gecko/20100101 Firefox/58.0
- BereqHeader accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
- BereqHeader accept-language: en-US,en;q=0.5
- BereqHeader accept-encoding: gzip, deflate, br
- BereqHeader referer: https://ipam.diennea.lan/
- BereqHeader upgrade-insecure-requests: 1
- BereqHeader pragma: no-cache
- BereqHeader cache-control: no-cache
- BereqHeader cookie: _ga=GA1.2.738750100.1513093686; table-page-size=50; __utma=144425325.738750100.1513093686.1521132450.1521185507.6; __utmz=144425325.1519902084.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmc=144425325; _gid=GA1.2.712278224.1521040902; p
- BereqHeader X-Received-Forwarded-For: ::1
- BereqHeader X-Forwarded-For: 10.168.1.173
- BereqHeader X-Varnish: 266160
- VCL_call BACKEND_FETCH
- VCL_return fetch
- BackendOpen 126 reload_2018-03-16T083754.fenoavail 10.168.10.124 8083 10.168.10.207 55182
- BackendStart 10.168.10.124 8083
- Timestamp Bereq: 1521185882.839544 0.000615 0.000615
- Timestamp Beresp: 1521185882.840492 0.001562 0.000948
- BerespProtocol HTTP/1.0
- BerespStatus 503
- BerespReason Service Unavailable
- BerespHeader Date: Fri, 16 Mar 2018 07:38:02 GMT
- BerespHeader Server: Apache/2.2.15 (CentOS)
- BerespHeader Cache-Control: no-cache, no-store
- BerespHeader Content-Length: 1364
- BerespHeader Connection: close
- BerespHeader Content-Type: text/html; charset=UTF-8
- VCL_call BACKEND_RESPONSE
- TTL VCL 0 0 0 1521185883
- VCL_return deliver
- BerespUnset Content-Length: 1364
- BerespHeader Content-Encoding: gzip
- BerespHeader Vary: Accept-Encoding
- Storage malloc Transient
- ObjProtocol HTTP/1.0
- ObjStatus 503
- ObjReason Service Unavailable
- ObjHeader Date: Fri, 16 Mar 2018 07:38:02 GMT
- ObjHeader Server: Apache/2.2.15 (CentOS)
- ObjHeader Cache-Control: no-cache, no-store
- ObjHeader Content-Type: text/html; charset=UTF-8
- ObjHeader Content-Encoding: gzip
- ObjHeader Vary: Accept-Encoding
- Fetch_Body 3 length -
- Gzip G F E 1364 763 80 6024 6034
- BackendClose 126 reload_2018-03-16T083754.fenoavail
- Timestamp BerespBody: 1521185882.841104 0.002175 0.000612
Maybe this is related to our particular setup: we have a default backend that will be set prior to searching for the appropriate host:
set req.backend_hint=notavailable404.backend();
include "/etc/varnish/requests/requests.vcl";
In requests.vcl we have the definitions for our vhosts which, if a match is found, select the appropriate backend. The idea, which worked well from Varnish 2.0 to Varnish 4.1, is that notavailable404 will be used only if none of the vhosts in requests.vcl match the requested host, so I think that with the new Varnish this setup is not working as it did with the older versions.
do you have any hint about this?
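For clarity, here is a minimal sketch of that pattern, assuming the two statements above sit inside vcl_recv (the vhost shown is just the one from my configuration; the real matches live in requests.vcl):

sub vcl_recv {
    # default backend, meant to be used only when no vhost below matches
    set req.backend_hint = notavailable404.backend();

    # per-vhost selection, normally included from requests.vcl
    if (req.http.host ~ "^ipam.diennea.lan$") {
        set req.backend_hint = phpipam.backend("dockerkube-lb.dna");
    }
}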
Other info: with the debug features of Firefox I see that the first request gives a 503, but this seems like nonsense :|
Some other info: changing the timeouts doesn't solve the problem.
Commenting out set req.backend_hint=notavailable404.backend(); doesn't solve it either.
Other services that use the standard round-robin director aren't affected by the problem.
Another service which uses the dynamic director doesn't show the error page after a reload, but we get kicked out of the session.
Maybe this is an interaction between the director and our setup, but it isn't easy to understand what it is.
According to the varnishlog, your backend explicitly sends a 503. I see no connection to vmod_dynamic. I cannot help you unless you show me a varnishlog which shows the issue. There are lots of options to query it specifically, including option -d to use the history still in memory.
Yes, the problem is that Varnish doesn't try the right backend the first time and uses fenoavail instead, and this is not related to vmod_dynamic.
I'm closing the issue, and thanks for your support.
Sorry, I'm back: I've found where the glitch is, but not why it happens.
For corporate reasons we have some backend servers that serve specific error pages, and we have set it up this way:
include "/etc/varnish/requests/requests.vcl"; include "/etc/varnish/errpages/conf.vcl";
the errpages/conf.vcl is
if (!std.healthy(req.backend_hint)) {
    set req.backend_hint = maintswitch.backend();
    if (!std.healthy(req.backend_hint)) {
        if (req.http.X-ErrType == "backend") {
            unset req.http.X-ErrType;
            set req.backend_hint = notavailablebe.backend();
        }
        elsif (req.http.X-ErrType == "ducati") {
            set req.backend_hint = ducatiupdate.backend();
        }
        else {
            set req.backend_hint = notavailablefe.backend();
        }
    }
    else {
        if (req.http.X-ErrType == "backend") {
            unset req.http.X-ErrType;
            set req.backend_hint = maintenancebe.backend();
        }
        elsif (req.http.X-ErrType == "ducati") {
            set req.backend_hint = ducatiupdate.backend();
        }
        else {
            set req.backend_hint = maintenancefe.backend();
        }
    }
}
So the problem is that at the first connection
if (!std.healthy(req.backend_hint))
returns true (backend unhealthy) for backends served by vmod_dynamic, and Varnish correctly uses
notavailablefe.backend();
which returns the 503 seen in the varnishlog.
As confirmation, if I comment out the
include "/etc/varnish/errpages/conf.vcl";
the first call after a vcl reload works as expected.
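In other words, a sketch of the change I tested (the error-page include disabled, everything else untouched):

include "/etc/varnish/requests/requests.vcl";
#include "/etc/varnish/errpages/conf.vcl";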
I hope I have explained the situation well.
I know it's a complex setup, and I'm actively looking for a better way to do what we need.
@nigoroll have you seen my last message? Is there anything you can say about it?
It sounds like you would want to set .initial >= .threshold for your probe.
Hmm, I already tried .initial = .threshold with no success; I'll try with > and see if it works.
Also check varnishlog, in particular the Backend_health tag.
.initial == .threshold should suffice, actually.
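Applied to the loginslash_probe shared above, that would look something like this (just a sketch of the probe, values taken from your configuration):

probe loginslash_probe {
    .url = "/login/";
    .interval = 15s;
    .timeout = 3s;
    .window = 5;
    .threshold = 3;
    .initial = 3;  # == .threshold, so the backend counts as healthy right after a reload
}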
So, with .initial == .window it works: no service down after a reload. I will do more tests, as we are currently under a lighter workload.
Many thanks, and shame on me for not having tried this configuration before.
Sigh, no, the problem isn't solved.
I've done a varnish reload and captured some logs.
These are with varnishlog -g raw -q '* ~ vmod-dynamic':
Log abandoned
Log reacquired
Log abandoned
Log reacquired
0 Timestamp - vmod-dynamic reload_2018-03-30T092814.phpipam(dockerkube-lb.dna) Lookup: 1522394904.558255 0.000000 0.000000
0 Timestamp - vmod-dynamic reload_2018-03-30T092814.phpipam(dockerkube-lb.dna) Results: 1522394904.562455 0.004200 0.004200
0 Timestamp - vmod-dynamic reload_2018-03-30T092814.phpipam(dockerkube-lb.dna) Update: 1522394904.568008 0.009752 0.005552
And these are with varnishlog -g raw | grep Backend_health | grep -i ipam:
0 Backend_health - reload_2018-03-30T092814.phpipam.dockerkube-lb.dna(10.168.11.51) Still healthy 4--X-RH 5 5 5 0.124471 0.099950 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T092814.phpipam.dockerkube-lb.dna(10.168.11.50) Still healthy 4--X-RH 5 5 5 0.121646 0.104392 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T092814.phpipam.dockerkube-lb.dna(10.168.11.56) Still healthy 4--X-RH 5 5 5 0.135277 0.107116 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T092814.phpipam.dockerkube-lb.dna(10.168.11.52) Still healthy 4--X-RH 5 5 5 0.130498 0.105702 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T091328.phpipam.dockerkube-lb.dna(10.168.11.50) Still healthy 4--X-RH 5 5 5 0.093696 0.107332 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T091328.phpipam.dockerkube-lb.dna(10.168.11.56) Still healthy 4--X-RH 5 5 5 0.120738 0.114492 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T091328.phpipam.dockerkube-lb.dna(10.168.11.52) Still healthy 4--X-RH 5 5 5 0.088463 0.108676 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T091328.phpipam.dockerkube-lb.dna(10.168.11.51) Still healthy 4--X-RH 5 5 5 0.100355 0.113608 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T092814.phpipam.dockerkube-lb.dna(10.168.11.51) Still healthy 4--X-RH 5 5 5 0.107402 0.101813 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T092814.phpipam.dockerkube-lb.dna(10.168.11.50) Still healthy 4--X-RH 5 5 5 0.118870 0.108011 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T092814.phpipam.dockerkube-lb.dna(10.168.11.56) Still healthy 4--X-RH 5 5 5 0.107264 0.107153 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T092814.phpipam.dockerkube-lb.dna(10.168.11.52) Still healthy 4--X-RH 5 5 5 0.123834 0.110235 HTTP/1.1 200 OK
Log abandoned
Log reacquired
Log abandoned
Log reacquired
0 Backend_health - reload_2018-03-30T091328.phpipam.dockerkube-lb.dna(10.168.11.50) Still healthy 4--X-RH 5 5 5 0.103631 0.106407 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T091328.phpipam.dockerkube-lb.dna(10.168.11.56) Still healthy 4--X-RH 5 5 5 0.122540 0.116504 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T091328.phpipam.dockerkube-lb.dna(10.168.11.52) Still healthy 4--X-RH 5 5 5 0.112713 0.109685 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T091328.phpipam.dockerkube-lb.dna(10.168.11.51) Still healthy 4--X-RH 5 5 5 0.096416 0.109310 HTTP/1.1 200 OK
Log abandoned
Log reacquired
0 Backend_health - reload_2018-03-30T093035.phpipam.dockerkube-lb.dna(10.168.11.50) Still healthy ------H 5 5 5 0.000000 0.000000
0 Backend_health - reload_2018-03-30T093035.phpipam.dockerkube-lb.dna(10.168.11.50) Still healthy ------H 5 5 5 0.000000 0.000000
0 Backend_health - reload_2018-03-30T093035.phpipam.dockerkube-lb.dna(10.168.11.51) Still healthy ------H 5 5 5 0.000000 0.000000
0 Backend_health - reload_2018-03-30T093035.phpipam.dockerkube-lb.dna(10.168.11.51) Still healthy ------H 5 5 5 0.000000 0.000000
0 Backend_health - reload_2018-03-30T093035.phpipam.dockerkube-lb.dna(10.168.11.52) Still healthy ------H 5 5 5 0.000000 0.000000
0 Backend_health - reload_2018-03-30T093035.phpipam.dockerkube-lb.dna(10.168.11.52) Still healthy ------H 5 5 5 0.000000 0.000000
0 Backend_health - reload_2018-03-30T093035.phpipam.dockerkube-lb.dna(10.168.11.56) Still healthy ------H 5 5 5 0.000000 0.000000
0 Backend_health - reload_2018-03-30T093035.phpipam.dockerkube-lb.dna(10.168.11.56) Still healthy ------H 5 5 5 0.000000 0.000000
0 Backend_health - reload_2018-03-30T093035.phpipam.dockerkube-lb.dna(10.168.11.52) Still healthy 4--X-RH 5 5 5 0.131276 0.032819 HTTP/1.1 200 OK
Log abandoned
Log reacquired
0 Backend_health - reload_2018-03-30T092814.phpipam.dockerkube-lb.dna(10.168.11.51) Still healthy 4--X-RH 5 5 5 0.132275 0.109428 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T092814.phpipam.dockerkube-lb.dna(10.168.11.56) Still healthy 4--X-RH 5 5 5 0.100486 0.105486 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T092814.phpipam.dockerkube-lb.dna(10.168.11.50) Still healthy 4--X-RH 5 5 5 0.106332 0.107591 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T092814.phpipam.dockerkube-lb.dna(10.168.11.52) Still healthy 4--X-RH 5 5 5 0.126403 0.114277 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T091328.phpipam.dockerkube-lb.dna(10.168.11.50) Still healthy 4--X-RH 5 5 5 0.169330 0.122138 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T091328.phpipam.dockerkube-lb.dna(10.168.11.56) Still healthy 4--X-RH 5 5 5 0.164584 0.128524 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T091328.phpipam.dockerkube-lb.dna(10.168.11.52) Still healthy 4--X-RH 5 5 5 0.173511 0.125642 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T091328.phpipam.dockerkube-lb.dna(10.168.11.51) Still healthy 4--X-RH 5 5 5 0.169129 0.124265 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T093035.phpipam.dockerkube-lb.dna(10.168.11.51) Still healthy 4--X-RH 5 5 5 0.173798 0.066378 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T093035.phpipam.dockerkube-lb.dna(10.168.11.56) Still healthy 4--X-RH 5 5 5 0.180521 0.065793 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T093035.phpipam.dockerkube-lb.dna(10.168.11.52) Still healthy 4--X-RH 5 5 5 0.171341 0.067449 HTTP/1.1 200 OK
0 Backend_health - reload_2018-03-30T093035.phpipam.dockerkube-lb.dna(10.168.11.50) Still healthy 4--X-RH 5 5 5 0.195845 0.073446 HTTP/1.1 200 OK
So the backends always seem fine, but the first call to
std.healthy(req.backend_hint)
returns false, and the 503 error page is displayed.
One thing that could cause std.healthy to report a sick director is the lack of backends until the first lookup completes. According to your timestamps it's a 10ms window.
There is a parameter for that:
However, while I'm sure it applies to backend selection, maybe it should also be honored by the health code path.
Backend selection does it:
Nothing is done once the lock is acquired in the dynamic_healthy callback:
@nigoroll: sounds legit to me.
@Dridi @nigoroll So is there a plan to change the behaviour of dynamic_healthy?
Hi.
I have Varnish 5.2.1 with vmod_dynamic installed from the yum repositories on a CentOS 7 server, and I'm facing this situation: after a vcl reload, the first call to a site served by a vmod_dynamic director gives me the maintenance page, as if the backend were sick, but the backend is up and running, so simply hitting F5 gives me access to the site.
Maybe a problem with the first DNS lookup? Or maybe I've made some misconfiguration?
What other info do you need to help me with this problem?
Best regards