yaoweibin / nginx_upstream_check_module

Health checks upstreams for nginx
http://github.com/yaoweibin/nginx_upstream_check_module

upstream_check_module with NGINX 1.7.6 segfaults when reloading configuration #46

Open anprevosto opened 9 years ago

anprevosto commented 9 years ago

Hi,

My OS is RHEL 6.4, and I compiled NGINX 1.7.6 with the master branch of upstream_check_module (November 18th)

Upstream_check_module is the only added module, and check_1.7.2+.patch was successfully applied.

Symptom: the NGINX master process crashes with a segmentation fault on the third "nginx -s reload" command (the 1st and 2nd reloads are OK).

Information extracted from the resulting core file:

Core was generated by `nginx: master process /'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f9f30bed158 in __strncmp_sse42 () from /lib64/libc.so.6
(gdb) info threads
* 1 Thread 0x7f9f31d057c0 (LWP 14655)  0x00007f9f30bed158 in __strncmp_sse42 () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f9f30bed158 in __strncmp_sse42 () from /lib64/libc.so.6
#1  0x0000000000483672 in ngx_http_upstream_check_find_shm_peer (shm_zone=0x1c36628, data=<value optimized out>)
    at /home/a090966/build/SOURCES/nginx/nginx_upstream_check_module-master/ngx_http_upstream_check_module.c:3995
#2  ngx_http_upstream_check_init_shm_zone (shm_zone=0x1c36628, data=<value optimized out>) at /home/a090966/build/SOURCES/nginx/nginx_upstream_check_module-master/ngx_http_upstream_check_module.c:3897
#3  0x0000000000414aee in ngx_init_cycle (old_cycle=0x1c3a3a0) at src/core/ngx_cycle.c:470
#4  0x00000000004242b2 in ngx_master_process_cycle (cycle=0x1c3a3a0) at src/os/unix/ngx_process_cycle.c:244
#5  0x00000000004082e4 in main (argc=<value optimized out>, argv=<value optimized out>) at src/core/nginx.c:407

Hope you can help!

anprevosto commented 9 years ago

The upstream configuration is very simple:

upstream t7 {
    server 10.70.8.20:8200;
    server 10.70.8.20:8204;
    check interval=10000 rise=2 fall=2 timeout=5000 type=http;
}

yaoweibin commented 9 years ago

OK. I will check this problem.

dmitry-saprykin commented 9 years ago

Hello,

I have also had the same problem and fixed it with the following patch:

https://github.com/yaoweibin/nginx_upstream_check_module/pull/48

Kind regards, Dmitry Saprykin

JonathanSerafini commented 9 years ago

Same issue here, with nginx-1.7.4, nginx_upstream_check_module-0.3.0, and check_1.7.2+.patch applied. Applying a diff generated from PR48 does not seem to have resolved the issue.

Nginx starts up correctly, but a few reloads cause it to segfault. The error log shows:

    segfault at 11 ip 00007f976562ffd0 sp 00007fffc59b8d78 error 4 in libc-2.19.so[7f97654ee000+1bb000]

JonathanSerafini commented 9 years ago

Just tested the combination of nginx-1.7.8, nginx_upstream_check_module master, the check_1.7.5+ patch and PR48, and that seems to have resolved the issue on this end.

yaoweibin commented 9 years ago

I have merged PR48. It seems OK for me. I'm traveling these days, but I will test the issue you described. Thank you.

JonathanSerafini commented 9 years ago

Awesome, thanks a bunch !

yaoweibin commented 9 years ago

I will close this issue. Keep using the newest nginx version.

Thank you all.

gstaples commented 9 years ago

This is still an issue here. I'm getting the exact same backtrace as anprevosto with nginx-1.7.9 and this module updated today. I double- and triple-checked that I was actually using check_1.7.5+.patch.

I've pasted a full backtrace: http://pastebin.com/DrK52J2A

Can you take another look?

gstaples commented 9 years ago

Just a tad more info:

    (gdb) print peer_shm->upstream_name->data
    $6 = (u_char *) 0x1000 <Address 0x1000 out of bounds>

That's in ngx_http_upstream_check_find_shm_peer(), one frame up from the strncmp() call that segfaults.

gstaples commented 9 years ago

In that same area:

    if (ngx_memcmp(addr->sockaddr, peer_shm->sockaddr, addr->socklen) == 0
        && upstream_name->len == peer_shm->upstream_name->len
        && ngx_strncmp(upstream_name->data, peer_shm->upstream_name->data, upstream_name->len) == 0) {
        return peer_shm;
    }

    (gdb) print upstream_name->len
    $23 = 8
    (gdb) print peer_shm->upstream_name->len
    $24 = 8

So the (upstream_name->len == peer_shm->upstream_name->len) requirement is satisfied.

    (gdb) print upstream_name->data
    $25 = (u_char *) 0x153b421 "ucr-farm"
    (gdb) print peer_shm->upstream_name->data
    $26 = (u_char *) 0x1000 <Address 0x1000 out of bounds>

Whoops.

yaoweibin commented 9 years ago

Could you show me your full config?

Thank you.

gstaples commented 9 years ago

I'll see what I can do. Our config is pretty big.

gstaples commented 9 years ago

Basically, no, I can't send you the full config. We have over 370 files included. 8000 lines. Keys. ACLs. My boss would kill me.

We have a few dozen upstream blocks that look like this:

upstream mpp-farm {
    least_conn;

    server server1.example.com:50001;
    server server2.example.com:50001;

    check interval=5000 fall=3 rise=2 timeout=2000 default_down=false type=http;
    check_http_send 'GET /internal/health HTTP/1.0\r\nHost: mpp-farm\r\nUser-Agent: nginx_upstream_check_module/1.2\r\n\r\n';
    check_http_expect_alive http_2xx;

    keepalive 6;  # per worker pool across all upstream servers
}

gstaples commented 9 years ago

I just tried a build with nginx-1.7.5 and had the same segfault. After the second HUP, peer_shm is trashed.

grahamhar commented 9 years ago

I also see this issue. I think it is related to the number of configured upstreams: I have one config that works OK, and it only has 4 upstreams configured. I will try to pin down whether it starts to fail at a specific number.

grahamhar commented 9 years ago

I built with your latest master and nginx 1.7.9, and that seems to have stopped the problem for me.

gstaples commented 9 years ago

I suspect the number of upstreams didn't actually have anything to do with it; it was just non-deterministic. There was a double free() or something like that going on.

Our setup with its giant config seems nice and stable now.

MattDevUK commented 9 years ago

Can I ask what the "fix" was? We're experiencing the same segfault when reloading (always on the second reload), using nginx 1.7.2 with the 1.7.2 patch and version 0.3.0 of the module. :S

Ah, I just saw that pull request #48 was merged after 0.3.0. Is there any info on when the next release will be? I'm not too keen on running master.

EDIT: Using the master branch fixed the segfaults. Now I'd just like to wait for the next release, if possible :D