Open GoogleCodeExporter opened 8 years ago
Yes, can you show me your config? And this info:
./nginx -V
Do you install jvm_route module from the download tarballs or the svn
repository?
Thanks
Original comment by nbubi...@gmail.com
on 14 Dec 2011 at 3:02
I'm using the latest code from repository. I also tried downgrading to 0.8.55
but the problem still exists.
nginx: nginx version: nginx/1.0.10
nginx: built by gcc 4.1.2 20080704 (Red Hat 4.1.2-51)
nginx: TLS SNI support enabled
nginx: configure arguments: --prefix=/usr --sbin-path=/usr/sbin/nginx
--conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log
--pid-path=/var/run/nginx/nginx.pid --lock-path=/var/lock/nginx.lock
--user=nginx --group=nginx --with-http_ssl_module --with-http_dav_module
--with-http_flv_module --with-http_gzip_static_module
--with-http_stub_status_module --with-http_realip_module
--add-module=/home/software/nginx_mogilefs_module
--add-module=/home/software/nginx-upload-progress-module
--add-module=/home/software/nginx_upstream_jvm_route
--with-openssl=/home/software/openssl-0.9.8r --with-openssl-opt=enable-tlsext
--with-debug --http-log-path=/var/log/nginx/access.log
--http-client-body-temp-path=/cache/nginx/client
--http-proxy-temp-path=/cache/nginx/proxy
--http-fastcgi-temp-path=/cache/nginx/fcgi
Original comment by michael...@gmail.com
on 14 Dec 2011 at 3:08
Thanks.
Can you show me you nginx.conf? Does this problem occur frequently or rarely?
I can't reproduce this problem in my box.
Original comment by nbubi...@gmail.com
on 14 Dec 2011 at 4:50
It happens rather frequently, about once every minute.
Here's my nginx.conf snippet:
worker_processes 8;
worker_rlimit_nofile 8192;
events {
use epoll;
worker_connections 8192;
}
http {
include mime.types;
default_type application/octet-stream;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
server_names_hash_bucket_size 128;
open_log_file_cache max=100 inactive=20s min_uses=2 valid=1m;
keepalive_timeout 65 20;
client_max_body_size 80m;
client_body_buffer_size 128k;
upstream tomcat {
server 10.0.0.1:8880 weight=1 max_fails=1 fail_timeout=15s srun_id=a;
server 10.0.0.2:8880 weight=1 max_fails=1 fail_timeout=15s srun_id=b;
server 10.0.0.3:8880 weight=1 max_fails=1 fail_timeout=15s srun_id=c;
server 10.0.0.1:8880 weight=1 max_fails=1 fail_timeout=15s srun_id=a;
server 10.0.0.2:8880 weight=1 max_fails=1 fail_timeout=15s srun_id=b;
jvm_route $cookie_JSESSIONID reverse;
}
server {
listen 80;
server_name XXXXX;
location ~ \.jsp$ {
proxy_pass http://tomcat;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_buffer_size 4k;
proxy_buffers 4 32k;
proxy_busy_buffers_size 64k;
proxy_temp_file_write_size 64k;
proxy_connect_timeout 30s;
proxy_read_timeout 90s;
proxy_next_upstream error timeout;
proxy_ignore_client_abort on;
}
}
}
Original comment by michael...@gmail.com
on 14 Dec 2011 at 7:00
Thanks. I'll test it. You can use the ip_hash module temporarily in your
production server.
Original comment by nbubi...@gmail.com
on 14 Dec 2011 at 7:28
I was looking at the code but could not find any reason why uscf would become
NULL. Quite strange...
Original comment by michael...@gmail.com
on 14 Dec 2011 at 8:16
Yes, That also confuses me too. It seems the point of uscf is overlapped or
uninitialized.
I still can't reproduce your bug :-( ...
Original comment by nbubi...@gmail.com
on 14 Dec 2011 at 9:05
Since I have the environment to reproduce fairly easily, if you think it's
helpful, I'm more than happy to add a few lines of code to help you diagnose
this problem. Just let me know where to put them. Thanks!
Original comment by michael...@gmail.com
on 14 Dec 2011 at 9:19
Thanks. You are a great hacker.
I will review the code first. If I find some dubious code, I will fix it and
you can test it.
Original comment by nbubi...@gmail.com
on 14 Dec 2011 at 9:32
Hi Michael,
Can you try the latest code? And show me the error.log if there is any error.
Hope this can help you.
Thanks.
Original comment by nbubi...@gmail.com
on 15 Dec 2011 at 7:01
Just applied the new code, but the problem still exists with only the core
dumped msg in error.log:
2011/12/15 15:43:09 [alert] 30806#0: worker process 30832 exited on signal 11
(core dumped)
2011/12/15 15:43:13 [alert] 30806#0: worker process 30825 exited on signal 11
(core dumped)
2011/12/15 15:43:20 [alert] 30806#0: worker process 30829 exited on signal 11
(core dumped)
2011/12/15 15:43:26 [alert] 30806#0: worker process 30833 exited on signal 11
(core dumped)
The dump has the same error, complaining uscf being a NULL pointer...
Original comment by michael...@gmail.com
on 15 Dec 2011 at 7:49
Is there any error in the log?
Original comment by nbubi...@gmail.com
on 15 Dec 2011 at 7:51
It seems quite frequent. Can you turn on the debug.log for a while? And give me
the debug.log.
error_log logs/error.log debug;
Thanks.
Original comment by nbubi...@gmail.com
on 15 Dec 2011 at 7:57
Here's the partial debug log for one dumped process just before it crashed.
2011/12/15 16:14:10 [debug] 31214#0: accept on 0.0.0.0:80, ready: 0
2011/12/15 16:14:10 [debug] 31214#0: posix_memalign: 000000000913F730:256 @16
2011/12/15 16:14:10 [debug] 31214#0: *3721 accept: 189.202.24.49 fd:207
2011/12/15 16:14:10 [debug] 31214#0: *3721 event timer add: 207:
60000:1323936910071
2011/12/15 16:14:10 [debug] 31214#0: *3721 epoll add event: fd:207 op:1
ev:80000001
2011/12/15 16:14:10 [debug] 31214#0: post event 00002AE37C52D078
2011/12/15 16:14:10 [debug] 31214#0: delete posted event 00002AE37C52D078
2011/12/15 16:14:10 [debug] 31214#0: accept on 0.0.0.0:80, ready: 0
2011/12/15 16:14:10 [debug] 31214#0: posix_memalign: 0000000009021140:256 @16
2011/12/15 16:14:10 [debug] 31214#0: *3727 accept: 189.202.24.49 fd:205
2011/12/15 16:14:10 [debug] 31214#0: *3727 event timer add: 205:
60000:1323936910076
2011/12/15 16:14:10 [debug] 31214#0: *3727 epoll add event: fd:205 op:1
ev:80000001
2011/12/15 16:14:10 [debug] 31214#0: post event 00002AE37C52D078
2011/12/15 16:14:10 [debug] 31214#0: delete posted event 00002AE37C52D078
2011/12/15 16:14:10 [debug] 31214#0: accept on 0.0.0.0:80, ready: 0
2011/12/15 16:14:10 [debug] 31214#0: posix_memalign: 0000000008FED830:256 @16
2011/12/15 16:14:10 [debug] 31214#0: *3728 accept: 189.202.24.49 fd:209
2011/12/15 16:14:10 [debug] 31214#0: *3728 event timer add: 209:
60000:1323936910077
2011/12/15 16:14:10 [debug] 31214#0: *3728 epoll add event: fd:209 op:1
ev:80000001
2011/12/15 16:14:10 [debug] 31214#0: *2580 posix_memalign:
0000000008E07CE0:4096 @16
2011/12/15 16:14:10 [debug] 31214#0: *2580 http process request line
2011/12/15 16:14:10 [debug] 31214#0: *2580 http request line: "GET XXXXX
HTTP/1.1"
2011/12/15 16:14:10 [debug] 31214#0: *2580 http uri: "XXXXX"
2011/12/15 16:14:10 [debug] 31214#0: *2580 http args: "XXXXX"
2011/12/15 16:14:10 [debug] 31214#0: *2580 http exten: "XXXXX"
2011/12/15 16:14:10 [debug] 31214#0: *2580 http process request header line
2011/12/15 16:14:10 [debug] 31214#0: *2580 http header: "Host: XXXXX"
2011/12/15 16:14:10 [debug] 31214#0: *2580 http header: "Accept: */*"
2011/12/15 16:14:10 [debug] 31214#0: *2580 http header: "Pragma: no-cache"
2011/12/15 16:14:10 [debug] 31214#0: *2580 http header: "Range: bytes=0-148478"
2011/12/15 16:14:10 [debug] 31214#0: *2580 http header: "User-Agent:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR
2.0.5
2011/12/15 16:14:10 [debug] 31214#0: *2580 http header: "Cache-Control:
no-cache, max-age=86400"
2011/12/15 16:14:10 [debug] 31214#0: *2580 http header: "Connection: keep-alive"
2011/12/15 16:14:10 [debug] 31214#0: *2580 http header done
2011/12/15 16:14:10 [debug] 31214#0: epoll del event: fd:26 op:2 ev:00000000
2011/12/15 16:14:10 [debug] 31214#0: epoll del event: fd:27 op:2 ev:00000000
2011/12/15 16:14:10 [debug] 31214#0: *3518 posix_memalign:
0000000008FE9ED0:4096 @16
2011/12/15 16:14:10 [debug] 31214#0: *3518 http process request line
2011/12/15 16:14:10 [debug] 31214#0: *3518 http request line: "GET XXXXX
HTTP/1.1"
2011/12/15 16:14:10 [debug] 31214#0: *3518 http uri: "XXXXX"
2011/12/15 16:14:10 [debug] 31214#0: *3518 http args: ""
2011/12/15 16:14:10 [debug] 31214#0: *3518 http exten: "XXXXX"
2011/12/15 16:14:10 [debug] 31214#0: *3518 http process request header line
2011/12/15 16:14:10 [debug] 31214#0: *3518 http header: "Accept: */*"
2011/12/15 16:14:10 [debug] 31214#0: *3518 http header: "Accept-Language: zh-cn"
2011/12/15 16:14:10 [debug] 31214#0: *3518 http header: "Accept-Encoding: gzip,
deflate"
2011/12/15 16:14:10 [debug] 31214#0: *3518 http header: "User-Agent:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727;
.NET CLR
2011/12/15 16:14:10 [debug] 31214#0: *3518 http header: "Host: XXXXX"
2011/12/15 16:14:10 [debug] 31214#0: *3518 http header: "Connection: keep-alive"
2011/12/15 16:14:10 [debug] 31214#0: *3518 http header done
2011/12/15 16:14:10 [debug] 31214#0: epoll add event: fd:26 op:1 ev:00000001
2011/12/15 16:14:10 [debug] 31214#0: epoll add event: fd:27 op:1 ev:00000001
2011/12/15 16:14:10 [debug] 31214#0: *3721 post event 00002AE37C531790
2011/12/15 16:14:10 [debug] 31214#0: *3721 delete posted event 00002AE37C531790
2011/12/15 16:14:10 [debug] 31214#0: *3721 malloc: 0000000008FEBEF0:1296
2011/12/15 16:14:10 [debug] 31214#0: *3721 posix_memalign: 0000000008EF7960:256
@16
2011/12/15 16:14:10 [debug] 31214#0: *3721 malloc: 0000000008FEC410:1024
2011/12/15 16:14:10 [debug] 31214#0: *3721 posix_memalign:
0000000009039AF0:4096 @16
2011/12/15 16:14:10 [debug] 31214#0: *3721 http process request line
2011/12/15 16:14:10 [debug] 31214#0: *3721 recv: fd:207 904 of 1024
2011/12/15 16:14:10 [debug] 31214#0: *3721 http request line: "GET XXXXX
HTTP/1.1"
2011/12/15 16:14:10 [debug] 31214#0: *3721 http uri: "XXXXX"
2011/12/15 16:14:10 [debug] 31214#0: *3721 http args: ""
2011/12/15 16:14:10 [debug] 31214#0: *3721 http exten: "XXXXX"
2011/12/15 16:14:10 [debug] 31214#0: *3721 http process request header line
2011/12/15 16:14:10 [debug] 31214#0: *3721 http header: "Host: XXXXX"
2011/12/15 16:14:10 [debug] 31214#0: *3721 http header: "User-Agent:
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_7; zh-cn) AppleWebKit/533.21.1
(KHTML,
2011/12/15 16:14:10 [debug] 31214#0: *3721 http header: "Accept: */*"
2011/12/15 16:14:10 [debug] 31214#0: *3721 http header: "Accept-Language: zh-cn"
2011/12/15 16:14:10 [debug] 31214#0: *3721 http header: "Accept-Encoding: gzip,
deflate"
2011/12/15 16:14:10 [debug] 31214#0: *3721 http header: "Cookie:
__utmz=25815611.1323908752.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none);
__utmc=2581
2011/12/15 16:14:10 [debug] 31214#0: *3721 http header: "Connection: keep-alive"
2011/12/15 16:14:10 [debug] 31214#0: *3721 http header done
2011/12/15 16:14:10 [alert] 31212#0: worker process 31214 exited on signal 11
(core dumped)
Original comment by michael...@gmail.com
on 15 Dec 2011 at 8:31
It seems none of the request finished. That's too bad. If you don't use my
module, the problem disappears. Is that right?
Original comment by nbubi...@gmail.com
on 15 Dec 2011 at 8:49
I've done more testing and confirmed that it was not crashed by the plugin.
Sorry about the trouble and thanks again for all the help!
Original comment by michael...@gmail.com
on 16 Dec 2011 at 12:08
OK, I'm luck. I'm interested in what caused this problem? It seems some codes
write the memory illegally.
Original comment by nbubi...@gmail.com
on 19 Dec 2011 at 1:57
Turns out It has something to do with my configuration. It's related to me
having a proxy_pass setting within an IF clause. Somehow it did not set the
correct proxy_pass and made the upstream become NULL.
What's strange though, was the fact that it didn't cause me any trouble until
recently. Still not sure what triggered this crashing behaviour since I've had
this configuration for a very long time.
I guess IF statements are truly evil: http://wiki.nginx.org/IfIsEvil
Original comment by michael...@gmail.com
on 19 Dec 2011 at 2:19
Yes, I see. 'If' is evil. You had better to use 'set' or 'rewrite' in the 'if'
block.
Original comment by nbubi...@gmail.com
on 19 Dec 2011 at 2:27
Original issue reported on code.google.com by
michael...@gmail.com
on 14 Dec 2011 at 2:26