phusion / passenger

A fast and robust web server and application server for Ruby, Python and Node.js
https://www.phusionpassenger.com/
MIT License
5.01k stars 547 forks source link

No default temp path in nginx module causes segfault. #2189

Closed aried3r closed 5 years ago

aried3r commented 5 years ago

Issue report

Question 1: What is the problem? Your answer:

With nginx 1:1.15.8-8.6.0.1~xenial1 on a Rails 5.2.2.1 app I can use Rails send_file without a problem. When upgrading to nginx/xenial 1:1.15.8-8.6.0.2~xenial1 this doesn't work anymore (for most file, not all of them. For example thumbnails of images work, but the originals don't).

nginx' error.log shows:

2019/03/22 13:10:47 [alert] 21227#21227: worker process 2032 exited on signal 11
2019/03/22 13:10:47 [alert] 21227#21227: worker process 2031 exited on signal 11
2019/03/22 13:10:47 [alert] 21227#21227: worker process 23789 exited on signal 11
2019/03/22 13:12:02 [alert] 21227#21227: worker process 23788 exited on signal 11
2019/03/22 13:12:03 [alert] 21227#21227: worker process 23790 exited on signal 11
2019/03/22 13:12:03 [alert] 21227#21227: worker process 23898 exited on signal 11
2019/03/22 13:12:53 [alert] 21227#21227: worker process 23901 exited on signal 11
2019/03/22 13:16:02 [alert] 21227#21227: worker process 23902 exited on signal 11
2019/03/22 13:16:02 [alert] 21227#21227: worker process 23976 exited on signal 11
2019/03/22 13:16:02 [alert] 21227#21227: worker process 24219 exited on signal 11

I can reproduce this by upgrading nginx, nginx-common, nginx-extras to 1:1.15.8-8.6.0.2~xenial1 and then downgrading again to 1:1.15.8-8.6.0.1~xenial1.

Since I am using the passenger provided nginx packages, I believe this to be a passenger problem, but that's just an assumption. I couldn't find the source repo or changelog of these releases.

I'm not sure if I can provide a reproduction case since it just happens on our staging server after the upgrade of the packages and not for all files. I'm happy to provide additional information!

Question 2: Passenger version and integration mode:

$ passenger --version
Phusion Passenger 6.0.2
$ nginx -v
nginx version: nginx/1.15.8

Question 3: OS or Linux distro, platform (including version):

Ubuntu 16.04 LTS

Question 4: Passenger installation method:

Phusion APT repo

# /etc/apt/sources.list.d/phusionpassenger.list
deb      "https://oss-binaries.phusionpassenger.com/apt/passenger" xenial main
deb-src  "https://oss-binaries.phusionpassenger.com/apt/passenger" xenial main

Question 5: Your app's programming language (including any version managers) and framework (including versions):

Ruby 2.6.2, rbenv, Rails 5.2.2.1

Question 6: Are you using a PaaS and/or containerization? If so which one?

No.

Question 7: Anything else about your setup that we should know?

Using letsencrypt and the following nginx config (don't mind nginx 1.14.0 being shown there, it's just what the page generates): https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=nginx-1.14.0&openssl=1.0.2g&hsts=yes&profile=modern

sauliusgrigaitis commented 5 years ago

I can confirm this. Same problem on a few servers running Ubuntu 16.04 and Ubuntu 18.04. Nginx just exits whenever the file is a bit larger (starting from around a few hundred of KBs).

CamJN commented 5 years ago

Thanks for this report I'll look into it asap.

CamJN commented 5 years ago

Docs link for myself: https://api.rubyonrails.org/v5.2/classes/ActionController/DataStreaming.html#method-i-send_file

CamJN commented 5 years ago

@aried3r since you indicated you were interested, here's our ubuntu/debian nginx repo source: https://github.com/phusion/passenger_apt_automation/tree/master/debian_specs/nginx

CamJN commented 5 years ago

Ok I'm going to need more info, as unfortunately I'm not able to reproduce this issue in my test environment. I've setup a Xenial vm, running our nginx-extras package (1:1.15.8-8.6.0.2~xenial1) and a basic Rails (5.2.2.1) app, with a route that responds with send_file. The file it sends is just under a MB (897KB).

Here's the controller method:

def sendfile
  send_file '/var/www/blog/log/production.log'
end

So is there something else that you can think of that I should do to make my test case more similar to your setup?

sauliusgrigaitis commented 5 years ago

Maybe you can try with a much larger file? Say at least 100MB.

CamJN commented 5 years ago

Sure can, i just used this one because you mentioned it started from around a few hundred KBs.

CamJN commented 5 years ago

Tested with a 143MB file, same result unfortunately.

CamJN commented 5 years ago

Perhaps one of you could attach gdb to nginx and grab a backtrace of the segfault? Since I'm still having no luck reproducing this on my end.

jhill commented 5 years ago

Core was generated by `nginx: worker pr'. Program terminated with signal SIGSEGV, Segmentation fault.

0 0x00000000004424d8 in ngx_create_temp_file ()

(gdb) backtrace

0 0x00000000004424d8 in ngx_create_temp_file ()

1 0x00000000004425ae in ngx_write_chain_to_temp_file ()

2 0x000000000045c31f in ?? ()

3 0x000000000045d46a in ngx_event_pipe ()

4 0x000000000048d7aa in ?? ()

5 0x000000000048c5cf in ?? ()

6 0x0000000000464679 in ?? ()

7 0x0000000000459ae7 in ngx_process_events_and_timers ()

8 0x00000000004626c5 in ?? ()

9 0x0000000000460d0c in ngx_spawn_process ()

10 0x0000000000461e84 in ?? ()

11 0x000000000046330f in ngx_master_process_cycle ()

12 0x00000000004381c8 in main ()

CamJN commented 5 years ago

Ah, so we removed some code that set ngx_http_proxy_temp_path in this commit: https://github.com/phusion/passenger/commit/7d4251bb4aec5cef6896f6d1905a30e99d21befe, I wonder if that variable was used as a default for all temp files for nginx (it didn't seem to cause an issue in my tests). Can you set a proxy_temp_path config and see if the issue persists?

CamJN commented 5 years ago

Also for my own reference here's the function that's segfaulting: https://github.com/nginx/nginx/blob/aa741f87273f2137d9a52080593c5fe6f1d1b0ea/src/core/ngx_file.c#L140-L237

aried3r commented 5 years ago

@jhill, thanks for providing the stack trace. Is there any guide I could follow to also produce one? Just in case mine would be different.

@CamJN, thanks for investigating! A side question, is using passenger's nginx version still the recommended way to go once moving from Ubuntu 16.04 to 18.04? I saw in the installation guides that after 17.04 using just libnginx-mod-http-passenger with the provided nginx package from Ubuntu or nginx directly is being shown in the guides. I'm mainly asking out of interest and if this would have prevented this bug from occurring on your systems, since there wouldn't have been a custom built of nginx. Thanks!

CamJN commented 5 years ago

@aried3r we recommend using the system nginx for ubuntu 18.04. In-fact in light of this issue I just looked and I might be able to get ubuntu 16.04 onto a dynamic module+system nginx too.

Also possibly of interest, if the app is using send_file and nginx is writing the response to a file on disk, then there is no speedup. send_file is only faster if the kernel takes over writing directly into the socket, but by writing to a tempfile first any speedup is going to be wasted.

As for the backtrace, this should work:

gdb nginx
set detach-on-fork off
run
# run curl in another terminal or make a request in your browser to cause the issue, then come back here
bt
CamJN commented 5 years ago

@sauliusgrigaitis or @aried3r have you had a chance to try this with ngx_http_proxy_temp_path set?

sauliusgrigaitis commented 5 years ago

@CamJN do you mean set proxy_temp_path ?

CamJN commented 5 years ago

@sauliusgrigaitis yes, sorry.

sauliusgrigaitis commented 5 years ago

@CamJN I tried with proxy_temp_path set to custom path and it still crashes.

sauliusgrigaitis commented 5 years ago

Is there anything else we could try?

CamJN commented 5 years ago

At this point someone would need to install the nginx-extras-dbg package, then start nginx under gdb, then trigger the segfault, which would then provide the source code line that is actually causing the issue within the ngx_create_temp_file function; and then they could inspect the state of the program, such as the variables (esp pointers) to see what's actually breaking. It's likely that an invalid pointer was dereferenced or an array was indexed out of bounds, but it could be something else.

kevgrig commented 5 years ago

This issue is affecting me as well (Passenger; send_file; nginx 1.17.0). Backtrace:

(gdb) bt
#0  ngx_create_temp_file (file=file@entry=0xf8f898, path=0x0, pool=0xf91130, persistent=0, clean=0, access=0) at src/core/ngx_file.c:159
#1  0x00000000004548ab in ngx_write_chain_to_temp_file (tf=0xf8f898, chain=chain@entry=0xf8f9a0) at src/core/ngx_file.c:114
#2  0x000000000046e126 in ngx_event_pipe_write_chain_to_temp_file (p=p@entry=0xf92968) at src/event/ngx_event_pipe.c:839
#3  0x000000000046eec1 in ngx_event_pipe_read_upstream (p=0xf92968) at src/event/ngx_event_pipe.c:273
#4  ngx_event_pipe (p=p@entry=0xf92968, do_write=<optimized out>, do_write@entry=0) at src/event/ngx_event_pipe.c:49
#5  0x000000000049e4eb in ngx_http_upstream_process_upstream (r=0xf91180, u=0xf92520) at src/http/ngx_http_upstream.c:3924
#6  0x000000000049d4f2 in ngx_http_upstream_handler (ev=0xfa1e00) at src/http/ngx_http_upstream.c:1289
#7  0x000000000047662e in ngx_epoll_process_events (cycle=0xf26b10, timer=<optimized out>, flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:902
#8  0x000000000046b9b2 in ngx_process_events_and_timers (cycle=cycle@entry=0xf26b10) at src/event/ngx_event.c:242
#9  0x0000000000474398 in ngx_worker_process_cycle (cycle=0xf26b10, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:750
#10 0x00000000004728f3 in ngx_spawn_process (cycle=cycle@entry=0xf26b10, proc=proc@entry=0x474283 <ngx_worker_process_cycle>, data=data@entry=0x0, name=name@entry=0x5b39de "worker process", respawn=respawn@entry=-3)
    at src/os/unix/ngx_process.c:199
#11 0x0000000000473456 in ngx_start_worker_processes (cycle=cycle@entry=0xf26b10, n=1, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:359
#12 0x0000000000474c40 in ngx_master_process_cycle (cycle=cycle@entry=0xf26b10) at src/os/unix/ngx_process_cycle.c:131
#13 0x000000000044b98c in main (argc=1, argv=<optimized out>) at src/core/nginx.c:382

The SIGSEGV is because path is NULL:

(gdb) frame 0
#0  ngx_create_temp_file (file=file@entry=0xf8f898, path=0x0, pool=0xf91130, persistent=0, clean=0, access=0) at src/core/ngx_file.c:159
159             name = path->name;

The path comes from tf->path which is NULL:

(gdb) frame 1
#1  0x00000000004548ab in ngx_write_chain_to_temp_file (tf=0xf8f898, chain=chain@entry=0xf8f9a0) at src/core/ngx_file.c:114
114             rc = ngx_create_temp_file(&tf->file, tf->path, tf->pool,
(gdb) print *tf
$1 = {file = {fd = -1, name = {len = 0, data = 0x0}, info = {st_dev = 0, st_ino = 0, st_nlink = 0, st_mode = 0, st_uid = 0, st_gid = 0, __pad0 = 0, st_rdev = 0, st_size = 0, st_blksize = 0, st_blocks = 0, st_atim = {tv_sec = 0, 
        tv_nsec = 0}, st_mtim = {tv_sec = 0, tv_nsec = 0}, st_ctim = {tv_sec = 0, tv_nsec = 0}, __glibc_reserved = {0, 0, 0}}, offset = 0, sys_offset = 0, log = 0xf24900, valid_info = 0, directio = 0}, offset = 0, path = 0x0, 
  pool = 0xf91130, warn = 0x5ba390 "an upstream response is buffered to a temporary file", access = 0, log_level = 5, persistent = 0, clean = 0, thread_write = 0}

My location directive has proxy_temp_path set:

  location / {
    passenger_enabled on;
    proxy_temp_path /tmp/;
  }

Let me know if you need anything else.

CamJN commented 5 years ago

@kevgrig we haven't shipped an nginx 1.17. Can you describe your setup a bit more. This looks like it may be an upstream nginx bug after all.

kevgrig commented 5 years ago

@CamJN I'm on Fedora 30; Ruby 2.6.3; Rails 5.1.7; Passenger 6.0.2. I download nginx source from https://nginx.org/download/, and configure and compile with:

./configure --user=nginx --group=nginx --prefix=/opt/nginx/ \
        --with-http_ssl_module --with-pcre --with-http_gzip_static_module \
        --add-module=$(passenger-config --nginx-addon-dir) \
        --add-module=/usr/local/src/nginx-upload-module \
        --with-debug
make
sudo make install

Interestingly, the description of the issue implied this works with nginx < 1.15.8-8.6.0.2, so I tried nginx 1.15.7 and experience the same problem. I don't remember the last time I updated the passenger gem. I can confirm this used to work some months ago before I ran my general scripts to update everything.

kevgrig commented 5 years ago

I confirmed that the problem is resolved when I patched /usr/local/share/gems/gems/passenger-6.0.2/src/nginx_module/Configuration.c with the reverse of commit https://github.com/phusion/passenger/commit/7d4251bb4aec5cef6896f6d1905a30e99d21befe and recompiled nginx.

kevgrig commented 5 years ago

Related issue: #2075

CamJN commented 5 years ago

Thanks for the additional legwork. I'm really surprised that I can't trigger the issue given that reverting https://github.com/phusion/passenger/commit/7d4251bb4aec5cef6896f6d1905a30e99d21befe fixes it. Anyway this should be easy enough to fix then in the next Passenger release. I just hope I can avoid reintroducing #2075 but if not then this will definitely win, as that bug was just annoying whereas this one is breaking.

kevgrig commented 5 years ago

@CamJN Yeah, that's odd that you can't reproduce it. Maybe it's some sort of combination of network/CPU speed and disk speed where the client is able to read the file at a rate such that the nginx send buffers never hit their max (at which point nginx writes to the temp file)? You could try reducing the nginx send buffer sizes, and/or executing the client from somewhere with higher network latency.

nitrotm commented 5 years ago

We got the same issue with send_file and larger files (eg. 1MB). After investigation, even sending the data without using 'send_file' explicitly (for instance using render in rails) also triggers a crash but at a different location by looking at the IP.

What I found is that if I disable passenger buffering, it doesn't crash anymore. But of course that's a very temporary solution as it will impact performances.

passenger_buffer_response off;

Note: I am running Debian/stretch: nginx version is 1.10.3 and passenger is 6.0.2.

CamJN commented 5 years ago

Yeah, I'm going to update the title to reflect the actual issue. We know what the cause is now, and we have a solution. As I said above I hope I can avoid reintroducing #2075 but if not then this will definitely be fixed and that regressed, as that bug was just annoying whereas this one is breaking.

CamJN commented 5 years ago

6.0.3 released