phusion / passenger

A fast and robust web server and application server for Ruby, Python and Node.js
https://www.phusionpassenger.com/
MIT License

Constantly getting error: Exception Errno::EPIPE in Passenger RequestHandler (Broken pipe) - Apache stopped forwarding the backend's response ... #535

Closed FooBarWidget closed 10 years ago

FooBarWidget commented 10 years ago

From thomaswittold on December 09, 2009 17:12:29

I am using Passenger 2.2.7 with Apache/2.2.14 on a Gentoo Linux 2.6.21-xen x86_64 box. Apache is compiled with several modules, including SSL and the xsendfile module ( http://tn123.ath.cx/mod_xsendfile/ ). I am delivering a Ruby on Rails application using MRI 1.8.7 patchlevel 174.

Apparently the Rails application delivers the pages correctly, but I keep getting this Broken Pipe error with every request in my error_log:

[ pid=15779 file=ext/apache2/Hooks.cpp:658 time=2009-12-09 17:08:03.124 ]: Apache stopped forwarding the backend's response, even though the HTTP client did not close the connection. Is this an Apache bug?
*** Exception Errno::EPIPE in Passenger RequestHandler (Broken pipe) (process 17153):
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/rack/request_handler.rb:112:in `write'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/rack/request_handler.rb:112:in `process_request'
    from /usr/lib64/ruby/gems/1.8/gems/actionpack-2.3.5/lib/action_controller/string_coercion.rb:10:in `each'
    from /usr/lib64/ruby/gems/1.8/gems/actionpack-2.3.5/lib/action_controller/response.rb:156:in `each'
    from /usr/lib64/ruby/gems/1.8/gems/actionpack-2.3.5/lib/action_controller/string_coercion.rb:9:in `each'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/rack/request_handler.rb:111:in `process_request'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_request_handler.rb:207:in `main_loop'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/railz/application_spawner.rb:374:in `start_request_handler'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/railz/application_spawner.rb:332:in `handle_spawn_application'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/utils.rb:184:in `safe_fork'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/railz/application_spawner.rb:330:in `handle_spawn_application'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_server.rb:352:in `__send__'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_server.rb:352:in `main_loop'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_server.rb:196:in `start_synchronously'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_server.rb:163:in `start'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/railz/application_spawner.rb:209:in `start'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/spawn_manager.rb:262:in `spawn_rails_application'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_server_collection.rb:126:in `lookup_or_add'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/spawn_manager.rb:256:in `spawn_rails_application'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_server_collection.rb:80:in `synchronize'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_server_collection.rb:79:in `synchronize'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/spawn_manager.rb:255:in `spawn_rails_application'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/spawn_manager.rb:154:in `spawn_application'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/spawn_manager.rb:287:in `handle_spawn_application'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_server.rb:352:in `__send__'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_server.rb:352:in `main_loop'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/lib/phusion_passenger/abstract_server.rb:196:in `start_synchronously'
    from /usr/lib64/ruby/gems/1.8/gems/passenger-2.2.7/bin/passenger-spawn-server:61

Original issue: http://code.google.com/p/phusion-passenger/issues/detail?id=435

FooBarWidget commented 10 years ago

From misi@planet-punk.de on January 01, 2010 06:20:29

Same here.

Debian 5, 2.6.26-2-amd64
Rails 2.3.5
Passenger 2.2.8
ruby 1.8.7 (2009-06-12 patchlevel 174) [x86_64-linux], MBARI 0x6770, Ruby Enterprise Edition 2009.10

FooBarWidget commented 10 years ago

From born70s on January 02, 2010 23:55:47

I'm seeing same error.

CentOS 5.2 x86_64
Rails 2.3.5
Passenger 2.2.8
ruby 1.9.1p376 (2009-12-07 revision 26041) [x86_64-linux]

I'm not sure whether it is caused by this, but I have to bounce Apache a couple of times a day because the number of Apache processes keeps increasing until it eventually stops responding.

FooBarWidget commented 10 years ago

From honglilai on January 27, 2010 01:42:20

Does increasing the web server's maximum file descriptor limit work?
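For anyone checking this, a process can report its own descriptor limit. The snippet below is only an illustrative sketch (it is not part of Passenger, and `fd_limits` and `describe_limit` are made-up helper names) built on Ruby's standard `Process.getrlimit`. It reports the limit of the process that runs it, which may differ from what `ulimit` prints in an interactive shell and from the limit the web server actually runs with.

```ruby
# Report the file descriptor limits of the *current* process.
# Hypothetical helpers for illustration; not Passenger code.
def fd_limits
  soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)
  { :soft => soft, :hard => hard }
end

# Render a limit value, showing "unlimited" for RLIM_INFINITY.
def describe_limit(value)
  value == Process::RLIM_INFINITY ? "unlimited" : value.to_s
end

limits = fd_limits
puts "soft limit: #{describe_limit(limits[:soft])}"
puts "hard limit: #{describe_limit(limits[:hard])}"
```

Running the same check from inside the application (for example from a Rails console started the same way as the app) is more telling than running it from a login shell.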

FooBarWidget commented 10 years ago

From jason.lapier on February 05, 2010 13:33:01

Also seeing this, and according to ulimit, I'm already at unlimited file descriptors.

Linux ubuntu 2.6.24-26-server #1 SMP Tue Dec 1 19:19:20 UTC 2009 i686 GNU/Linux
Rails 2.3.5
Passenger 2.2.9
ruby 1.8.6 (2007-09-24 patchlevel 111) [i486-linux]
Apache2-mpm-prefork package - version 2.2.8-1ubuntu0.14

FooBarWidget commented 10 years ago

From honglilai on February 07, 2010 03:20:43

I suspect that this problem might have something to do with a long-standing Safari bug: https://bugs.webkit.org/show_bug.cgi?id=5760 Could you disable keep-alive and check whether the problem still occurs?

FooBarWidget commented 10 years ago

From steve.quinlan on February 08, 2010 07:20:41

Readers of this comment may be interested in https://code.google.com/p/phusion-passenger/issues/detail?id=378 and the work-arounds mentioned in it.

FooBarWidget commented 10 years ago

From honglilai on February 10, 2010 02:11:29

Issue 459 has been merged into this issue.

FooBarWidget commented 10 years ago

From honglilai on February 10, 2010 02:13:47

There are quite a lot of people reporting similar problems, though so far I've been completely unable to reproduce it locally. The observed behavior also doesn't make any sense: it's as if the kernel suddenly decided to close the socket between the different Passenger processes for no apparent reason. I've posted a question on StackOverflow in the hope that someone else might know more about this: http://stackoverflow.com/questions/2235938/what-can-cause-an-spontaneous-epipe-error-without-either-end-calling-close-or-c

FooBarWidget commented 10 years ago

From honglilai on February 10, 2010 03:05:16

Also asked here: http://www.developerweb.net/forum/showthread.php?p=28779#post28779

FooBarWidget commented 10 years ago

From kent.thomas@medoraco.com on February 10, 2010 06:21:36

Would a dtruss of the passenger or the root apache process help at all?

FooBarWidget commented 10 years ago

From honglilai on February 10, 2010 07:01:53

You can give it a try. Please try to keep the data as small as possible.

If there's a way for me to reproduce the problem locally then that would be even better.

FooBarWidget commented 10 years ago

From kent.thomas@medoraco.com on February 10, 2010 07:56:05

The way I can reliably produce the errors is to do multiple refreshes of one page in the app. Simply hitting the refresh keyboard shortcut (Command+R on Mac, F5 on Windows) about 20 to 30 times in a row will produce the error.

Just a curious thought, but is there something in the way that mod_proxy handles their requests to apache that dramatically differs from passenger?

FooBarWidget commented 10 years ago

From honglilai on February 10, 2010 08:33:42

Not that I know of.

FooBarWidget commented 10 years ago

From kent.thomas@medoraco.com on February 10, 2010 08:39:47

I should have specified in my last comment that hitting the keyboard shortcut for the refresh should be done in rapid succession.

FooBarWidget commented 10 years ago

From honglilai on February 10, 2010 13:26:52

Actually in your case that behavior is normal. If you refresh the browser too quickly, and the browser was sending a request at the time the refresh button was clicked, then the browser will abort that request and continue with the refresh, causing an EPIPE error message.
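The mechanism can be seen in isolation with a small sketch. It is not Passenger code: `write_after_peer_close` is a made-up helper, and a plain Unix domain socket pair stands in for the connection between the web server and the backend process. The point is only that writing to a socket whose other end has already gone away raises Errno::EPIPE in Ruby.

```ruby
require "socket"

# Returns the error class raised when writing to a Unix domain socket
# whose peer has already closed, or nil if the write succeeded.
# Hypothetical helper for illustration only.
def write_after_peer_close
  writer, reader = UNIXSocket.pair
  reader.close                        # the receiving side abandons the request
  writer.write("HTTP response body")  # the backend still tries to send its response
  nil
rescue Errno::EPIPE => e
  e.class
ensure
  writer.close if writer && !writer.closed?
end

p write_after_peer_close
```

This only covers the expected case described above, where one end really did close the connection; it does not reproduce the unexplained EPIPE reported elsewhere in this issue.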

FooBarWidget commented 10 years ago

From kent.thomas@medoraco.com on February 10, 2010 13:36:04

Ok, I've been testing further. Here's another way I can reliably generate the errors: ab -n 200 -c 4 http://my-rails-site.com (of course I test against my own Rails app site). This particular test will generate at least 3 errors in the log file. Could we get a way to see some more debug logging from Passenger and REE?
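For those without ab at hand, a rough stand-in can be sketched in Ruby. This is an assumption-laden approximation, not an equivalent of ab: `hammer` is a made-up helper, the URL is a placeholder, and each request opens a fresh connection through Net::HTTP. It sends a fixed number of GET requests from a few threads and tallies the status codes (or exception class names) it sees.

```ruby
require "net/http"
require "uri"

# Send `total` GET requests to `url` using `concurrency` threads and
# return a tally of HTTP status codes (or exception class names).
# Hypothetical helper; loosely mimics `ab -n total -c concurrency`.
def hammer(url, total: 200, concurrency: 4)
  uri = URI(url)
  jobs = Queue.new
  total.times { jobs << true }
  counts = Hash.new(0)
  lock = Mutex.new

  workers = Array.new(concurrency) do
    Thread.new do
      loop do
        begin
          jobs.pop(true)              # non-blocking; raises when the queue is empty
        rescue ThreadError
          break
        end
        outcome = begin
          Net::HTTP.get_response(uri).code
        rescue StandardError => e
          e.class.name
        end
        lock.synchronize { counts[outcome] += 1 }
      end
    end
  end
  workers.each(&:join)
  counts
end

# Example (placeholder URL, replace with your own app):
# p hammer("http://my-rails-site.com/", total: 200, concurrency: 4)
```

The tally only shows what the client saw; the EPIPE messages themselves would still have to be looked up in the web server's error log.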

FooBarWidget commented 10 years ago

From honglilai on February 11, 2010 02:16:46

Issue 378 has been merged into this issue.

FooBarWidget commented 10 years ago

From honglilai on February 11, 2010 02:21:38

A summary of what has been found so far:

FooBarWidget commented 10 years ago

From honglilai on February 11, 2010 02:26:18

I have a patch which may help for problems that fall in the "rest" category: http://github.com/FooBarWidget/passenger/commit/c056b19d2cd8f754efe88aeb208dc54e15fe224b Does this work?

Please note that this patch will likely have no effect on OS X because of the kernel bug, which is not fixed until Passenger 3.

FooBarWidget commented 10 years ago

From steve.quinlan on February 11, 2010 05:05:53

Just confirming that disabling keep-alive in nginx seems to have solved my problem. App has been running 3 days without a problem (so far!). Thanks for the suggestion @honglilai and the work in dissecting this defect

FooBarWidget commented 10 years ago

From willieabrams on February 11, 2010 07:34:44

@honglilai Can you provide details on the OS X kernel bug? Any link or pointer would be helpful. We can file tech incidents with Apple to get it investigated.

Also, when will Passenger 3 be out? We run entirely on OS X and this bug is biting us all the time.

FooBarWidget commented 10 years ago

From honglilai on February 11, 2010 08:39:00

We hope to be able to post more about Passenger 3 in a month.

As for the OS X kernel bug, the problem occurs in the following setup:

  1. Given two processes, A and B, connected to each other via a Unix domain socket.
  2. Given a process C, which listens on either a Unix domain socket server or a TCP server.
  3. A sends a request to B, and as a result B connects to C.
  4. The client socket that B obtained is sent to A via Unix domain socket file descriptor passing.
  5. A uses this passed file descriptor to communicate with C. Let's call this file descriptor X. Most of the time this works, but sometimes a read() call to X returns 0, even though C did not close the connection. Retrying the same read() on X in a busy loop makes it work again later.
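
The steps above can be sketched in Ruby with the standard socket library. This is a simplified illustration under stated assumptions, not Passenger's actual implementation: everything runs inside one process, `pass_descriptor_demo` is a made-up name, C is a TCP server on a local ephemeral port, and the descriptor is passed with `UNIXSocket#send_io` / `UNIXSocket#recv_io`. On a system without the bug, the final read returns C's reply.

```ruby
require "socket"

# Hypothetical single-process walk-through of the A/B/C setup.
def pass_descriptor_demo
  # C: a server listening on an ephemeral local TCP port.
  server = TCPServer.new("127.0.0.1", 0)
  port = server.addr[1]

  # A and B: the two ends of a connected Unix domain socket pair.
  a_end, b_end = UNIXSocket.pair

  # B connects to C, then passes the client socket to A.
  client = TCPSocket.new("127.0.0.1", port)
  b_end.send_io(client)

  # A receives the passed descriptor (X in the description above).
  x = a_end.recv_io(TCPSocket)
  client.close                 # B no longer needs its own copy

  # C accepts the connection, replies, and closes.
  conn = server.accept
  conn.write("hello from C")
  conn.close

  # A reads from X until C closes the connection.
  x.read
ensure
  [server, a_end, b_end, client, x, conn].each do |io|
    io.close if io && !io.closed?
  end
end

puts pass_descriptor_demo
```

The reported symptom would correspond to the read on X returning end-of-file early even though C had not closed the connection; this sketch does not attempt to trigger that.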

Furthermore:

FooBarWidget commented 10 years ago

From willieabrams on February 11, 2010 14:28:33

We see this particular error across our application server cluster hundreds to thousands of times per day per server. If you need to see it at work, send email to my name here @gmail.com.

FooBarWidget commented 10 years ago

From honglilai on February 12, 2010 01:03:29

So your cluster is running on OS X? And just curious, which website is it?

FooBarWidget commented 10 years ago

From willieabrams on February 12, 2010 07:15:55

http://www.vitalsource.com/
http://store.vitalsource.com/ (and several other white label stores like textbooks.vitalsource.com)
http://online.vitalsource.com/ (and several other white label versions of this Bookshelf client)
notes.vitalbook.com (no UI, just services - heavily used sync server used by Bookshelf Mac and Windows clients)

Currently, we have 10 or so Xserves in production, we plan to scale back to 6 or so as some of our pilot project traffic gets more predictable. We run MySQL on OS X as well.

FooBarWidget commented 10 years ago

From willieabrams on February 12, 2010 07:18:30

I should add that store, online and notes run under Passenger, while www still runs under Mongrel. We also have some other less public sites on that stack running under Passenger.

FooBarWidget commented 10 years ago

From honglilai on February 14, 2010 04:47:31

Phusion Passenger for Nginx does not suffer from the kernel bug because it uses different mechanisms. You can try switching to Nginx for now.

FooBarWidget commented 10 years ago

From vitalaaron on February 18, 2010 00:11:41

@honglilai

This reply is in reference to your question in issue 378, which was merged with this issue.

For reference, my setup is Ubuntu 8.04, Passenger 2.2.5, Rails 2.1.2, nginx/0.7.61.

To answer your question from issue 378, yes they are the same - multiple Application Spawners for the same app have been verified to be present at the same time as the problem.

To prevent this from occurring I've been running this every hour:

kill `ps -eo pid,args | grep "Passenger spawn server" | grep -v grep | awk '{print $1}'`
touch /opt/nginx/html/mx/tmp/restart.txt

This seemed to be working well, but sometime in the last hour the problem started again ("upstream prematurely closed connection while reading response header from upstream" for every single request). Running the above script manually (vs. cron) fixed the problem.

FooBarWidget commented 10 years ago

From steve.quinlan on February 18, 2010 01:13:35

@vitalaaron have you tried setting keep-alive to 0 in nginx.conf? Doing so solved my problem. If you try it, kill nginx and start it again rather than doing an nginx reload or a Passenger restart.

FooBarWidget commented 10 years ago

From vitalaaron on February 18, 2010 10:34:41

@steve.quinlan I hadn't tried altering the keep-alive setting, but I just made the change. Hopefully this will take care of the problem, at least until the ultimate cause is determined. Thanks.

FooBarWidget commented 10 years ago

From vitalaaron on February 19, 2010 22:14:18

@steve.quinlan & @honglilai

I set keep-alive to 0 and killed/started nginx the other day as you stated. The problem, however, returned tonight (every request was returning the previously mentioned error until the cron job ran to remedy the problem).

Unfortunately, this is a high-traffic website monetized solely by advertisers and I cannot continue to experiment to work around this issue (can't risk more downtime and upset users). Until this is resolved, I'm going to have to move back to a Thin setup. Sorry that I cannot be of any more help :(

FooBarWidget commented 10 years ago

From honglilai on February 21, 2010 07:54:48

Issue 461 has been merged into this issue.

FooBarWidget commented 10 years ago

From naimlissone on February 24, 2010 09:19:39

Are we any closer to finding a solution to this issue?

FooBarWidget commented 10 years ago

From vitalaaron on February 25, 2010 03:17:09

FYI - I am still running Passenger on my development server and got the same error with the latest versions of both Nginx (0.7.64) and Passenger (2.2.10) installed.

FooBarWidget commented 10 years ago

From pierre.y on March 03, 2010 00:29:51

Debian Lenny / 2.6.26-2-amd64
Apache 2.2.9-10+lenny6 / Timeout 1200
Ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]
Passenger 2.2.5 and 2.2.10
Rails 2.3.4 and 2.3.5

Something interesting: with Passenger 2.2.5 the website stays alive, while with Passenger 2.2.10 the website becomes unreachable every 5 minutes and I have to restart Apache.

FooBarWidget commented 10 years ago

From honglilai on March 03, 2010 02:03:58

pierre.y: that's a regression in 2.2.10: http://groups.google.com/group/phusion-passenger/t/d5bb2f17c8446ea0?hl=en It's got nothing to do with this issue. I've already posted a patch which needs confirmation.

FooBarWidget commented 10 years ago

From kivanio on March 10, 2010 04:58:03

I'm having this issue in 2.2.11:

[ pid=25077 file=ext/apache2/Hooks.cpp:656 time=2010-03-10 08:36:41.715 ]: Either the vistor clicked on the 'Stop' button in the web browser, or the visitor's connection has stalled and couldn't receive the data that Apache is sending to it. As a result, you will probably see a 'Broken Pipe' error in this log file. Please ignore it, this is normal. You might also want to increase Apache's TimeOut configuration option if you experience this problem often.
*** Exception Errno::EPIPE in Passenger RequestHandler (Broken pipe) (process 24433):
    from /usr/lib/ruby/gems/1.8/gems/passenger-2.2.11/lib/phusion_passenger/rack/request_handler.rb:109:in `write'
    from /usr/lib/ruby/gems/1.8/gems/passenger-2.2.11/lib/phusion_passenger/rack/request_handler.rb:109:in `process_request'
    from /usr/lib/ruby/gems/1.8/gems/actionpack-2.3.4/lib/action_controller/response.rb:155:in `each'

Opensuse 11.2 Server version: Apache/2.2.13 (Linux/SUSE)

FooBarWidget commented 10 years ago

From cyn0nrautha on April 08, 2010 10:12:13

I'm also having this issue in 2.2.11 and 2.2.10. I know I can safely roll back to 2.2.2 without having this issue. I see it at startup, and for a while everything seems fine, until 3-4 hours later I come back to see 40-50+ Apache processes, and at that point the site is not responding.

my ulimit is default of 1024 but I see no reason why each apache proc. would need more than that

FooBarWidget commented 10 years ago

From christophe.lucas on May 03, 2010 10:47:50

I am getting the same errors on Centos, Nginx 0.8.36, Ruby Enterprise Edition and passenger 2.2.11. I only see it when I use an ELB in front of Centos instances and when passenger starts queueing.

FooBarWidget commented 10 years ago

From bogdan.ionescu on July 22, 2010 17:28:37

I have been getting this error for at least one year, on various versions of nginx and Passenger. It happens on CentOS and it happens on Fedora. At the moment I am using Passenger 2.2.15 and nginx 0.7.67. I am just mystified how a problem that has been reported for more than 18 months has still not been reproduced; to me it is shocking that some people do not have it.

FooBarWidget commented 10 years ago

From steve.quinlan on July 22, 2010 23:20:26

@bogdan.ionescu - I temporarily changed my apps to use nginx + thin. I think Passenger 3 once it comes out will solve the problem.

FooBarWidget commented 10 years ago

From bogdan.ionescu on July 23, 2010 06:27:59

@steve.quinlan thanks for your suggestion, I'm giving thin a try since the stability of the application is more important than the elusive memory footprint or autospawning advantages.

FooBarWidget commented 10 years ago

From sahil.cooner on September 09, 2010 13:48:30

Running the following ruby RubyGems Environment:

and also running into the same issue. I haven't really dug into this yet; if anyone has found a solution, I would love to hear it. If not, I'll post back with any new findings :).

FooBarWidget commented 10 years ago

From sahil.cooner on September 10, 2010 00:09:29

The resolution is to turn KeepAlive off (change KeepAlive On to KeepAlive Off), and you won't experience the crash. This stops Passenger from dying, but the underlying behavior is still a bug.

FooBarWidget commented 10 years ago

From daniel.thor on September 10, 2010 01:01:33

Changing the KeepAlive directive never worked for us. After several days of tinkering we have stopped trying to figure this one out and we've backed down to mongrel cluster again pending the release of Passenger 3.

Daniel

FooBarWidget commented 10 years ago

From honglilai on September 15, 2010 11:59:47

Passenger 3 should fix this. I'm closing the bug for now. Feel free to comment or open a new issue if anyone still has problems.

Status: Fixed
Labels: Milestone-3.0.0