perusio / drupal-with-nginx

Running Drupal using nginx: an idiosyncratically crafted bleeding edge configuration.

php-fpm crashing every few days #55

Open ghost opened 12 years ago

ghost commented 12 years ago

Good Morning,

I understand this is probably not related to this config; however, you seem to be a wealth of information on nginx / php-fpm and Drupal. Perhaps you have seen this problem before...

Every few days, at approximately the same times (08:04 and 20:04), the php-fpm children on all of my web servers increase dramatically until max_children is reached. The servers then come under heavy load, and syslog shows kernel messages from the OOM killer. php-fpm then appears to crash/restart all processes, and the site loads fine again for another few days.

This cycle repeats every few days at the same times, and it occurs whether I set max_children to 10, 50, or 100; whatever the value, more children spawn until php-fpm crashes. Some kind of memory leak or infinite loop?

Site traffic is also minimal at these times and the site can handle 200% more traffic at other peak times without problems.

Versions: Ubuntu 11.10 (3.0.0-12-server x86_64), nginx 1.0.5-1, PHP 5.3.6-13ubuntu3.3, Drupal 6.

Any ideas would be well appreciated.

Regards,

Alun.

perusio commented 12 years ago

The OOM killer is a kernel mechanism that recovers memory by killing what it deems to be a runaway process.

Check this: http://lwn.net/Articles/317814/

This could be a php-fpm config issue or a bug.

fidelix commented 12 years ago

Try editing your upstream to remove keepalive, and edit your nginx server blocks to comment out fastcgi_keep_conn.

This fixed it for me.
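For anyone following along, the directives in question look roughly like this (a sketch only; the upstream name phpcgi and the socket path are placeholders, not necessarily what this config uses):

    upstream phpcgi {
        server unix:/var/run/php-fpm.sock;
        # keepalive 5;   # removed: no pool of cached connections to php-fpm
    }

    server {
        listen 80;

        location ~ \.php$ {
            include fastcgi_params;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            # fastcgi_keep_conn on;   # commented out: close the FCGI connection after each request
            fastcgi_pass phpcgi;
        }
    }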

perusio commented 12 years ago

You might want to look here. This is probably due to the high number of cached connections for the FCGI upstream keepalive. Lower the value to 1 and see how it goes.

ghost commented 12 years ago

I have checked my nginx setup and fastcgi_keep_conn is hashed out :(

    # fastcgi_keep_conn on;

Would it still be worth changing #keepalive 5; to keepalive 1;?

perusio commented 12 years ago

I think that there's something in your site that triggers that. Have you checked the logs (php-fpm logs I mean)?

moehac commented 12 years ago

Hi there, thanks again for helping me on the other issue. I believe I'm having the same problem. I have one pool, with keepalive set to 1. I can run your config with no issues on port 80, but if I start Varnish and have Nginx listen on a backend port, it starts spawning a ton of pool processes. I changed the keepalive to 1 and commented out fastcgi_keep_conn, but it still spawns a lot of processes and starts giving me "connection reset by peer" errors.

Any experience with Varnish in front of your configuration? Is it even worth it to have Varnish?

ghost commented 12 years ago

Interesting! I am also using Varnish in front of Nginx!

moehac commented 12 years ago

Are you having the same issue?

perusio commented 12 years ago

I can't help you there. What I can help you with is getting Nginx working as a LB/reverse proxy replacing Varnish, with advantages IMHO. Unless you use very abstruse stuff like compressed ESI.

moehac commented 12 years ago

I certainly would appreciate that. How would this setup have advantages over Varnish, IYHO?

priyadarshan commented 12 years ago

I would also be very interested to learn.

perusio commented 12 years ago

Ok. Post your VCL file somewhere, so that I can suggest a reverse proxy config for Nginx.

HMoen commented 12 years ago

Here's my default.vcl contents:

    backend default {
      .host = "127.0.0.1";
      .port = "8001";
      .connect_timeout = 600s;
      .first_byte_timeout = 300s;
      .between_bytes_timeout = 10s;
    }

    sub vcl_recv {
      if (req.request != "GET" &&
          req.request != "HEAD" &&
          req.request != "PUT" &&
          req.request != "POST" &&
          req.request != "TRACE" &&
          req.request != "OPTIONS" &&
          req.request != "DELETE") {
        /* Non-RFC2616 or CONNECT which is weird. */
        return (pipe);
      }

      if (req.http.X-Forwarded-For) {
        // Append the client IP
        set req.http.X-Real-Forwarded-For =
          req.http.X-Forwarded-For + ", " + regsub(client.ip, ":.*", "");
        unset req.http.X-Forwarded-For;
      } else {
        // Simply use the client IP
        set req.http.X-Real-Forwarded-For = regsub(client.ip, ":.*", "");
      }

      if (req.request != "GET" && req.request != "HEAD") {
        /* We only deal with GET and HEAD by default */
        return (pass);
      }

      // Remove has_js and Google Analytics cookies.
      set req.http.Cookie =
        regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|__utma_a2a|has_js)=[^;]*", "");

      // To users: if you have additional cookies being set by your system (e.g.
      // from a javascript analytics file or similar) you will need to add VCL
      // at this point to strip these cookies from the req object, otherwise
      // Varnish will not cache the response. This is safe for cookies that your
      // backend (Drupal) doesn't process.
      //
      // Again, the common example is an analytics or other Javascript add-on.
      // You should do this here, before the other cookie stuff, or by adding
      // to the regular-expression above.

      if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|js|css)\??.*$") {
        unset req.http.Cookie;
      }

      // Remove a ";" prefix, if present.
      set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
      // Remove empty cookies.
      if (req.http.Cookie ~ "^\s*$") {
        unset req.http.Cookie;
      }

      if (req.http.Authorization || req.http.Cookie) {
        /* Not cacheable by default */
        return (pass);
      }

      // Skip the Varnish cache for install, update, and cron
      if (req.url ~ "install\.php|update\.php|cron\.php|sitemap\.xml|robots\.txt|phpmyadmin") {
        return (pass);
      }

      // Normalize the Accept-Encoding header
      // as per: http://varnish-cache.org/wiki/FAQ/Compression
      if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
          # No point in compressing these
          remove req.http.Accept-Encoding;
        }
        elsif (req.http.Accept-Encoding ~ "gzip") {
          set req.http.Accept-Encoding = "gzip";
        }
        else {
          # Unknown or deflate algorithm
          remove req.http.Accept-Encoding;
        }
      }

      // Let's have a little grace
      set req.grace = 10s;

      return (lookup);
    }

    sub vcl_hash {
      if (req.http.Cookie) {
        hash_data(req.http.Cookie);
      }
    }

    // Strip any cookies before an image/js/css is inserted into cache.
    sub vcl_fetch {
      if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|js|css)\??.*$") {
        set beresp.ttl = 7200s;
        set beresp.grace = 10m;
        set beresp.http.expires = beresp.ttl;
        set beresp.http.age = "0";
        unset beresp.http.set-cookie;
      }
    }

    sub vcl_deliver {
      if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
        set resp.http.X-Cache-Hits = obj.hits;
      } else {
        set resp.http.X-Cache = "MISS";
      }

      return (deliver);
    }

    sub vcl_error {
      // Let's deliver a friendlier error page.
      // You can customize this as you wish.
      set obj.http.Content-Type = "text/html; charset=utf-8";
      synthetic {"
    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html>
      <head>
        <title>"} + obj.status + " " + obj.response + {"</title>
      </head>
      <body>
        <h1>Page Could Not Be Loaded</h1>
        <p>We're very sorry, but the page could not be loaded properly.
        This should be fixed very soon, and we apologize for any inconvenience.</p>
        <hr/>
        <h4>Debug Info:</h4>
        <pre>
    Status: "} + obj.status + {"
    Response: "} + obj.response + {"
    XID: "} + req.xid + {"
        </pre>
      </body>
    </html>
    "};
      return (deliver);
    }

perusio commented 12 years ago

I'm hardly a Varnish connoisseur. But most of this stuff is already done by the Nginx config (80% out of the box):

  1. No compression of images.
  2. HEAD and GET are the only allowed methods in proxy_cache.
  3. Sending the real IP in a header.

You're caching static assets for 2h (7200s) right?

moehac commented 12 years ago

Yes, due to it being a development box. In production it would need max settings.

Does nginx store static files in ram like Varnish?

perusio commented 12 years ago

No, and you don't need to: Nginx is quite optimized in terms of I/O. If you want the cache in RAM, you can put it on a tmpfs partition.
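For example (mount point, sizes, and zone name here are all illustrative, not from this repo's config):

    # /etc/fstab: RAM-backed filesystem for the nginx cache
    tmpfs  /var/cache/nginx  tmpfs  defaults,size=256m  0  0

    # nginx.conf (http context): keep the proxy cache on the tmpfs mount
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=static_cache:10m max_size=200m;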

perusio commented 12 years ago

BTW, are you open to testing the two setups and benchmarking each approach?

ghost commented 12 years ago

I may be able to test as well; that Varnish config is very similar to mine, though we are using Varnish for load-balancing the web servers too. I am sure it's easy enough to do that with nginx also.

moehac commented 12 years ago

Absolutely.

perusio commented 12 years ago

Ok. This evening (I'm on CET) I'll create a branch varnish-cracker and push the first attempt at getting it right.

HMoen commented 12 years ago

Perfect.

fidelix commented 12 years ago

This will be extremely interesting. I have a project that I will be migrating to a new server, and if this works out, I will gladly replace Varnish.

HMoen commented 12 years ago

I will be setting up a production environment soon and how this works out will dictate how I'll go forward as well.

perusio commented 12 years ago

Ok, the first commit is in. I have to test it. Usually I set up different upstreams for videos, CSS/JS, images, etc.; this is a simplification. It's the load balancer (or mere reverse proxy, if you have a single upstream) that caches all the stuff. For the moment there's caching of CSS/JS and images.

Still missing: testing, setting up the real IP header, and verifying all other headers. Note that the server that forwards to fpm now runs bound to the loopback as a security measure, on port 8081. If you're not load balancing, just delete all superfluous upstreams in backends_web.conf.
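In outline the arrangement is something like this (a sketch only; the real upstream definitions live in backends_web.conf in the branch, and the names below are illustrative):

    # backends_web.conf: the pool the front server balances over.
    upstream web_backends {
        server 127.0.0.1:8081;
        # more backends here when load balancing
    }

    # Front server: reverse proxy / load balancer, caches static assets.
    server {
        listen 80;
        location / {
            proxy_pass http://web_backends;
        }
    }

    # Backend server: bound to the loopback only, forwards to php-fpm.
    server {
        listen 127.0.0.1:8081;
        # ... Drupal locations and fastcgi_pass to php-fpm here ...
    }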

Here's the branch: https://github.com/perusio/drupal-with-nginx/tree/varnish-cracker

Hopefully I'll finish it tomorrow. You can try it if you feel adventurous. Take a peek at it to see where things are going.

perusio commented 12 years ago

A little bit more patience :( I hope to have a mo' better Varnish without Varnish by the weekend. Let's see.

HMoen commented 12 years ago

Much appreciated!!

perusio commented 12 years ago

The first working version is in. There are several ways to do "Varnish" in Nginx. I've chosen to front the site with a proxy that intercepts all static calls and uses a cache. It also uses a proxy cache, thus allowing for two levels of caching. You can just comment out the include sites-available/microcache_long_proxy.conf; line on the / and /imagecache/ locations if you don't want it. In fact, in that case it's simpler to run a dedicated server just for static assets that the main server proxies to. The setup now available is mostly useful to people who need the load balancing part. No touching/obeying of headers from the upstream is in place; do tell me if you need such a thing.

Note that the logic of having a proxy cache + fastcgi cache is to chain the caches; this is most useful in setups with load balancing: one longer cache on the proxy and a shorter one on the fastcgi.

proxy_cache(T) -> fastcgi_cache(t), where T > t are the respective cache validities.
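A minimal sketch of that chaining, with illustrative zone names and validities (T = 1h on the proxy, t = 5m on the fastcgi side; don't take these as the branch's actual values):

    proxy_cache_path   /var/cache/nginx/proxy levels=1:2 keys_zone=proxy_long:10m;
    fastcgi_cache_path /var/cache/nginx/fcgi  levels=1:2 keys_zone=fcgi_short:10m;

    # Front proxy: the longer cache (T).
    server {
        listen 80;
        location / {
            proxy_cache proxy_long;
            proxy_cache_valid 200 301 1h;
            proxy_pass http://127.0.0.1:8081;
        }
    }

    # Backend: the shorter fastcgi cache (t < T).
    server {
        listen 127.0.0.1:8081;
        location ~ \.php$ {
            include fastcgi_params;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            fastcgi_cache fcgi_short;
            fastcgi_cache_key "$scheme$request_method$host$request_uri";
            fastcgi_cache_valid 200 301 5m;
            fastcgi_pass unix:/var/run/php-fpm.sock;
        }
    }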

Try it out and report back.

perusio commented 12 years ago

BTW I've added ETag support. It requires Nginx >= 1.3.3.
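For reference, stock Nginx 1.3.3+ exposes this as the etag directive (on by default); a minimal sketch:

    server {
        listen 80;
        etag on;   # automatic entity tags for static files, built in since 1.3.3
    }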

perusio commented 12 years ago

@Fidelix @paradoxni @hmoen @moehac @priyadarshan Anyone ventured with this? I.e., using Nginx in lieu of Varnish? Feedback is needed to progress :) Thanks.

moehac commented 12 years ago

I will be testing this weekend. Trying to put out fires on a project :)

priyadarshan commented 12 years ago

I am in the process of a big move, but this is very high on our priority list. I hope I shall be able to test it thoroughly by mid-October. I shall report back here, hoping it will still be useful by then.

In any case, I am very, very grateful for drupal-with-nginx; it is wonderful work. Thank you!

fidelix commented 11 years ago

Perusio, how do you recommend dealing with cache invalidation from Drupal? With Varnish, there is the varnish module, which connects to the Varnish console via a secret key and issues purge commands.

perusio commented 11 years ago

Well, you could install purge and expire and have them work with the nginx third-party purge module. I find it overcomplicated. I have plans to write a Lua script that will handle the cache purging in an efficient way. The nginx purge module doesn't accept wildcards.

I have a small script that can clear the cache. It's inefficient (it's based on grep): https://github.com/perusio/nginx-cache-purge, so for very large caches it's slow. Integrating it with expire is not that difficult. You could fork the process so as not to block when purging the cache.

If there's some kind soul willing to sponsor the development of a real Nginx cache handler and its integration with Drupal, then things will go faster ;)

fidelix commented 11 years ago

António, would you create an entire nginx module or build upon something else?

How much do you think it would take?

We could try to raise the money from the community. A decent cache handler is the only thing that really prevents people from completely replacing Varnish with Nginx, no?

perusio commented 11 years ago

There's the possibility of re-using the Nginx cache, but since it's file based, that complicates the setup to make it performant. My idea is to replace the Nginx cache with Redis; then everything becomes centralized. This would make Nginx caching more versatile than Varnish IMO. We could envisage two interfaces: one using HTTP, similar to the purge module but without the need for a different instance, and a PHP layer using a Redis driver.

perusio commented 11 years ago

The cache would be built around the Nginx embedded Lua module's close integration with the Nginx API: you have both the headers and the response body. It wouldn't be much different to configure than the regular Nginx cache. There would be a Lua module/library that implements the cache. Then it's just another cache that you can deal with as if it were Varnish (or better :)
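Purely as a sketch of the idea (nothing is written yet; this assumes an Nginx built with the embedded Lua module plus the lua-resty-redis library, and a hypothetical internal /upstream location that proxies to the backend):

    location / {
        content_by_lua '
            local redis = require "resty.redis"
            local red = redis:new()
            red:set_timeout(1000)  -- 1s
            red:connect("127.0.0.1", 6379)

            -- Serve straight from Redis when the page is cached.
            local body = red:get(ngx.var.request_uri)
            if body and body ~= ngx.null then
                ngx.print(body)
                return
            end

            -- Cache miss: fetch from the backend and store the body.
            local res = ngx.location.capture("/upstream" .. ngx.var.request_uri)
            if res.status == 200 then
                red:set(ngx.var.request_uri, res.body)
            end
            ngx.status = res.status
            ngx.print(res.body)
        ';
    }

Purging then becomes a plain Redis DEL (or a KEYS pattern match for wildcards), with no need for a separate purge module.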

perusio commented 11 years ago

Getting back to the Nginx replacement of Varnish. I've just pushed an update with X-Real-IP support and corrected some errors.

perusio commented 11 years ago

@Fidelix In fact I forgot that I'd already given some thought to that, and you can get by without a "new" cache. I just revisited this thread on g.d.o.:

http://groups.drupal.org/node/210328

fidelix commented 11 years ago

@perusio, I remember that thread. So basically it would be a crawler inside Drupal that sends unauthenticated requests and updates the cache entries? How do you handle wildcard bypassing in a situation like that?

perusio commented 11 years ago

I would do parallel requests. This needs to be integrated with the Drupal API so that we can get a list of valid URIs. That's the hardest part.

fidelix commented 11 years ago

hmm... http://drupal.org/project/httprl seems like the right base for the job.

jimyhuang commented 10 years ago

Sorry, could I ask: is fastcgi_keep_conn related to the cache?

tdm4 commented 9 years ago

Using fastcgi_keep_conn on; in Nginx with PHP-FPM in 'ondemand' or 'dynamic' configuration is very bad. Once php-fpm terminates a child, nginx still thinks it is connected to it, and after a while php-fpm will just keep forking new children, hit the limit, and fall over.

I have seen this on Drupal sites and non-Drupal sites (like Magento). In the Drupal config I noticed this directive is on, which can pose major problems. It took me a long, long time to figure out that this setting was the problem. I recommend having it turned off in the config, always. The supposed performance benefit isn't worth the grief of wondering why 20-30 php-fpm children died with SIGTERM (15), and googling for the answer gives hardly any results. I had to track the problem down to here:

http://mailman.nginx.org/pipermail/nginx/2013-February/037496.html https://bugs.php.net/bug.php?id=60961 https://bugs.php.net/bug.php?id=63395
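To make the failure mode concrete, consider a pool like this (values are illustrative): with pm = dynamic, fpm is free to reap idle children down to pm.min_spare_servers, while an nginx upstream with keepalive 5 holds up to 5 idle connections per worker process, any of which may point at a child fpm has already killed:

    ; php-fpm pool (illustrative values)
    [www]
    pm = dynamic
    pm.max_children = 50
    pm.start_servers = 4
    pm.min_spare_servers = 2   ; idle children may be reaped down to this
    pm.max_spare_servers = 6

    # nginx upstream: with fastcgi_keep_conn on, up to 5 idle connections
    # per worker are cached, possibly to children fpm has since terminated.
    upstream phpcgi {
        server unix:/var/run/php-fpm.sock;
        keepalive 5;
    }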

steventhabel commented 9 years ago

tdm4, you are the only person who has really explained this the right way for me.

I have always struggled with using "fastcgi_keep_conn on;" in the perusio config. I'm surprised more people don't complain of the errors. Once I comment it out, everything works fine.

BTW, I am using two nginx instances (one reverse proxy for SSL termination in front and one web server for the Drupal environment in back).

Today I tried the drupal.conf and the microcache proxy (sounded like an awesome idea!), and it always returns "nginx: [emerg] the shared memory zone "microcache" is already declared for a different use in /etc/nginx/apps/drupal/microcache_proxy_auth.conf:7".

Will investigate a little more!

perusio commented 8 years ago

@tdm4 I agree that for ondemand the keepalive setup can pose problems, but not for dynamic. Per the links you posted to the php-fpm bug reports, it happens if and only if the number of connections in the connection pool is greater than the minimum number of servers. Of course, if you have 4 as the minimum number of servers and keepalive 5 in the upstream definition, this may pose problems. I will add something to the README, and also comments in the files, to warn about that.

So, bottom line: keep the upstream keepalive value at or below php-fpm's minimum number of servers.

perusio commented 8 years ago

@cthshabel You cannot declare a memory zone twice. You're including the cache zone definition twice; comment out the inclusion of the file in the second vhost. I'm sure you mean vhost and not instance: a second instance means a different master process and another PID for it. If it were really a second instance, it wouldn't complain. What do you mean by second instance?

tdm4 commented 8 years ago

@perusio Re: your comments above. With dynamic, the keepalive must be set to a number less than or equal to the minimum servers in php-fpm; is that pm.start_servers or pm.min_spare_servers? Also, fastcgi_keep_conn should be set to on with php-fpm dynamic but not with ondemand?

What happens when one of the minimum servers nginx is expecting to talk to decides to close (presumably after pm.max_requests, when a new one spins up)?

We've had keepalive not set but fastcgi_keep_conn on;, and it basically blows up php-fpm after a certain amount of time. So perhaps having keepalive not set with fastcgi_keep_conn on; messes things up as well?

cjbirk commented 8 years ago

Hello. I just ran into this issue recently with a client. It appears that they had installed (but not properly configured) Apache Solr, a search platform.

I figured I would put this here just in case anyone else runs into it. Basically this boils down to PHP error logging being extremely terrible. I hope someone finds this useful.

UPDATE: I should mention the solution:

  1. Log in to your MySQL database.
  2. USE yourDrupalDatabase;
  3. SELECT COUNT(*) FROM search_api_task;
  4. If the above shows a ridiculously high number, go to step 5.
  5. TRUNCATE TABLE search_api_task;

That should fix your issue.

Sorry in advance if this is unrelated to whatever your problem is.

pprishchepa commented 7 years ago

@paradoxni try disabling pm.max_requests in the PHP-FPM config (if enabled).
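In the pool config that would be, e.g.:

    ; php-fpm pool: 0 disables the respawn-after-N-requests behaviour
    pm.max_requests = 0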