owncloud / core

nginx + php5-fpm results in random 502 Bad Gateway #14187

Closed laurivosandi closed 9 years ago

laurivosandi commented 9 years ago

Hello,

We just upgraded several machines to ownCloud 8, and as a result php5-fpm occasionally goes nuts and stops responding to nginx. There is nothing interesting in the logs. We're running Ubuntu 14.04 with the default nginx and php5-fpm packages.

goodkiller commented 9 years ago

Same problem here. I'll provide some logs that I found:

2015/02/13 07:56:42 [error] 6403#0: 12517 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: x.x.x.x, server: my.name.xx, request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "my.name.xx"
2015/02/13 07:56:42 [error] 6403#0: 12503 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: x.x.x.x, server: my.name.xx, request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "my.name.xx"

2015/02/13 07:56:42 [crit] 6404#0: 3845 connect() to unix:/var/run/php5-fpm.sock failed (2: No such file or directory) while connecting to upstream, client: x.x.x.x, server: my.name.xx, request: "GET /status.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "my.name.xx"
2015/02/13 07:56:42 [crit] 6404#0: 3832 connect() to unix:/var/run/php5-fpm.sock failed (2: No such file or directory) while connecting to upstream, client: x.x.x.x, server: my.name.xx, request: "GET /status.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "my.name.xx"

There are also warnings like this one:

2015/02/13 09:03:39 [warn] 11283#0: *27 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000000001, client: x.x.x.x, server: my.name.xx, request: "POST /index.php/apps/files/ajax/upload.php HTTP/1.1", host: "my.name.xx", referrer: "https://my.name.xx/index.php/apps/files/?dir=%2FMyDirectory"

These problems appeared after the ownCloud upgrade to 8; before that, nginx and php-fpm worked well. Please treat this as a critical issue.
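
To see which upstream failure dominates in logs like the ones above, the errno-style reasons can be tallied. This is only a minimal sketch, not something posted in this thread: the function name count_upstream_errors is made up for illustration, the log path in the usage comment is an assumption, and it relies on grep -o (available in GNU and BSD grep).

```shell
#!/bin/sh
# Tally the "(errno: message)" reasons nginx logs for failed upstream calls.
# Reads an nginx error log on stdin and prints "count (errno: message)"
# lines, most frequent first.
count_upstream_errors() {
    grep 'upstream' \
        | grep -oE '\([0-9]+: [^)]*\)' \
        | sort | uniq -c | sort -rn
}

# Usage (the log path is an assumption, adjust to your setup):
#   count_upstream_errors < /var/log/nginx/error.log
```

A log dominated by "(2: No such file or directory)" points at a missing socket file, while "(111: Connection refused)" means the socket exists but nothing is accepting connections on it.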

rockihack commented 9 years ago

The socket file (unix:/var/run/php5-fpm.sock) is missing. I don't have a clue what caused this issue (oc8 upgrade?), but try using an IP address instead of the socket:

Open your fastcgi pool config: vim /etc/php5/fpm/pool.d/www.conf

Change listen to: listen = 127.0.0.1:9000

Open your nginx site config: vim /etc/nginx/sites-available/owncloud

Comment out unix:/var/run/php5-fpm.sock and use: server 127.0.0.1:9000;

Don't forget to restart nginx and php5-fpm.

goodkiller commented 9 years ago

Using TCP sockets is slightly slower than using UNIX sockets. This problem did not occur in OC7; it appeared after the OC8 upgrade.

wanno-drijfhout commented 9 years ago

I have the same problem but haven't tried the TCP workaround. Contrary to @rockihack's assertion, the socket file is not missing on my system when the 502 errors occur.

root:~# ls -las /var/run/php5-fpm*
4 -rw-r--r-- 1 root     root     4 Feb 14 11:30 /var/run/php5-fpm.pid
0 srw-rw---- 1 www-data www-data 0 Feb 14 11:30 /var/run/php5-fpm.sock

I think there might be something wrong with memcache/apc—I have not consciously enabled any sort of caching mechanism on my web service, by the way.

root:~# tail /var/log/php5-fpm/www.log.slow

[14-Feb-2015 18:32:03]  [pool www] pid 6477
script_filename = /var/www/owncloud/status.php
[0x00007ff5e7d64c30] apc_store() /var/www/owncloud/lib/private/memcache/apc.php:21
[0x00007ff5e7d64af0] set() /var/www/owncloud/lib/autoloader.php:109
[0x00007fff7a419380] load() unknown:0
[0x00007fff7a4196d0] spl_autoload_call() unknown:0
[0x00007ff5e7d648a0] init() /var/www/owncloud/lib/base.php:519
[0x00007ff5e7d64798] init() /var/www/owncloud/lib/base.php:1007
[0x00007ff5e7d64680] +++ dump failed

karlitschek commented 9 years ago

hmmm. interesting case. but to me this looks clearly like a problem in your os,webserver,php setup and not like an owncloud bug.

wanno-drijfhout commented 9 years ago

@karlitschek I hope you pressed that "Close ticket" button by accident. Apparently, you know something that points to the cause of this problem and are currently preparing a post detailing your findings. Otherwise, please reopen the ticket.

The upgrade to ownCloud 8 is the single change in our "os,webserver,php setup" that causes this problem to emerge. Undeniably, ownCloud is a trigger and that begs for investigation in collaboration with ownCloud-devs. After investigation, we'll be able to make statements about the cause and everyone will be happy.

The "apc_store"-function (in ownCloud code!) is the one that consistently shows up in my slow-log (as I quoted). Thus, it seems that this function call causes php-fpm to deadlock/freeze.

For now, I have bluntly disabled APCU in my PHP-FPM-configuration:

sudo -s
cd /etc/php5/fpm
mv conf.d/20-apcu.ini conf.d--20-apcu.ini
service php5-fpm restart

We'll know in a day if this works; the problem emerges quite regularly approximately every 12 hours.

karlitschek commented 9 years ago

ok. i reopen this. but this still looks like a configuration issue. of course owncloud uses apc calls. but they should work fine if the environment is configured correctly. let's see what the outcome of your debugging will be.

goodkiller commented 9 years ago

I will try to do some more debugging. But as I said earlier, this problem appeared after the OC8 upgrade, so I assume the problem cannot be in the webserver conf...

rockihack commented 9 years ago

I'm running owncloud8 (fresh install) with nginx, php5-fpm and postgresql just fine...

goodkiller commented 9 years ago

And you are using only TCP connections to PHP instead of UNIX sockets? Can you post your nginx and php-fpm conf if possible? Thanks a lot!

rockihack commented 9 years ago

Nope, it runs with a UNIX socket, and I forgot to mention that it uses APCu. I suggested trying an IP address because of the errors you posted above ("connect() to unix:/var/run/php5-fpm.sock failed (2: No such file or directory) while connecting to upstream...").

cyking commented 9 years ago

I upgraded from 7.x to 8, and now both my installations are getting 502 errors every couple of hours. The only thing that changed was OC. Running Ubuntu 14.04 with nginx.

laurivosandi commented 9 years ago

Hello, switching from UNIX sockets to a TCP socket seems to fix the issue, but that's not a real solution: a UNIX socket is the preferred way to communicate in this case (nginx and php5-fpm on the same machine).

wanno-drijfhout commented 9 years ago

My earlier attempt (of butchering APCU) has not worked. I woke up this morning with error messages from clients that couldn't connect to ownCloud due to a 502 Bad Gateway-error.

The nginx-log shows:

2015/02/15 09:37:07 [error] 16025#0: *25393 connect() to unix:/var/run/php5-fpm.sock failed (111: Connection refused) while connecting to upstream, client: xxxxxxxxxxxx, server: xxxxxxxxxxxxxxxxxx, request: "GET /status.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "xxxxxxxxxxx"

root@:~# service php5-fpm status
php5-fpm stop/waiting

The contents of /var/log/php5-fpm.log surprise me as well:

[15-Feb-2015 06:27:32] NOTICE: configuration file /etc/php5/fpm/php-fpm.conf test is successful
[15-Feb-2015 06:27:32] NOTICE: configuration file /etc/php5/fpm/php-fpm.conf test is successful
[15-Feb-2015 06:27:32] NOTICE: configuration file /etc/php5/fpm/php-fpm.conf test is successful
[15-Feb-2015 06:27:32] NOTICE: configuration file /etc/php5/fpm/php-fpm.conf test is successful
[15-Feb-2015 06:27:32] NOTICE: configuration file /etc/php5/fpm/php-fpm.conf test is successful
[15-Feb-2015 06:27:32] NOTICE: configuration file /etc/php5/fpm/php-fpm.conf test is successful
[15-Feb-2015 06:27:32] NOTICE: configuration file /etc/php5/fpm/php-fpm.conf test is successful
[15-Feb-2015 06:27:32] NOTICE: configuration file /etc/php5/fpm/php-fpm.conf test is successful
[15-Feb-2015 06:27:32] NOTICE: configuration file /etc/php5/fpm/php-fpm.conf test is successful
[15-Feb-2015 06:27:32] NOTICE: configuration file /etc/php5/fpm/php-fpm.conf test is successful

In other words, some massive php-fpm crash/restart seems to have occurred around 06:27 (timing looks consistent with my earlier findings).

My hypothesis is that the regular respawning of workers doesn't work, so I have modified /etc/php5/fpm/pool.d/www.conf. In my setup pm.max_requests was 50; now it's 0.

; The number of requests each child process should execute before respawning.
; This can be useful to work around memory leaks in 3rd party libraries. For
; endless request processing specify '0'. Equivalent to PHP_FCGI_MAX_REQUESTS.
; Default Value: 0
;pm.max_requests = 0

We'll see if this works in a day or less.
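
Because the stock www.conf carries the directive commented out (as in the excerpt above), it is easy to misread which value is actually in effect. A small sketch for printing the uncommented value follows; pool_setting is a hypothetical helper name, not part of php-fpm, and the path in the usage comment is the one mentioned above.

```shell
#!/bin/sh
# Print the active (uncommented) value of a php-fpm pool directive.
# $1: pool file, $2: directive name.
# Lines starting with ';' are ignored; if the directive appears more than
# once in the file, the last occurrence is printed.
pool_setting() {
    # Escape dots so that "pm.max_requests" is matched literally.
    key=$(printf '%s' "$2" | sed 's/\./\\./g')
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Usage:
#   pool_setting /etc/php5/fpm/pool.d/www.conf pm.max_requests
```

Empty output means the directive is not set in that file, in which case the default quoted in the excerpt (0) applies.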

P.S. @karlitschek I notice this ticket has the 'closed' state, still.

goodkiller commented 9 years ago

I also tried setting pm.max_requests = 0, but the problem is still there...

wanno-drijfhout commented 9 years ago

Setting pm.max_requests = 0 seems to work for me (at the moment). I have not had any errors since my previous post. We'll see if it persists. If this works, it surely is a weird workaround.

laurivosandi commented 9 years ago

Hello, I just discovered that similar issues happen on Apache: the web server just stops responding.

wanno-drijfhout commented 9 years ago

I am getting "Gateway Time-out" errors today when updating my calendar via CalDAV. This is truly odd.

I wonder if there's a memory leak somewhere. Suppose some calendar update is requested by the client. ownCloud is using all sorts of resources to process the request but does not close them.

With pm.max_requests = 50, the "unclosed resources" become a problem when restarting php-fpm workers. Possibly some resources are locked and new workers cannot spawn, resulting in 502 Bad Gateway exceptions.

With pm.max_requests = 0, the "unclosed resources" become a problem when the never-recycled php-fpm workers patiently wait for the resources to be unlocked/released. They can (of course) wait forever, but nginx prefers to call it a day after some timeout, resulting in 504 Gateway Timeout errors.

@karlitschek Why is this ticket still closed? We need a developer in here.

cyking commented 9 years ago

:+1: still not working for me.

I am not familiar with the polite practice of how to request attention to this so I made a new issue... best intentions

https://github.com/owncloud/core/issues/14360

wanno-drijfhout commented 9 years ago

Apparently, a new ownCloud package for my Ubuntu server was released in the last few days. The new and old version numbers seem to be identical, though (8.0.0-5). We'll see if this makes any difference in a day or so. Today, before the update, I was still dealing with timeouts.

@cyking Good call.

dirkhusemann commented 9 years ago

Tried pretty much everything described here, still getting 502 after a couple of hours (with "a couple of Hours" varying from 2 to almost 12).

Please reopen! This bug renders owncloud unusable.

s1lvester commented 9 years ago

I can confirm this.

after a few hours of running oc 8 and syncing files, php5-fpm times out when using sockets (php5-fpm.sock) on ubuntu 14.04

sudo service php5-fpm restart fixes the problem.

my other oc8 installation on debian stable works fine. so maybe it's a problem with the ubuntu packages (either oc or php5-fpm)...

cyking commented 9 years ago

I installed via the web installer as well as manually via tar with same results on ubuntu 14.04. I have never used the ubuntu deb package.

StormCh commented 9 years ago

Hi, it seems to be a problem caused by the ownCloud client software. After updating to oc8, the clients request the files stored on the server very often (with errors in my case). It took me 4 days to find this out. Once the ownCloud client software only had to "synchronize" new and updated files, the problem was gone.

wanno-drijfhout commented 9 years ago

@StormCh Can you elaborate on what you mean by "only synchronize new and updated files"?

I just downloaded and upgraded to the dev-version (ownCloud-1.8.0.4730beta1-setup.exe) of the ownCloud client. The problem persists.

My local copies of ownCloud folders (I use a dozen separate folders) had been in sync since long before upgrading the server to ownCloud 8. I do notice something related, though. When I start up the Windows ownCloud client, about 4 sync folders are marked as "up-to-date" while the rest is waiting in a "pending synchronisation" state. Those eventually time-out. The ownCloud client user interface is very sluggish during this wait.

Eventually, the ownCloud client establishes synchronization parity and all folders are marked "green" (perhaps due to my cron-jobbed service php5-fpm restart). The UI remains sluggish for some time still, though. Opening the window doesn't work or takes multiple seconds. Switching tabs takes seconds. A few minutes later the UI seems to become properly responsive again.

I'll make a ticket in the mirall bug tracker referencing this one to ask for their insights.

StormCh commented 9 years ago

@Xsysstar "only synchronize new and updated files" means normal synchronisation: the ownCloud client software only needs to sync new and updated files, and all other files are already in sync. I have 4 GB of data to sync and about 5,000 files in many folders. This setup worked with oc7 and the ownCloud clients for a long time.

wanno-drijfhout commented 9 years ago

On the other issue, I received more contributions from devs than in this ticket. I have also postulated a new theory: APC (the caching mechanism) might be out of memory. I still need to confirm it, but feel free to help me do so.

For those who want to try a hotfix for this issue, would you mind disabling APC in your PHP installation? If the problem no longer occurs (good news!), the next step would be to experiment with configuring APC to have higher memory limits.

dirkhusemann commented 9 years ago

@Xsysstar i tried disabling APC: no dice, same result :-(

cyking commented 9 years ago

I don't think it's a memory issue. I have 8GB of free ram on one of my servers and I still have to restart php5-fpm.

wanno-drijfhout commented 9 years ago

@cyking For my server the same is true. However, APC has its own memory limits.

There are two primary decisions to be made configuring APC. First, how much memory is going to be allocated to APC; and second, whether APC will check if a file has been modified on every request. [source]

cyking commented 9 years ago

I upgraded to PHP 5.6 and the problem went away.

How to upgrade from php v.5.5.9 to v.5.6

https://www.digitalocean.com/community/questions/how-to-upgrade-from-php-v-5-5-9-to-v-5-6

I should also mention that prior to upgrading to the ppa, I pulled owncloud v8.0.1 branch and the issue still persisted. I'm still currently using owncloud v8.0.1 with php v5.6. I have not tested owncloud v8.0.0 with php v5.6

wanno-drijfhout commented 9 years ago

Thanks @cyking! I will wait a little before upgrading (I am a bit reluctant to install random PPAs), but may do so if the problem persists.

On my end, disabling APC does actually seem to have an effect. I'll have to wait a bit more to be sure.

Disabling APC(u)

In /etc/php5/fpm/conf.d/20-apcu.ini

;extension=apcu.so
apc.enabled=0

Then service php5-fpm restart (and I restarted nginx to be sure).

wanno-drijfhout commented 9 years ago

A week of a working ownCloud is better than half a day, but I encountered another 502 Bad Gateway error just now. And, as expected, php5-fpm was dead.

root@server:~# service php5-fpm status
php5-fpm stop/waiting
root@server:~# service php5-fpm start
php5-fpm start/running, process 474
root@server:~# service php5-fpm status
php5-fpm start/running, process 474

I just "started" (instead of restarted) php5-fpm but that does not solve the problem, by the way. I have to restart php5-fpm after just starting it for nginx to kiss php5-fpm again.

My hypothesis of locked resources could permit this behaviour. I do wonder what kind of (system-wide) resources php5-fpm/ownCloud uses that are not automatically disposed when it crashes? In particular: what resources are used by php5-fpm/ownCloud but even more so if APC is also enabled? Some kind of storage, or handle, or socket? Ideas/suggestions?

P.S. Today is Pi-day and yesterday this ticket was a month old! Happy anniversary, everyone! How shall we celebrate and thank the ownCloud-devs? (lolcat-pictures eating cloud-shaped pie?)

cyking commented 9 years ago

I've never had apc enabled on my owncloud hosts. The only solution for me was to upgrade php via PPA.

wanno-drijfhout commented 9 years ago

I suppose I will upgrade to PHP 5.6 as well (with APC enabled). I'll report back if the problem persists.

PVince81 commented 9 years ago

@josh4trunks

josh4trunks commented 9 years ago

I believe this was addressed in 8.0.2 by disabling buggy versions of APCu (<4.0.6)

for anyone experiencing this issue, what version of owncloud and APCu do you have?
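
To answer the version question quickly, an installed APCu version can be compared against the 4.0.6 threshold mentioned above. This is a rough sketch under assumptions: apcu_is_affected is a hypothetical helper, it needs a sort that supports -V (GNU coreutils does), and the php -r call in the usage comment reports the extension loaded by the PHP command line binary, which may be configured differently from php5-fpm.

```shell
#!/bin/sh
# Exit status 0 if version $1 is lower than 4.0.6, 1 if it is 4.0.6 or
# newer, 2 if no version was given.
apcu_is_affected() {
    [ -n "$1" ] || return 2
    [ "$1" = "4.0.6" ] && return 1
    # Version-sort the two strings; if $1 sorts first, it is the older one.
    lowest=$(printf '%s\n%s\n' "$1" "4.0.6" | sort -V | head -n 1)
    [ "$lowest" = "$1" ]
}

# Usage:
#   apcu_is_affected "$(php -r 'echo phpversion("apcu");')" && echo "APCu older than 4.0.6"
```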

karlitschek commented 9 years ago

Yes. I assume this is fixed. Let's close it for now. We can always reopen if the problem still exists.

wanno-drijfhout commented 9 years ago

@josh4trunks I used the Ubuntu trusty packages of PHP and any of its modules. It seems trusty contains a buggy php5-apcu version (4.0.2-2build1). My ownCloud-installation is of version 8.0.1.

I followed @cyking's instructions for upgrading PHP, which (with APCu enabled) seems to indeed work (for now).

karloscarrijo commented 9 years ago

I probably have no business here, since I don't actually use ownCloud, but I just wanted to thank you guys for the insights and solution. I've been having this exact same problem for almost two weeks now, on an extremely busy website that I run with Ubuntu+Nginx+php-fpm. After several days of testing with no success, I arrived at this discussion, and upgrading PHP from 5.5.9 to 5.6.7 (with APC enabled as well) did the trick. Wish I knew why, but anyway, thank you all again!

fiatux commented 9 years ago

any updates on this issue?

i'm experiencing this on 5.5.23 on ubuntu trusty.

@cyking is this the correct way to upgrade php5-fpm to 5.6 also as php-fpm? https://www.digitalocean.com/community/questions/how-to-upgrade-from-php-v-5-5-9-to-v-5-6

cyking commented 9 years ago

@fiatux Yes, for me this PPA works fine. "He is one of the Debian maintainers of the php5 package."

thorleifjacobsen commented 9 years ago

I'm having this problem too, same error logs, trying to disable APC now as suggested above. next might be an upgrade but i'm not really willing to :-/

kamaroly commented 9 years ago

Running service php5-fpm restart solved the issue for me

bobweston commented 8 years ago

For what it's worth, I had this same issue after upgrading my server from ubuntu 12.04 to 14.04. Our devops setup is completely puppetized. The problem was I had forgotten to do a puppet provisioning run after upgrading the server.

In vagrant, I just had to run vagrant provision [server_name] to get everything working again.

bobweston commented 8 years ago

One other point, if you google php5-fpm crashes, you'll see that this is a common problem (php5-fpm crashes which lead to 502 errors and the need to restart).

One quick and dirty way to fix it is to set up a root crontab that restarts php5-fpm hourly.

sudo crontab -u root -e (This assumes you have sudo privileges)

Then add

0 * * * * /usr/sbin/service php5-fpm restart > /dev/null 2>&1

Tips for a cleaner way of dealing with the problem can be found in this thread: http://serverfault.com/questions/575457/constantly-have-to-reload-php-fpm
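
A slightly gentler variant of the hourly restart is to restart only when the service reports that it is stopped. This is a sketch under assumptions: fpm_is_down is a hypothetical helper, and it matches the Upstart-style status text seen earlier in this thread (start/running versus stop/waiting). It will not catch the case reported above where php5-fpm is running but nginx still cannot reach it.

```shell
#!/bin/sh
# Exit status 0 when a status line does NOT report the job as running.
# $1: output of "service php5-fpm status"
fpm_is_down() {
    case "$1" in
        *start/running*) return 1 ;;
        *) return 0 ;;
    esac
}

# Possible use inside a script that cron runs as root:
#   fpm_is_down "$(service php5-fpm status 2>&1)" && service php5-fpm restart
```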

donSchoe commented 8 years ago

why is this still closed? is creating a cronjob which restarts fpm every hour just-in-case really a solution?

i have the issue still with owncloud 8.2, nginx 1.8.0 and php 5.6.14. on archlinux with kernel 4.2.4-1.

my current workarounds:
- cronjob every full hour to ensure fpm is running: 0 * * * * systemctl restart php-fpm > /dev/null 2>&1
- switched nginx fpm socket away from unix to ip: server 127.0.0.1:9000; (same in fpm config)
- increased nginx fastcgi_read_timeout 120;
- increased fpm values in config:

pm.max_children = 100
pm.start_servers = 20
pm.min_spare_servers = 5
pm.max_spare_servers = 30
pm.max_requests = 500

upgrading php is not an option for me since i'm already on 5.6.x. this is a fresh installation of owncloud 8.2.x. maybe if these workarounds are desired behaviour, we should add them to the nginx docs?

donSchoe commented 8 years ago

tl;dr all fixes of this thread applied = still bad gateway / timeout issues. please reopen @karlitschek

restarting fpm seems not to fix this either.

josh4trunks commented 8 years ago

@donSchoe what version of php-apc do you have?

from my perspective this isn't a bug in ownCloud but in the underlying dependencies, which this project has no control over (it's the operating system's job to make sure not to ship buggy versions). workarounds could be documented somewhere in owncloud/documentation. they're always open to PRs, so feel free to write one and @ mention this issue.

donSchoe commented 8 years ago

@josh4trunks no apc installed. should I?

extra/php-apcu 4.0.7-1