owncloud / core

:cloud: ownCloud web server core (Files, DAV, etc.)
https://owncloud.com
GNU Affero General Public License v3.0
8.35k stars 2.06k forks source link

Severe performance issues with 5.0.11 maybe infinite loop #5000

Closed roha4000 closed 10 years ago

roha4000 commented 10 years ago

We recently updated from 5.010 to 5.0.11 and experienced severe performance issues since then. After some investigations it looks like some kind of bug but I could not locate the reason(s) so I finally downgraded to 5.0.10 after a week and everything works fine again.

Webservers: 10x RedHat based Apache webservers We are running PHP 5.3 as mod_php so there are no separate PHP processes. Everything is processed within the httpd processes. We are using LDAP as authentication backend.

Database: MySQL Galera Cluster

Symptoms

In the owncloud log I found hundreds of entries like this:

ownCloud[6355]: {PHP} Allowed memory size of [...] bytes exhausted (tried to allocate 8208 bytes) at [...]/apps/user_ldap/lib/connection.php#177
ownCloud[29723]: PHP Fatal error:  Allowed memory size of [...] bytes exhausted (tried to allocate 523800 bytes) in [...]/lib/db.php on line 365

Both locations are handling database connections. I straced a httpd process on an affected server. The process seemed to be in a loop and send over and over again the same SQL-query with the same user name:

select * from  oc_users WHERE LOWER(`uid`) = LOWER('<USER1>');
select COUNT(*) from  oc_users WHERE LOWER(`uid`) = LOWER('<USER1>') LIMIT 3;

I greped the OC code for those queries but only found a match for the first one in lib/user/database.php:240 within the userExists($uid) function.

Conclusions so far

It seems to me that with 5.0.11 in some situations and / or for some users a loop is not exiting where "userExists()" is called. The loop runs until the memory_limit is reached and than crashes. This is not happening all the time but I cannot say if it is bound to some specific users or actions. It mainly occurs on those server dealing with sync client requests. (We have separate webserver for browser and sync client requests)

Does anybody has some ideas why and / or where such a loop could be started?

Thanks Roland Hager

Aikhjarto commented 10 years ago

Hi,

I have the problem with reoccurring SQL-query select COUNT(*) from oc_users WHERE LOWER(uid) = LOWER('') LIMIT 3; that impacts the performance of a whole webserver also. My setup is Owncloud 5.0.12 with apache 2.2.22, mariaDB 5.5.33 and php 5.3.17. The problem is not quite reproducible. Sometimes the infinite SQL-query loop starts during browsing the web-interface, whereas, sometimes is starts during a sync with the sync-client (from Linux and Windows Clients). If the probelem occurs the corresponding browser won't load the owncloud interface completely and the sync-client stops syncing. If the browser or the sync-client is closed, the SQL-querry flood stops immediately. Restarting the browser or sync-client again and doing the same tasks will not always reproduce the problem.

Best Regards

Aikhjarto

bantu commented 10 years ago

The COUNT(*) query is very likely generated in OC_DB::numRows() (see https://github.com/owncloud/core/blob/stable5/lib/db.php#L1183) which is used by OC_User_Database::userExists() (see https://github.com/owncloud/core/blob/stable5/lib/user/database.php#L246).

bantu commented 10 years ago

There is a lot of room for improvement in OC_User_Database. For example, receiving the display name via getDisplayName('foo') runs the following queries: SELECT * FROMPREFIXusersWHERE LOWER(uid) = LOWER('foo) via userExists() SELECT COUNT(*) FROMPREFIXusersWHERE LOWER(uid) = LOWER('foo) via numRows() and finally SELECTdisplaynameFROMPREFIXusersWHEREuid= 'foo' in getDisplayName() itself.

This is unnecessary and can be reduced to a single query.

bantu commented 10 years ago

First, better test coverage for OC_User_Database should be achieved. Then the implementations can be improved. Also see #5032

bantu commented 10 years ago

@roha4000 I guess that the query select COUNT(*) from oc_users WHERE LOWER(uid) = LOWER('<USER1>') LIMIT 3; provided by you is not an exact copy since we are using upper case SELECT almost everywhere.

The closest query I could find is

$query = OC_DB::prepare('SELECT `uid` FROM `*PREFIX*users` WHERE LOWER(`uid`) LIKE LOWER(?)', $limit, $offset);

which is also using LIKE instad of =. (see https://github.com/owncloud/core/blob/stable5/lib/user/database.php#L225).

karlitschek commented 10 years ago

There is definitely a lot of room for performance improvements. :-(

Aikhjarto commented 10 years ago

With owncloud 5.0.12 the SELECT is uppercase indeed. Sorry for the confusion. Here are the repetitive lines of my SQL-Log from today: SELECT * FROM oc_users WHERE LOWER(uid) = LOWER('USERNAME') SELECT COUNT(*) FROM oc_users WHERE LOWER(uid) = LOWER('USERNAME') LIMIT 3 USERNAME is an actual username (replaced for anonymity)

bantu commented 10 years ago

@Aikhjarto Okay, thank you. If there really is a = instead of a LIKE, the query is probably from somewhere else. I also haven't spotted the LIMIT 3 yet.

Aikhjarto commented 10 years ago

@bantu Yes this is actually copy-pasted. So in the log there is a '='.

bantu commented 10 years ago

This looks close. https://github.com/owncloud/core/blob/master/core/ajax/share.php#L291

bantu commented 10 years ago

The number of queries of userExists() has been reduced from 2 to 1 on the master branch. See https://github.com/owncloud/core/blob/master/lib/private/user/database.php#L240 Only the COUNT(*) query is run now. This is better than what is in stable5, but still a bit suboptimal. Should use a LIMIT 1 query instead. This is just a yes/no question, the number of users does not matter.

bantu commented 10 years ago

@Aikhjarto May I ask you to apply this patch (https://gist.github.com/bantu/e2d729528748ba169452/raw/af8fdb7ae4da9587f83047bc2bf51037d6bd0a66/ticket-5000-debug.diff)? It should log a backtrace to the owncloud log when the affected query is executed. This way we can hopefully find out where the query is from.

Aikhjarto commented 10 years ago

@bantu I applied the patch and the problem occurred again after applying. However, nothing is in the log files. In fact the log file's last entry is from yesterday. Do I have to activate some kind of debug-mode?

Aikhjarto commented 10 years ago

@bantu I thougth that something with the stripos in your patch is too restrictive. So I relaxed the if-clause to just get limit == 3 and now I get the following log-entry (many of them)

I replaced the real username with "USERNAME" for anonymity.

bantu commented 10 years ago

@Aikhjarto Thank you for taking care of that. A more-readable version of the backtrace can be found here: https://gist.github.com/bantu/eb63ea84fe89f0c47b1b

bantu commented 10 years ago

The interesting part of the path is

[...]
OC_Connector_Sabre_Auth::authenticate()
OC_Util::setupFS()
\OC\Files\Filesystem::init()
\OC\Files\Filesystem::initMountPoints()
OC_User::getHome()
OC_User_Database::getHome()
OC_User_Database::userExists()
PDOStatementWrapper::numRows()
OC_DB::prepare()

OC_DB::prepare() has 3 as the second argument. No clue where that is from.

bantu commented 10 years ago

Okay, I finally figured this out. It's just https://github.com/owncloud/core/blob/stable5/lib/db.php#L1186 and the fact that PDO::FETCH_NUM is passed as an argument, which is 3 and incredibly wrong.

Aikhjarto commented 10 years ago

@bantu I applied your pull request to my owncloud installation. Unfortunately, the problem of the repeated SQL-queries remains. However, now the lines lack of the LIMIT argument: SELECT COUNT() FROM oc_users WHERE LOWER(uid) = LOWER('USERNAME') SELECT \ FROM oc_users WHERE LOWER(uid) = LOWER('USERNAME')

icewind1991 commented 10 years ago

The extra SELECT COUNT query can be solved by replacing https://github.com/owncloud/core/blob/stable5/lib/user/database.php#L246 with return (bool)$result->fetchRow().

numRows should be avoided where possible.

Aikhjarto commented 10 years ago

@icewind1991 Indeed the 'SELECT COUNT' line is gone in the log. But SELECT * FROM oc_users WHERE LOWER(uid) = LOWER('USERNAME') still occurs in a loop.

bantu commented 10 years ago

@Aikhjarto Removing the pointless limit 3 will do almost nothing to improve performance. But it was a crucial part of understanding where the query is actually generated. I will send more changes to futher improve performance. But I am not going to do this without test coverage. So test coverage has to be written first.

@icewind1991 The best way to do this is SELECT uid FROM oc_users WHERE LOWER(uid) = LOWER('USERNAME') LIMIT 1. Also see https://github.com/owncloud/core/issues/5000#issuecomment-25809925

sezuan commented 10 years ago

Hi,

I've the very same problem. Unfortunately I don't have a real solution. At least I can duplicate the issue very easily by syncing certain files with the ownCloud client. I've noticed, that disabling the Versions app solves the problems

Maybe you can try that as well. If it's really related to the Versions app it might be a good hint to dig further.

Matthias

sezuan commented 10 years ago

After some debugging I found out, that the recursion/loop occurs when the version module tries to remove old versions in the function delOldVersion, to be exact: $versionsFileview->unlink($version['path'] . '.v' . $version['version']); never ends. To duplicate this issue you need to sync a file every few seconds, after the limit is reached and old versions will be deleted, the sync fails. This might explain why it doesn't happen all the time.

karlitschek commented 10 years ago

@schiesbn any idea?

bantu commented 10 years ago

Likely path that leads to the queries:

\OCA\Files_Versions\Storage::getUidAndFilename()
\OC\Files\Filesystem::initMountPoints()
OC_User::getHome()
OC_User_Database::getHome()
OC_User_Database::userExists()
PDOStatementWrapper::numRows()
OC_DB::prepare()
wollir72 commented 10 years ago

I have the same problem that httpd processes consumed a lot of CPU on sync and never stop. It occurs since ownCloud 5.0.12 I created a issue https://github.com/owncloud/core/issues/5312, but no one can find the reason. Today I found this thread and tried the hint with the versions app. After disabling the Versions app sync works fine. But it is no real solution^^

Aikhjarto commented 10 years ago

The problem just occured again during a sync. I disabled the version app and the the sync worked (o.k. it left some conflict files, but thats no big deal).

bantu commented 10 years ago

@schiesbn Bump. Any idea?

sercxanto commented 10 years ago

I experience the very same problem. After googling for the "memory exhausted" error I found in the owncloud log I ended on this bug report page. IMO the bug is not an performance issue but rather critical one and a hotfix for the 5.x versions should be released as it

For the time being as a workaround I disabled versioning and see if it helps.

schiessle commented 10 years ago

I debugged it. The problem are that the hooks are triggered for the unlink operation even if the views root is /user/files_versions seems to be the same as https://github.com/owncloud/core/pull/5247. @icewind1991 can we backport it to stable5?

We end up at this place: https://github.com/owncloud/core/blob/stable5/lib/files/view.php#L692

return Filesystem::getView()->getRelativePath($this->getAbsolutePath($path));

where getAbsolutePath() returns the correct path /user/files_versions/foo.txt and getRelativePath() assume that user/files is the root so that at the end the path is trimmed to _versions/foo.txt

schiessle commented 10 years ago

fix from master backported to stable5: https://github.com/owncloud/core/commit/7923f93ee3b63cf1359f03b9d5beecb486b7e8a5

Will be released with the next bugfix release.