Closed jcfischer closed 8 years ago
Just for the record: @jcfischer you missed to censor a mail address. I replaced it with a xxx
as well.
@blizzz You introduced this
@LukasReschke - thanks!
We have disabled the update call and we have seen a massive improvement on our slow queries as witnessed by the time stamps on our last few slow query logs (I disabled it around 08:20):
-rw------- 1 postgres postgres 10485790 Apr 8 07:11 postgresql-2015-04-08_070849.log
-rw------- 1 postgres postgres 10485859 Apr 8 07:13 postgresql-2015-04-08_071140.log
-rw------- 1 postgres postgres 10486039 Apr 8 07:43 postgresql-2015-04-08_071356.log
-rw------- 1 postgres postgres 10486038 Apr 8 07:51 postgresql-2015-04-08_074339.log
-rw------- 1 postgres postgres 10485891 Apr 8 08:01 postgresql-2015-04-08_075134.log
-rw------- 1 postgres postgres 10489582 Apr 8 08:11 postgresql-2015-04-08_080148.log
-rw------- 1 postgres postgres 10486199 Apr 8 08:18 postgresql-2015-04-08_081103.log
-rw------- 1 postgres postgres 3372146 Apr 8 08:36 postgresql-2015-04-08_081824.log
Weird .... there is even an index on this table and it's just one update per user per login:
@butonic You maybe know DB internals better. But having a simple update run for over 1 second up to over 37 seconds is really really bad.
@MorrisJobke the index actually makes the update more expensive because it too needs to be updated. Also: One update per login makes an additional n*2 updates per minute, where n number of active users... We currently have 8100 users (granted - not all of them active). Our DB server is struggling to keep up to date with the regular updates anyway, so any additional load further kills performance.
@MorrisJobke the index actually makes the update more expensive because it too needs to be updated.
I just noticed on a MySQL DB that adding an index before an update reduced the needed time, because the lookup of the corresponding entry was faster. But maybe this is different for Postgres.
@jcfischer so you log out your users after 30 seconds? Note: login does not mean connect(ion)
Also can we have some numbers about this table? Like total entries, number of entries when grouped by user name, number of entries when grouped by app = login and key = lastLogin That would be awesome
@nickvergessen - I just counted 53k (slow) updates on this table only today. We separate web clients and desktop clients - I don't know if the desktop clients keep being logged in all the time:
$ grep lastLogin postgresql-2015-04-07_*.log | wc -l
77262
$ grep lastLogin postgresql-2015-04-08_*.log | wc -l
53569
Note that yesterday we removed the php session file locking because it took out our application server one by one (they were stuck in requests that were waiting for the session file to be released). That might explain the much lower number yesterday (when our users basically couldn't use the system)
relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_analyze | last_autovacuum | last_autoanalyze
-------------------------------+-----------+-----------+-----------+------------+------------+-------------+-------------------------------+-------------------------------+-------------------------------
oc_preferences | 3355 | 2992451 | 1560 | 77105 | 28 | | 2015-04-08 08:04:49.472805+00 | 2015-04-08 08:51:02.523624+00 | 2015-04-08 08:49:59.028155+00
owncloud=# select count(*) from oc_preferences;
count
-------
77398
owncloud=# select appid, count(*) from oc_preferences group by appid;
appid | count
----------------+-------
settings | 8121
contacts | 7
user_ldap | 33667
documents | 1
owncloud | 1
core | 213
login_token | 6601
firstrunwizard | 6875
files | 15532
calendar | 3
login | 3705
files_sharing | 2672
(12 rows)
owncloud=# select count(*) from oc_preferences where appid = 'login' and configkey = 'lastLogin';
count
-------
3705
(1 row)
Note that yesterday we removed the php session file locking because it took out our application server one by one (they were stuck in requests that were waiting for the session file to be released).
This is something we should look into too. Something seems to block the sessions.
Note that yesterday we removed the php session file locking because it took out our application server one by one (they were stuck in requests that were waiting for the session file to be released).
This is something we should look into too. Something seems to block the sessions.
@jcfischer You mean that you removed the (previously added) comment here: https://github.com/owncloud/core/blob/v7.0.5/lib/private/session/internal.php#L52 So now it looks like that line.
Or am I wrong?
We separate web clients and desktop clients - I don't know if the desktop clients keep being logged in all the time:
so these 53k calls are web or client or both?
Would be great to get values for each of them - desktop client should not be responsible because the client reuses the session cookie.
Well if you had problems with sessions (meaning you killed it and required everyone to login again) that would explain why many updates happened, however it does not really explain why it is that slow.
Well if you had problems with sessions (meaning you killed it and required everyone to login again) that would explain why many updates happened, however it does not really explain why it is that slow.
I would say, that this is just a misunderstanding. We rolled out the "don't close session early" to prevent too much parallel requests to avoid the possibility of our race conditions with lost fileids. @jcfischer can you confirm that this is the case?
@MorrisJobke - yes that is exactly what we did. Our application server processes where starving, waiting for the session file to become unlocked. It seems that a few clients can completely shut down an app server with 60 PHP processes doing that.
@nickvergessen - as Morris said, we have just implemented the things recommended (i.e. keep the desktop clients on one application server (sticky) instead of distributing them among all our servers with round robin from the load balancer and uncommented the "don't close sessions early" "fix"). As I said - that kills our complete system on a regular basis, as the app servers get stuck waiting for a request to clear the session lock, and the load balancer removing those servers from rotation - overloading the remaining servers.
Whoever thought that it was a great idea to write data to the PHP session should come here for a day or two and try to keep the system running.
@nickvergessen - just a perspective - our two web servers (Web GUI) handle around 1-2 requests per second - if at all. Currently our 4 sync servers (that handle desktop clients) handle 150 - 200 requests per second.
Also - we have huge problems with the database server (for once independent from ownCloud) due to the fact that we are using a distributed storage system (Ceph) to store the volume that the database resides on. This storage has limited IOPS (around 100) and sometimes loads itself by doing its own maintenance. At this moment we are racking up new servers for the DB that will use battery powered SSDs. We expect to have them up and running tomorrow.
@jcfischer where are you storing the session files? In the GPFS filesystem? I strongly recommend using a memcached or redis (starting with OC8) server for this. Don't even consider storing it in the database. It is a bad idea in the first place which gets worse considering your db storage resides on a ceph device (RBD or FS?).
@butonic - they are stored in tmp files on a ramdisk on the server. According to @karlitschek that was the fastest method. A central memcached adds another 1-2ms for access to the data.
The logins now shouldn't happen that often anymore because the sync clients now properly use the session and doesn't re-authenticate anymore (which cause the new login timestamp).
Also the last time we talked this doesn't seem to be a performance issue anymore.
I therefore will close this ticket. If this is still an issue for you please report back.
Thanks
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
The
updateLastLoginTimestamp
function that is defined here: https://github.com/owncloud/core/blob/master/lib/private/user/user.php#L162 uses an incredibly inefficient way of storing the data.We are running 7.0.5
Our Postgres 9.3 server (which currently is running totally overloaded due to too few disk IOPS) shows me thousands of entries like this:
Mis-using the oc_preferences table for keeping this information seems like a really bad idea. It forces our database to constantly clean up the table as it accumulates a lot of dead tuples. Can this be moved to another table or to another (in-memory) database (redis comes to mind)?