Open csib opened 3 years ago
Since this relates to the session table, one thing you should check is that the System - Session Data Purge plugin is enabled.
Possibly the MariaDB cluster has problems with tables which don't have a primary key, at least I've recently heard something like that elsewhere, and Joomla has a few such tables.
Thank you. I've checked and it's already enabled.
In my database, session_id is the primary key of the session table.
@csib I know the session table has a primary key. I thought maybe some other table which has no primary key might make problems with the cluster as such. But I don’t really think that’s the case.
I had a similar issue on one of my high-traffic sites (it was resolved by bigger hosting and some MySQL configuration changes by hosting support, and later by an extra caching proxy). The session table is a bottleneck, unfortunately.
@joomdonation @richard67 do you remember whether Joomla 4 still stores session metadata in the database, even if the session handler is not "database"? I remember there was some discussion about it in the past, but I don't know what was done.
@Fedik I have the same problem as you have. I remember there was something, but I don't remember the details.
There is an option "Track Session Metadata" in the "System" tab of Global Configuration in J4, which is not there in J3. So it seems in J4 it's possible to switch that off.
A long time ago there was #19460, but no one was interested.
@alikon And it was called a new feature. In my opinion, a new index or PK where there was no PK before is not really a new feature. I think it should be fixed in 3.9 or 3.10, and it could even be tested by review only.
More debug info: when the MariaDB cluster fails, I always see a lot of INSERT queries waiting in the database queue (SHOW PROCESSLIST).
At first there are 5-10 entries like that; after a while the list is maxed out (500). The database is already non-functional while I see these entries increasing, well before the count reaches 500.
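The check described above (watching for INSERT queries piling up in SHOW PROCESSLIST) could be automated roughly as follows. This is only a sketch: the rows are hypothetical sample data shaped like SHOW PROCESSLIST output, the table name `jos_session` assumes a `jos_` table prefix, and the 1-second threshold is an arbitrary assumption. Fetching the rows from a real server is left out.

```python
# Hypothetical rows shaped like SHOW PROCESSLIST output; the values are made up.
SAMPLE_ROWS = [
    {"Id": 101, "Command": "Query", "Time": 12,
     "Info": "INSERT INTO `jos_session` (`session_id`) VALUES ('a')"},
    {"Id": 102, "Command": "Query", "Time": 0,
     "Info": "INSERT INTO `jos_session` (`session_id`) VALUES ('b')"},
    {"Id": 103, "Command": "Sleep", "Time": 300, "Info": None},
    {"Id": 104, "Command": "Query", "Time": 5,
     "Info": "SELECT * FROM `jos_content`"},
    {"Id": 105, "Command": "Query", "Time": 9,
     "Info": "INSERT INTO `jos_session` (`session_id`) VALUES ('c')"},
]


def count_pending_session_inserts(rows, table_suffix="_session", min_seconds=1):
    """Count INSERTs on the session table that have been running for a while.

    table_suffix and min_seconds are assumptions chosen for illustration.
    """
    pending = 0
    for row in rows:
        if row.get("Command") != "Query":
            continue  # skip sleeping or idle connections
        info = (row.get("Info") or "").lower().lstrip()
        if not info.startswith("insert"):
            continue  # only INSERT statements are of interest here
        if table_suffix not in info:
            continue  # only statements touching the session table
        if row.get("Time", 0) >= min_seconds:
            pending += 1
    return pending


print(count_pending_session_inserts(SAMPLE_ROWS))
```

A monitoring job could raise an alert as soon as this count starts growing, rather than waiting for the connection limit to be reached.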
I am also using @joomdonation's Membership Pro, as seen in the screenshot. Maybe that is useful additional information.
Thank you!
@csib We do have system plugins in the extension, but they are only triggered once every hour (the interval is configurable), not on every page load, so they should not cause any problem.
The main problem here, as people figured out, is how Joomla handles session metadata. On a high-traffic website like yours there will be problems because too many records are inserted into that table. I don't have time to look at it right now, but there are a few things you could try:
Try modifying this line of code https://github.com/joomla/joomla-cms/blob/staging/libraries/src/Application/CMSApplication.php#L832 and change 2 to 5, for example. That would reduce the number of records inserted into the sessions table.
Maybe instead of relying on System - Session Data Purge to delete expired session data (which is triggered randomly), you can disable this plugin and set up a cron job to trigger the process instead. The cron job will need to run these CLI scripts:
https://github.com/joomla/joomla-cms/blob/staging/cli/sessionGc.php
https://github.com/joomla/joomla-cms/blob/staging/cli/sessionMetadataGc.php
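A small wrapper that a cron job could call to run the two CLI scripts above might look like the sketch below. The script paths are taken from the links; the Joomla root `/var/www/joomla`, the `php` binary name, and the function names are assumptions for illustration only.

```python
import subprocess
from pathlib import Path

# Script paths as linked above, relative to the Joomla root.
GC_SCRIPTS = ("cli/sessionGc.php", "cli/sessionMetadataGc.php")


def build_gc_commands(joomla_root, php_bin="php"):
    """Return one command line per garbage-collection script."""
    root = Path(joomla_root)
    return [[php_bin, str(root / script)] for script in GC_SCRIPTS]


def run_gc(joomla_root, php_bin="php", runner=subprocess.run):
    """Run both scripts in order and collect their exit codes.

    The runner parameter exists only so the function can be exercised
    without a real PHP installation.
    """
    results = []
    for cmd in build_gc_commands(joomla_root, php_bin):
        completed = runner(cmd, check=False)
        results.append((cmd, completed.returncode))
    return results


# Assumed install location; adjust to the real Joomla root.
for cmd in build_gc_commands("/var/www/joomla"):
    print(" ".join(cmd))
```

Calling `run_gc("/var/www/joomla")` from a scheduled job, at whatever interval suits the site, would take over the work that the plugin otherwise does at random.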
I can't say that it will solve your issue, but it is something I think you can try. I will try to look at this problem again when I have more time. But I remember that in the past, we had discussed about same problem here (someone had a high traffic website but we could come up with a solution)
OK. Found the original discussion https://github.com/joomla/joomla-cms/issues/19146
Thank you very much, I will try your advice.
Generally it is not a high-traffic website (or I don't know what high traffic means for Joomla). The problem occurs only sometimes on my website, due to a small traffic spike. With 3 nodes (2x 4 cores with 8 GB memory, 1x 6 cores with 16 GB memory), the CPU load is between 30% and 60%, but they are VPS servers, so the cores are not dedicated.
And the saddest thing is that anyone can DoS my website right now: anyone who can make at least 4 page loads/sec against the website. This is a very bad thing.
If I use a single MariaDB database the problem does not occur; in my tests it happens only with clustered MariaDB (as mentioned in the first post).
To make a long story short: thank you very much! I am going to try your advice. If it doesn't help, I will try to limit page loads from the same IP to 2 or 3 per second (at the proxy level); I hope this won't bother real users. If I end up with this proxy approach I will post the settings here; maybe it will help someone else who has a system similar to mine.
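To illustrate what a per-IP limit of 2 or 3 requests per second means in practice, here is a minimal token-bucket sketch. It is not Traefik configuration and not the settings mentioned above; the class name, the burst size, and the injectable clock are assumptions made for illustration. In a real deployment the proxy's own rate-limiting feature would do this job.

```python
import time


class PerIpRateLimiter:
    """Token bucket per client IP (illustrative sketch only)."""

    def __init__(self, rate_per_sec=3.0, burst=3, clock=time.monotonic):
        self.rate = float(rate_per_sec)   # tokens refilled per second
        self.burst = float(burst)         # maximum tokens a bucket can hold
        self.clock = clock                # injectable for testing
        self._buckets = {}                # ip -> (tokens, last_seen)

    def allow(self, ip):
        """Return True if a request from this IP may pass right now."""
        now = self.clock()
        tokens, last = self._buckets.get(ip, (self.burst, now))
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[ip] = (tokens - 1.0, now)
            return True
        self._buckets[ip] = (tokens, now)
        return False


limiter = PerIpRateLimiter(rate_per_sec=2.0, burst=2)
print([limiter.allow("203.0.113.7") for _ in range(4)])
```

A burst allowance matters here because a single real page view often triggers several requests at once, so a strict per-second cap without a burst could affect normal visitors.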
With the solution that @joomdonation mentioned the website and the database are much more stable.
Without this fix my site broke down randomly every 2-4 days, but now it has lasted almost a month. Yesterday there was a problem again due to this, but it is much, much better than before.
So I can confirm that this solution is working, and the website is more stable, thank you @joomdonation.
Maybe I will try Postgres (HA version) instead of MariaDB Galera and see what happens.
Steps to reproduce the issue
Deploy Joomla to K8S. After a few hours/days (depending on the load!) the database will crash because of Joomla. The InnoDB replication breaks and a lot of pending writes to the _session table are waiting/blocked. In my tests the root cause is the sessions themselves.
I can reproduce this in my infrastructure with an average of 4 page loads/sec.
Expected result
Working database.
Actual result
The database replication will fail because of Joomla. I have never experienced this behaviour on my other Galera nodes, only on the one where Joomla is running.
System information (as much as possible)
Infra: Kubernetes
Ingress Proxy: Traefik
Image: Custom built based on Alpine
Joomla: 3.9.27
Database: MariaDB Galera Cluster 10.5.10 (there is already an open issue related to this in their repo)
Session settings: Behind Load Balancer is set to Yes
Additional comments
I have tried several settings: load balancer on/off, setting rollback on Galera, changing timeouts, changing from Redis to DB. In my tests, with 1 Joomla pod and 1 Galera pod everything works. After I scale Galera up to the expected 3 pods, the system crashes within a few hours.