joomla / joomla-cms

Home of the Joomla! Content Management System
https://www.joomla.org
GNU General Public License v2.0

Joomla in Kubernetes with Mariadb Cluster #34515

Open csib opened 3 years ago

csib commented 3 years ago

Steps to reproduce the issue

Deploy Joomla to K8S. After a few hours or days (depending on the load!) the database crashes because of Joomla. The InnoDB replication breaks and a lot of pending writes to the _session table are left waiting/blocked. In my tests the root cause is the sessions themselves.

I can reproduce this in my infrastructure with an average of 4 page loads/sec.

Expected result

Working database.

Actual result

The database replication fails because of Joomla. I have never experienced this behaviour on my other Galera nodes, only on the one where Joomla is running.

System information (as much as possible)

Infra: Kubernetes
Ingress Proxy: Traefik
Image: Custom built, based on Alpine
Joomla: 3.9.27
Database: MariaDB Galera Cluster 10.5.10 (there is already an open issue related to this in their repo)
Session settings: [image]
"Behind Load Balancer" setting is set to Yes

Additional comments

I have tried several settings: load balancer on/off, setting rollback on Galera, changing timeouts, changing from Redis to DB. In my tests, with 1 Joomla pod and 1 Galera pod everything works. After I scaled Galera up to the expected 3 pods, the system crashed within a few hours.

joomdonation commented 3 years ago

Since it relates to the session table, one thing you should check is that the System - Session Data Purge plugin is enabled.

richard67 commented 3 years ago

Possibly the MariaDB cluster has problems with tables which don't have a primary key, at least I've recently heard something like that elsewhere, and Joomla has a few such tables.

csib commented 3 years ago

Since it relates to the session table, one thing you should check is that the System - Session Data Purge plugin is enabled.

Thank you. I've checked and it's already enabled.

Possibly the MariaDB cluster has problems with tables which don't have a primary key, at least I've recently heard something like that elsewhere, and Joomla has a few such tables.

In the session table the session_id is primary key in my database.

richard67 commented 3 years ago

In the session table the session_id is primary key in my database.

@csib I know the session table has a primary key. I thought maybe some other table without a primary key might cause problems with the cluster as such. But I don't really think that's the case.
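For anyone who wants to check which tables lack a primary key, here is a minimal Python sketch. The query uses the standard information_schema views; the helper name, the sample table names and the %s placeholder style (which depends on the database driver) are assumptions for illustration, not something taken from Joomla or from this thread.

```python
# Query listing base tables in one schema that have no PRIMARY KEY constraint.
# The %s placeholder is an assumption: it matches common MySQL/MariaDB drivers.
NO_PK_QUERY = """
SELECT t.TABLE_NAME
FROM information_schema.TABLES AS t
LEFT JOIN information_schema.TABLE_CONSTRAINTS AS c
  ON c.TABLE_SCHEMA = t.TABLE_SCHEMA
 AND c.TABLE_NAME = t.TABLE_NAME
 AND c.CONSTRAINT_TYPE = 'PRIMARY KEY'
WHERE t.TABLE_SCHEMA = %s
  AND t.TABLE_TYPE = 'BASE TABLE'
  AND c.CONSTRAINT_NAME IS NULL
"""


def tables_without_primary_key(tables, constraints):
    """Same check done in plain Python (hypothetical helper).

    tables: iterable of table names.
    constraints: iterable of (table_name, constraint_type) pairs.
    Returns the sorted names of tables that have no PRIMARY KEY entry.
    """
    with_pk = {name for name, ctype in constraints if ctype == "PRIMARY KEY"}
    return sorted(name for name in tables if name not in with_pk)


# Hypothetical sample data, only to show the shape of the input.
sample_tables = ["jos_session", "jos_example"]
sample_constraints = [("jos_session", "PRIMARY KEY"), ("jos_example", "UNIQUE")]
missing = tables_without_primary_key(sample_tables, sample_constraints)
```

The query can be run through whatever client is at hand; the pure function is only there to make the logic easy to verify without a database.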

Fedik commented 3 years ago

I had a similar issue on a high-traffic site (it was resolved by bigger hosting and some MySQL config changes made by hosting support, and later by an extra caching proxy). The session table is a bottleneck, unfortunately.

Fedik commented 3 years ago

@joomdonation @richard67 do you remember whether Joomla 4 still stores session metadata in the database, even if the session handler is not "database"? I remember there was some discussion about it in the past, but I don't know what was done.

richard67 commented 3 years ago

@Fedik I have the same problem as you have. I remember there was something, but I don't remember details.

There is an option "Track Session Metadata" in the "System" tab of Global Configuration in J4, which is not there in J3. So it seems in J4 it's possible to switch that off.

alikon commented 3 years ago

long time ago #19460 but no one was interested

richard67 commented 3 years ago

long time ago #19460 but no one was interested

@alikon And it was called a new feature. In my opinion, a new index or pk where there was no pk before is not really a new feature. I think it should be fixed in 3.9 or 3.10 and could even be tested by review only.

csib commented 3 years ago

Additional debug info: when the MariaDB cluster fails, I always see a lot of insert queries waiting in the queue in the database (SHOW PROCESSLIST). [image]

At first there are 5-10 entries like that; after a while it maxes out (500), but the database is already non-functional when I see these entries increasing; it does not need to reach 500.

I am also using @joomdonation's Membership Pro, as seen in the screenshot. Maybe this is useful additional information.
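To watch for this pile-up without reading the process list by eye, something like the following Python sketch could count the pending session inserts. It assumes the rows have already been fetched as dictionaries keyed by the SHOW PROCESSLIST column names; the function name, the sample rows and the table prefix are hypothetical.

```python
def count_pending_session_inserts(rows, table_suffix="_session"):
    """Count process list rows whose statement is an INSERT touching the session table.

    rows: iterable of dicts using SHOW PROCESSLIST column names (the "Info"
    column holds the statement text and may be None for idle connections).
    """
    count = 0
    for row in rows:
        info = (row.get("Info") or "").lstrip().upper()
        if info.startswith("INSERT") and table_suffix.upper() in info:
            count += 1
    return count


# Hypothetical rows, only to show the expected input shape.
sample_rows = [
    {"Id": 1, "Command": "Query", "Info": "INSERT INTO `jos_session` (`session_id`) VALUES ('x')"},
    {"Id": 2, "Command": "Sleep", "Info": None},
    {"Id": 3, "Command": "Query", "Info": "SELECT * FROM `jos_session`"},
    {"Id": 4, "Command": "Query", "Info": "  insert into jos_session (session_id) values ('y')"},
]
pending = count_pending_session_inserts(sample_rows)
```

A monitoring job could alert when this number starts climbing, since the database is reported to be unusable well before the connection limit is reached.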

Thank you!

joomdonation commented 3 years ago

@csib We have system plugins in the extension; however, they are only triggered to run once every hour (the interval is configurable), not on every page load, so that should not cause any problem.

The main problem here, as people figured out, is how Joomla handles session metadata. On a high-traffic website like yours, there would be problems because too many records are inserted into that table. I don't have time to look at it right now, but there are a few things you could try:

I can't say that it will solve your issue, but it is something I think you can try. I will try to look at this problem again when I have more time. But I remember that in the past we discussed the same problem here (someone had a high traffic website but we could come up with a solution)

joomdonation commented 3 years ago

OK. Found the original discussion https://github.com/joomla/joomla-cms/issues/19146

csib commented 3 years ago

Thank you very much, I will try your advice.

Generally it is not a high-traffic website (or I don't know what high traffic means for Joomla). The problem only occurs sometimes on my website (due to a small traffic spike). Using 3 nodes, the CPU load is between 30-60% (2x 4 cores with 8GB mem, 1x 6 cores with 16GB mem), but they are VPS servers so the cores are not dedicated.

And the saddest thing is that anyone can DoS my website right now: anyone who can make at least 4 page loads/sec against the website, and this is a very bad thing.

If I use a single MariaDB database the problem does not occur; in my tests it happens only with clustered MariaDB (as mentioned in the first post).

To make a long story short: thank you very much! I am going to try your advice. If it doesn't help, I will try to limit the page loads/sec from the same IP to 2 or 3/sec (at the proxy level); I hope I won't bother real users with this solution. If I end up with this proxy approach I will post the settings here; maybe it will help someone else who has a system similar to mine.
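To illustrate what a per-IP limit of about 3 requests/sec means in practice, here is a rough token-bucket sketch in Python. It is not the Traefik configuration (that was never posted in this thread); the class name, the parameters and the burst size are assumptions chosen for illustration.

```python
import time


class PerIpRateLimiter:
    """Token bucket per client IP (hypothetical sketch, not a proxy config).

    Each IP may burst up to `burst` requests, and tokens refill at
    `rate_per_sec` tokens per second.
    """

    def __init__(self, rate_per_sec=3.0, burst=3, clock=time.monotonic):
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock      # injectable so the behaviour can be tested
        self._state = {}        # ip -> (tokens_left, last_seen_timestamp)

    def allow(self, ip):
        """Return True if the request from `ip` may pass, False if it is limited."""
        now = self.clock()
        tokens, last = self._state.get(ip, (float(self.burst), now))
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._state[ip] = (tokens - 1.0, now)
            return True
        self._state[ip] = (tokens, now)
        return False


# Usage with a fake clock so the result does not depend on real time.
fake_now = [0.0]
limiter = PerIpRateLimiter(rate_per_sec=3.0, burst=3, clock=lambda: fake_now[0])
first_four = [limiter.allow("203.0.113.7") for _ in range(4)]
```

A small burst allowance matters here because a single real page view can trigger several requests at once, so a strict one-request-per-interval limit would be more likely to bother real users.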

csib commented 3 years ago

With the solution that @joomdonation mentioned, the website and the database are much more stable.

Without this fix my site randomly broke down every 2-4 days, but now it has lasted for almost a month. Yesterday there was a problem again due to this, but it is much, much better than before.

So I can confirm that this solution is working, and the website is more stable, thank you @joomdonation.

Maybe I will try Postgres (HA version) instead of MariaDB Galera and see what happens.