plesk / centos2alma

CentOS 7 to AlmaLinux 8 conversion tool
Apache License 2.0
39 stars 10 forks source link

mariadb my.cnf reset during upgrade? #324

Open farisc1 opened 3 months ago

farisc1 commented 3 months ago

It appears as though any manual changes made to /etc/my.cnf are removed during the upgrade process (at least with 1.3.2), at least when mariadb 10.5 installed from mariadb.repo (yum.mariadb.org) was present pre-upgrade.

At any rate all the setting that were in my.cnf have mysteriously vanished post-upgrade, including my log-error and sql_mode settings under [mysqld]. Indeed, essentially no settings are left in my.cnf at all, not even the socket location and listen address/port.

I'm aware that certain things are different in Alma8 compared to Centos 7, and some settings have moved to be under systemd. But following the paths shown in systemctl status mariadb, none of the settings seem to have moved to the created "migrated-from" file.

Sadly we are experiencing strange problems with apache going into W state for all slots when certain actions are undertaken in a certain script, while at the same time plesk-php72-fpm reaches max_children, all of them seeming to be waiting for something but I can't figure out what. Unfortunately I can't find the cause of this, but I did find the missing entries in my.cnf while trying to debug things.

I must add a caveat that I do not know for certain that it was the centos2alma script that caused my.cnf to be effectively reset to nothing. For all I know something else did this prior to the upgrade (e.g. a bad maridb update rpm in the past), and I simply didn't notice.

Unfortunately I have no way to test or find out. I have Plesk-created backups from prior to the upgrade, but these do not include the content of /etc/

There is an /etc/my.cnf.rpmsave but it had a date that was last year. It contains what I'd call the correct settings that should be in my.cnf. There's also an .rpmnew but that looks the same as the essentially empty my.cnf and also has an old date from last year.

Again I therefore have to emphasise that I don't know for sure if this happened during the upgrade or through some other mechanism prior to the upgrade. I apologise in advance if it turns out that this is not related to the generally wonderful centos2alma script that has seriously saved my bacon and for which I am very grateful.

Bitpalast commented 3 months ago

Yes, the /etc/my.cnf settings are removed, because the file is replaced by a new factory default version as the database server needs to be replaced, too, by an Alma "version". I am not sure whether the same happens when individual settings were made in the conf.d subdirectory as that would be the place where user modified settings should go.

Anyway to your issue: With he symptom you describe and your guess that it has something to do with the database, the most likely cause is that the database temporary storage has not been granted enough RAM. This leads to frequent disk access, which again leads to slow processing of PHP scripts (that query the database), which again leads to a lot of PHP-FPM processes piling up. There are a few values that you can add to your /etc/my.cnf file to prevent this:

innodb_buffer_pool_size = <up to 70% of your system's RAM here> innodb_flush_method=O_DIRECT table_open_cache = 10000 innodb_log_file_size = <Something large here, like 15 GB> query_cache_size = 512M

Example: innodb_buffer_pool_size = 96G innodb_flush_method=O_DIRECT table_open_cache = 10000 innodb_log_file_size = 24G query_cache_size = 512M

Then restart MariaDB. A restart may take a while, because the buffer pool file needs to be initialized, which can take several minutes if it is a very large file. But a large innodb_buffer_pool_size value will speed up database transactions, hence speed up websites.

There can be other reasons for PHP-FPM processes piling up, e.g. many bad bot requests hitting the website (is Fail2Ban on and are you using suitable bad bot rules? If not, take a look at https://www.plesk.com/blog/product-technology/avoid-high-cpu-load-block-hackers-bad-bots-effectively/).

farisc1 commented 3 months ago

Thank you Bitpalast.

You are making me feel old and crusty here! Adding things in /etc/my.cnf has been how things have been done for decades (Including by Plesk itself when it makes configuration changes via the database performance enhancer). I wasn't aware that the modern option for user modifications is to use something under /etc/my.cnf.d/ and not in my.cnf itself :-( I will look at putting things where they now belong for future configuration changes.

But even so, overall I think it would be an enhancement if the script could take the [mysqld] section contents of my.cnf and put them in a suitable /etc/my.cnf.d/migrated-settings.cnf or in /my.cnf.d/server.cnf but that would be a feature request not a bug. I also don't recall being warned by the script that custom values my.cnf were going to be lost. It talks about change to dovecot's config files very clearly, but I didn't notice anything about my.cnf.

I am particularly alarmed about the possibility of certain settings being lost without the user knowing. For example, innodb_file_per_table = 1 On this particular system, this was not set so it didn't matter. There is also the following security default which gets set by "something" in most of my my.cnf files that is also lost:

symbolic-links=0

But maybe this is the default in 10.5 and later and so it doesn't matter.

Most of all, thank you for suggesting the possibility that my actual problem (Apache W states, php children waiting until timeout) could be caused by a DB issue. I'm not seeing a high load average on this system when the problem occurs, and there's almost no CPU usage by mariadb when it occurs, so I had discounted it, but maybe that was a stupid thing for me to do. And also thank you for the badbots possibility. I think this is definitely a problem whether or not it causes my issues. I wonder why the enhanced list of bots is not used by the Plesk-installed fail2ban by default?

I will experiment with some of the suggested optimizations to see if it helps. I just wish there was something concrete in an error log for me to see. In the meantime I'll update the badbots configuration.

Thank you again. I really appreciate it when someone takes the time to reply like you did, with detail and examples.