turnkeylinux / tracker

TurnKey Linux Tracker
https://www.turnkeylinux.org
70 stars 16 forks source link

TKLBAM fails if root MySQL/MariaDB user has a password set. #1534

Open JedMeister opened 3 years ago

JedMeister commented 3 years ago

TKLBAM depends on root access to MySQL/MariaDB to back up DBs.

In v14.x and previous, TKLBAM leveraged the Debian provided specific user account (debian-sys-maint) for root like access by the system.

However, as of v15.0 (Debian Stretch/9) MySQL/MariaDB moved to using unix_socket authentication for the root mysql user. So the debian-sys-maint mysql user was no longer required.

However, a number of users set a root mysql password. Unless there is a root-like account that TKLBAM can access, that locks TKLBAM out.

TKLBAM appears to try to work around that by launching a new unauthenticated mysqld process as root (via mysqld_safe --skip-grant-tables). However, that causes issues as it interferes with the existing running service (FWIW, it appears to crash the default MySQL/MariaDB service, but it leaves a zombie mysqld process running which interferes with the root process started by TKLBAM).

The best "fix" IMO is to reinstate the unix_socket authentication method for the root mysql user.

But perhaps we can work out some other method (e.g. TKLBAM could stop the MySQL/MariaDB service and temporarily start an unauthenticated process just to dump the DBs)?

JedMeister commented 3 years ago

Potentially related to #1529

JedMeister commented 3 years ago

It has been suggested that at the bare minimum, it should not crash the MySQL/MariaDB service (which is reasonable IMO).

OnGle commented 3 years ago

I would say that it's reasonable to gracefully fail here so long as we handle #1529 correctly and provide a reasonable error message explaining why it doesn't work. In my mind stopping and starting is in some ways a DOS and the only real workaround I can think of is to provide an additional user account which just feels dirty to me.

packman2008 commented 2 years ago

I'm relatively new to TKL and TKLBAM so apologies in advance if I've misunderstood any of this.

I've been trying to use TKLBAM-backup to backup a server with a number of MariaDB databases on it and it kept on killing the database, and therefore also killing all of my apps. I couldn't figure out why until I chanced across this bug.

Some of the comments here make it sound like having a password for a root account is a bad thing. Having recently retired from a long-term cyber-security job I feel that having unauthenticated root access is a serious security hole. The MySQL/MariaDB developers feel the same way because after installation of the software they strongly recommend running mysql_secure_installation. One of the things this script does is prompt you to set a root password. So the more people that follow the documentation, the more people will have database servers where TKLBAM breaks things for them.

I'm also a little confused why providing an additional user account 'feels dirty'. Coming from a corporate background backups were always done using a dedicated backup account, whether at the operating system level or within a database. The account would have OS or database 'BACKUP' privileges to let it do what it needed to do.

As a developer of software that interacts with MySQL/MariaDB databases, every script I write authenticates itself with the database to get the information it needs. If I'm lazy I could use the root account (with password), but I prefer to create specific user accounts within the database that give me access to the data I need. So if all my scripts need a username/password why not do the same for TKLBAM? A privileged backup account could easily be set up and used, or the root account password could be provided to TKLBAM to let it do its stuff.

Either way, it would be good to see a solution soon that works reliably and doesn't stall the database when run. Also please don't shut the database down to do the backup! That's not necessary and will put people off using TKLBAM (starting with me!). I worked in IT long enough that I remember the days when you used to have to shut the database down to back it up. Those days are long past though!

JedMeister commented 2 years ago

I'm relatively new to TKL and TKLBAM so apologies in advance if I've misunderstood any of this.

Welcome! :smile: :+1: No worries, I'll try to explain clearly but please ask for elaboration if need be.

Before I go on, please note that I use MySQL and MariaDB interchangeably. Strictly speaking, as of v15.0 (mid 2018) TurnKey has MariaDB (not MySQL). Debian transitioned to MariaDB (v10.1 IIRC). If I mean either explicitly, I'll note it. If I don't, then I'm really referring to MariaDB.

I've been trying to use TKLBAM-backup to backup a server with a number of MariaDB databases on it and it kept on killing the database, and therefore also killing all of my apps. I couldn't figure out why until I chanced across this bug.

Ah ok. It's no good that it failed for you, but glad that you found us here, so we can have this convo and hopefully get it working again for you.

Some of the comments here make it sound like having a password for a root account is a bad thing.

Just to be clear, we're not talking about a root MySQL user with no authentication. We're talking about using unix socket authentication for securing the root MySQL user (i.e. no root user password required, but only the root Linux user can get access). FWIW that is the default (and AFAIK preferred on unix like systems such as Linux) method for securing the root account as of (IIRC) MySQL v5.6 & MariaDB v10.1 (explicitly).

Having recently retired from a long-term cyber-security job I feel that having unauthenticated root access is a serious security hole.

I agree, but as noted above, that's not what we're advocating.

The MySQL/MariaDB developers feel the same way because after installation of the software they strongly recommend running mysql_secure_installation. One of the things this script does is prompt you to set a root password. So the more people that follow the documentation, the more people will have database servers where TKLBAM breaks things for them.

Is that still the case? TBH, I haven't run it for years because we already pre-apply all the changes it makes (e.g. removal of all insecure accounts, demo DB, etc). We also set a root password, until the move to unix socket auth.

AFAIK it does give the option to set a root password, but will happily allow you to keep the unix authentication. And I think that these days if you aren't using unix socket authentication that it will ask if you want to configure that.

Having said that, it's worth noting, that if someone has root Linux user access, then it's super simple to stop MySQL/MariaDB and restart it with authentication disabled - bam root MySQL user with no password...

FWIW, I think that the changes we make are somewhat redundant now as I'm pretty sure Debian pre-applies those same settings by default.

I'm also a little confused why providing an additional user account 'feels dirty'.

Because the clean way to do it is use the unix authentication. The backup runs as root (as that's what it needs) so automatically gets root MySQL/MaraiDB user access (which it needs to backup and restore the DB).

Prior to unix socket authentication, there was a special root-like MySQL user called 'debian-system-maint' which TKLBAM (and other system processes) leveraged. As noted, that was a "root-like' user, but it was used by the system and it's password was stored in plain text! IIRC it was read only by root, but still not ideal IMO.

So if you want to be able to backup and restore DBs, you need to get root (or at least root equivalent) access. That can be done, either via a unix socket (so only processes running as root can get access) or you need to store a plain text password for a root like account somewhere.

Coming from a corporate background backups were always done using a dedicated backup account, whether at the operating system level or within a database. The account would have OS or database 'BACKUP' privileges to let it do what it needed to do.

Using a separate account is a good idea for auditing etc, but beyond that, I'm not sure it adds much more than complexity.

I think I get where your perspective might be coming from, but I hope my context above makes it at least somewhat clear why we decided what we decided.

As a developer of software that interacts with MySQL/MariaDB databases, every script I write authenticates itself with the database to get the information it needs. If I'm lazy I could use the root account (with password), but I prefer to create specific user accounts within the database that give me access to the data I need.

FWIW, by with the default MySQL config, the root account can only be accessed locally and only by processes that are running as root (which as I'm sure you'd agree, most software never should be). Even if you want remote root access, we recommend creating a new (root@remote) root user (with a password).

So if all my scripts need a username/password why not do the same for TKLBAM? A privileged backup account could easily be set up and used, or the root account password could be provided to TKLBAM to let it do its stuff.

Ultimately, there is no intrinsic reason why that couldn't be done. But beyond the ability to externally audit what it does, I'm not convinced that it adds any value. TKLBAM runs as the root Linux user and needs root access to MySQL. It gets that securely for free by default, without any need for storing passwords anywhere. Disabling that configuration and creating an additional root-like account for it literally feels like going backwards!

Either way, it would be good to see a solution soon that works reliably and doesn't stall the database when run.

The best (IMO) "fix" is to re-implement unix socket authentication! At least for root@localhost. So long as you don't have local unprivileged apps using the root MySQL/MariaDB root user (with password), it should make no difference to other components on your system.

As I've likely made clear, IMO there is no good reason to have a root MySQL password set (obviously in the context of using unix socket instead). It adds convenience when doing admin and I would argue, increases security (vs a root like user's password stored in plain text).

An alternative "fix" would be to re-implement the 'debian-system-maint' user account (or similar) as used to be the case. It could be named anything, but must have root equivalent permissions. And then you would need to store it's password and trigger TKLBAM to use that.

TBH, I forget how it used to work exactly, but if you have read through this, understand are really sure that you would rather TKLBAM use a root-like account with a password stored in plain text, then let me know and I'll try to dig up some info on how it used to be done.

Also please don't shut the database down to do the backup! That's not necessary and will put people off using TKLBAM (starting with me!). I worked in IT long enough that I remember the days when you used to have to shut the database down to back it up. Those days are long past though!

I can't speak too much to that. TKLBAM was developed quite a while ago by a colleague. I'm not super familiar with it's internals and beyond a few bugfixes here and there, I've had little to do with it. Having said that, I wasn't aware that it actually stopped MySQL/MariaDB? I suspect that is a result of this "issue", rather than what it does by default. I was under the impression that it caused the DB to crash when it couldn't get root access, but I'm not sure that it stops the DB when it works!?!

I'll try to have a look into the code some time soon and see what I can see. In the meantime, if you can confirm that the DB is being stopped, then I'll definitely have a closer look and see if we can improve things...

packman2008 commented 2 years ago

Thanks for the quick reply and the explanations.

I also use MySQL == MariaDB so I fully understand. You are correct that I'd confused root account authentication and socket authentication. I was aware of the MySQL socket, but as I access my databases remotely across an internal network it's not something I follow closely and had missed the changes to socket authentication.

I'm so used to running mysql_secure_installation (or more correctly mariadb_secure_installation) after setting up a server that it's automatic for me to do it. So I'd not spotted that it wasn't needed by MariaDB installed by the TKL template. Having said that the MariaDB documentation still recommends people to run it immediately after setting up a new server, so it's not a dead script yet.

I've just set up a new test MariaDB server and have confirmed for myself that the effect of mariadb_secure_installation is pre-installed and if I run the script then it lets me change the password and keep the unix socket authentication plugin without breaking TKLBAM. There's no prompt about socket authentication that I've seen though.

I think what's happened in my case is that the first app I migrated to my new TKL MariaDB server was quite old and was still using the root account to access the database (it came from a server that had both MySQL and Apache on it so it was local root access). It wouldn't work and then I spotted the plugin for the root account being set to 'unix_socket'. I've changed that to 'mysql_native_password' to confirm the app was working before I changed the authentication to a more suitable account. I've subsequently realised that the root account didn't have remote access anyway, courtesy of mariadb_secure_install, but forgotten to switch the plugin back to 'unix_socket' and that's what's breaking TLKBAM-BACKUP. It was a few weeks after I set the server up that I experimented with TKLBAM and I didn't associate the previous changes with the problems I was having. I've not switched the affected server back to unix_socket yet though. I need to wait for a quiet time just in case I get it wrong and lock my apps out again!

I'll try to have a look into the code some time soon and see what I can see. In the meantime, if you can confirm that the DB is being stopped, then I'll definitely have a closer look and see if we can improve things...

TKLBAM doesn't stop the database, although in my case it was crashing it which is sort of the same thing! This was from your first post of the thread and lead to my pleas not to do it that way!

But perhaps we can work out some other method (e.g. TKLBAM could stop the MySQL/MariaDB service and temporarily start an unauthenticated process just to dump the DBs)?

packman2008 commented 2 years ago

Just to confirm - I changed the mysql root user account to set plugin="unix_socket" and authentication_string="" and tklbam-backup now runs without stalling the database.

Thanks for pointing me in the right direction.

JedMeister commented 2 years ago

TKLBAM doesn't stop the database, although in my case it was crashing it which is sort of the same thing! This was from your first post of the thread and lead to my pleas not to do it that way!

But perhaps we can work out some other method (e.g. TKLBAM could stop the MySQL/MariaDB service and temporarily start an unauthenticated process just to dump the DBs)?

Yep, crashing the DB pretty much counts as stopping it! :smile:

FWIW, the context was more suggesting a possibility that would allow TKLBAM to still work, rather than crashing your app. The other option is to just fail gracefully with an explicit error message (it definitely shouldn't crash your DB regardless of whether your backup succeeds or not!). So the suggestions was as a possible fallback, rather than a recommendation to make that the default method.

I was leaning towards allowing it to "just work" (i.e. stopping it) whilst also showing a note in the log so it would be easy to track down why it was stopping and restarting the DB. But following this discussion, I'm now leaning more towards just failing gracefully.

I guess another option would be to not fail at all and just warn (in the output and/or logs) that the DB wasn't included in the backup. This is my least favourite option because despite our ongoing encouragement to test backups before relying on them, we still occasionally get users complaining that some (no-standard filesystem location) isn't in their backup when they need it... I'd hate one of our users lose all their DB data because of an oversight like that. It could still happen if it errors (and then they would have no backup at all if they missed the failures/error messages), but it's less likely in my experience.

I'd be really interested in your thoughts if you'd care to share.

Just to confirm - I changed the mysql root user account to set plugin="unix_socket" and authentication_string="" and tklbam-backup now runs without stalling the database.

Thanks for pointing me in the right direction.

Awesome! :smile: You're most welcome.

Please do drop in anytime (here in our issue tracker - for specific bugs and/or feature requests) or our forums for support, more vague or less specific ideas ad feedback and/or general discussions. (Note that to post a new thread on the forums, a free website user account is required).

And finally, if you (and/or anyone else reading this) is a paid Hub user, then you can access direct support. Do that either via email (support@turnkeylinux.org - be sure to post from the email linked to your Hub account, or at least note the email of your account). Or you can use the Hub's in-app "chat" (feedback link on top menu bar, or blue chat icon in bottom right). If I'm not online when you post, I'll respond ASAP - generally within one work day, often quicker. If you're not logged in when I respond, you'll get an email response.

packman2008 commented 2 years ago

I'd be really interested in your thoughts if you'd care to share.

I'd go for fail gracefully and definitely not-stopping and restarting the database to allow the backup to proceed. That could be very annoying. For example, I have one data update job that downloads GB's of data and processes it into a database 4 times per day. A run can take 2-3 hours at times. Stopping the database near the end of run would be very annoying. The processing can't easily be re-started from the point of failure (at least not without a lot of re-engineering) so it would mean having to run the whole thing again causing a longer than normal delay until the data can be released to the end users.

I think it might would confuse me. For some reason my database likes to say that every shutdown was initiated by 'unknown' and if it appeared to be stopping for no known reason I'm not sure I'd go looking in backup logs to find the answer. Even if it told me the user that had initiated the stop I'd probably be no clearer because tklbam runs under root just like most other system processes.

I tend to be unsympathetic to people with no/failed backups. I used to make it very clear that it's your responsibility to make sure you do a backup, your responsibility to make sure it runs successfully and your responsibility to make sure a test restore works from time to time. So graceful failure or no failure but give a warning in the logs work for my way. But then if you talked to some of the people I've worked with over the years they might say I was related to BOFH ;-)

JedMeister commented 2 years ago

I'd go for fail gracefully and definitely not-stopping and restarting the database to allow the backup to proceed. That could be very annoying.

Awesome. Thank you. That's exactly what I was after. A firm opinion with a real world rationale! My favourite! :smile:

I think it might would confuse me. For some reason my database likes to say that every shutdown was initiated by 'unknown' and if it appeared to be stopping for no known reason I'm not sure I'd go looking in backup logs to find the answer. Even if it told me the user that had initiated the stop I'd probably be no clearer because tklbam runs under root just like most other system processes.

Yes. Great point! Looking from that perspective, it's actually a bit of a no brainer...

I tend to be unsympathetic to people with no/failed backups. I used to make it very clear that it's your responsibility to make sure you do a backup, your responsibility to make sure it runs successfully and your responsibility to make sure a test restore works from time to time. So graceful failure or no failure but give a warning in the logs work for my way. But then if you talked to some of the people I've worked with over the years they might say I was related to BOFH ;-)

Haha. Yeah, I think I'm possibly way too empathetic sometimes.

Thanks again for your feedback. Really appreciated.