Authenticator fails to login into the system with disk-full condition

ThomasHabets commented 8 years ago

From @ThomasHabets on October 10, 2014 8:7

Original issue 391 created by yurivict on 2014-06-13T22:40:11.000Z:

When the remote host has the disk-full condition, google-authenticator makes it impossible to fix it remotely, because the login always fails with this message in auth.log: Jun 13 15:30:38 eagle sshd(pam_google_authenticator)[82081]: Failed to update secret file "/home/yuri/.google_authenticator"

This is a very serious problem for an administrator if such situation happens. It locks the remote administrator out.

Copied from original issue: google/google-authenticator#390

ThomasHabets commented 8 years ago

I've encountered this too, and it sucks.

Not sure what the solution should be though. Failing closed is the safer option. Not updating the state would reduce security.

humppafreak-zz commented 7 years ago

Ran into the same problem last night: ssh login failed due to google-authenticator not being able to update the secret file on a read-only volume.

My suggestion would be to

leave the decision of allowing logins to a r/o system up to the admin by introducing an option for the pam config (like "auth required pam_google_authenticator.so allow_readonly")
give some sort of indication after entering a valid token that the underlying volume is read-only (and that that's the reason for failing if allow_readonly isn't set) and maybe
allow the emergency scratch codes from the root user's secret file to override the lockdown.

ThomasHabets commented 7 years ago

This means allowing the user to out of convenience essentially disable 2FA. Nope. If you want that then you'll need to write your own pam_out_of_disk and configure PAM to skip pam_google_authenticator when disk is full.

2.. Possibly acceptable.

This would allow reuse of emergency code, so I think best not to.

humppafreak-zz commented 7 years ago

Excuse my cluelessness, but the only issue I see with option 1 is the possible circumvention of the DISALLOW_REUSE statement by not writing timestamps of used tokens into the secret file, right? This is already possible by running a simple sed one-liner to delete the timestamps after each login, isn't it?

ThomasHabets commented 7 years ago

Ah yes, I was thinking HOTP, which this would break much much more.

DISALLOW_REUSE is still pretty important though, and seeing as how "fill up someone disk" is not hard (locally and usually remotely too), it would probably be a surprise if allow_readonly effectively disables disallow_reuse, since like I said I expect people to add allow_readonly as "make it work".

humppafreak-zz commented 7 years ago

Okay, I hadn't thought of HOTP tokens.. yes, in that case allowing logins without being able to update the last used token would be breaking things in a pretty stupid way.

For TOTP though, if allow_readonly were an argument on the pam config line, the scope of people who could "add it to 'make it work' " would be restricted to the same set of people who could simply delete the module entry from the pam auth stack entirely, i.e. users w/ root access.

If the side-effects of allow_readonly were well-documented (like they are for all other config options and the format of the secret file), people would hopefully think twice and keep a good eye on their disk usage.. Still, there should be a way around this whole problem: being able to remotely lock out legitimate non-root users by simply "filling up someone's disk" is a recipe for disaster.

PS: I explicitly meant "read-only filesystem", not "full disk" in my original comment. I just verified this in a throwaway VM: if I completely fill the disk as a user, I'm still able to log in as root, even with 2FA, because root's attempt to write to /root/.google-authenticator is covered by the reserved blocks for the underlying FS. Then again, if anyone should be allowed to log in as root at all is a completely different topic.

ThomasHabets commented 7 years ago

I don't think documentation is a good solution for "if you do this then you shoot yourself in the security foot". With security stuff it's most often better to not have 2FA (and know that you don't) than to set it up poorly.

Interesting point about r/o FS. Say local logins in rescue mode.

Hmm… maybe I'd accept a pull request after all. Only working on TOTP, and only accepting this for EROFS and/or ENOSPC.

I spun off the error message into #57

brendanheywood commented 4 years ago

Just throwing out an idea: as far as I can tell 2 things need to be written when you login:

XXXXXX...
" RATE_LIMIT 3 30 [timestamp]
" DISALLOW_REUSE [code1] [code2]
" TOTP_AUTH
...
...

The first is the rate limit timestamp. The second is the list of disallow reuse codes. The disallow codes are functionally equivalent to the timestamp which mapped to those codes. Additionally you only effectively need the most recent reuse code if the clock skew is limited to 1 time period. This means that all of the parts which are changing can be boiled down to a single timestamp. So would it be possible to not actually write to the file but just update its last modified time instead? I'm assuming / hoping this actually works if the disk is full.

If you genuinely needed to write the file ie to update the backup codes or to use one then I think that's an acceptable failure option, as it is now. But for a normal totp login I can't see a reason for the file contents to need to change.

The only edge case loss of function I can see is if the allowed time skew periods is large, lets say its 5 periods, and you login multiple 3 times validly over those 5 periods. So currently you would have 3 disallow reuse codes, but in the new system you'd just have 1 timestamp, so it would look like all 5 time periods have been used rather than only 3. I think this is a minuscule loss of function and arguably an increase in security, and either way it's moot for the purposes of this issue because you've been able to login anyway. If you had one of other the 2 slightly older codes, which previously would have worked but which now don't, then you just have to wait another 30 seconds and grab a new code and it will be fine again.

arjenlentz commented 4 years ago

nifty, @brendanheywood ! If others think this is a good solution for TOTP, I'd be happy to look at coding that up for a PR.

ThomasHabets commented 4 years ago

Doesn't seem like a good idea to me. Effectively using metadata as data seems wrong. It would mean strange things like touch ~/.google_authenticator cause a failure to log in for a time window.

I'm also not convinced that filesystems have a promise, or a behaviour, where rewriting the inode is done in place. And it may change over time.

Using inode data to store application-level data could create surprises that are hard to foresee. E.g. if someone sets up some sort of fuse filesystem that doesn't have timestamps.

graytron commented 2 years ago

Throwing another idea: Add a volatile=/another/path/to/secret/file option to pam_google_authenticator.so.

With this option:

Initially make google authenticator to copy the secret file (~/.google_authenticator) to the given volatile location (f.ex. to a tmpfs file system), if the file doesn't already exist.
Instead of only trying to update the secret file in one location, google authenticator would also update the secret file in another location. Authentication shouldn't fail if writing to the non-volatile location fails.
Reading of the secret file would only be done from the volatile location, if the file exists.

This would solve login issues on read-only file systems and for those cases where the disk is full, unless the disk is the ram disk.

ThomasHabets commented 2 years ago

@graytron Presumably the copy would only happen if the file didn't already exist in the volatile location?

But this solution means that emergency codes may be reusable e.g. across reboots? Also it just moves the problem. What do you do if updating the volatile location fails? This seems to add complexity with hard to reason about implications, which means I don't see it being possible to use as a default.

Why not just create a dedicated file system for the secret, if we're so worried about running out of disk space? Or a failsafe, e.g. console login.

I don't think this problem is big enough to warrant opening up another problem, even if it's small.

graytron commented 2 years ago

@ThomasHabets Yes, only copy the file if a file doesn't already exist in the volatile location. If reading or updating the volatile location fails, then login should fail as well. Also, volatile= option should not be used as the default. Emergency codes would indeed be reusable across reboots in case writing to the secret= file fails. Yet, I would prefer to be able to choose between not being able to log in at all, and between reusing any possibly used emergency codes.

I currently use google-authenticator on some read-only file systems. On these systems, on each boot, I need to mount tmpfs in ~/.google_authenticator, and I also need to copy the static ~/.google_authenticator_static file to this tmpfs file system. In /etc/pam.d/sshd I use secret=/home/${USER}/.google_authenticator/google_authenticator option. All this just to be able to login using 2FA.

On second thought, maybe we could have a google-authenticator daemon, which would start up and load the secret file on boot. It would then update the secret= file when necessary, while keeping the same data in RAM, while also having an option for the login to NOT fail when writing to secret= file fails.

horschi commented 2 years ago

@ThomasHabets : Will it still update the file if I disable rate-limiting and dont use a scratch code? Then there should be no need to update the config file and logging in with a full disk should be fine. Will it behave in such a way?

ThomasHabets commented 2 years ago

@horschi if there are no updates then no write attempt should happen.

@graytron a daemon sounds like an interesting solution, but one out of scope for this project. It's standard TOTP/HOTP though, so I'd encourage you to write one. Probably you'll end up with cleaner code than we have here. :-)

horschi commented 2 years ago

@ThomasHabets : It indeed seems to not update the file in such cases. I for sure will deactivate rate-limiting. Lock me out once, shame on google-authenticator, lock me out twice shame on me ;-) Thanks for your feedback!

ryanjameskim commented 2 years ago

I ran into this problem as well. What is the safest workaround? Is there a way to reserve space on a partition for this write information?

ThomasHabets commented 2 years ago

It's not really possible in the general case. Deleted space and even a pre-"reserved" big enough file for overwriting will not be reliable as filesystems use journals and copy-on-write.

jackmawer commented 1 year ago

This is still a serious issue in 2023. I think I agree with the general consensus that the option of slightly reduced security as a workaround for this situation is much preferable to being locked out with no recourse, as long as the documentation for said option makes the consequences very clear.

ThomasHabets commented 1 year ago

Consensus or not, I will at least not approve a PR that implements a foot-gun for one purpose, that basically negates all security of the OTP. Unless it can be done in a way that I can't break it.

FYI when I accidentally locked myself out due to clock skew I fixed it the lazy way using an expect script. I had no rate limiting, so OpenSSH limited me to 10 per second. Without even a skew window this allows a brute force of about half a day. If I count right it's about 1M codes / 10 / 2 / 3600 = 14h on average. (actually I'm not a statistician, and the code changes, so that calculation is not quite right)

IIRC it took me about half a day to hack into my own computer.

If I can do the same, only with the small extra step of filling up the disk first (maybe I'm even a local user), then what's the point of second factor?

Maybe we could allow it with an 8 digit scratch code, but then again what's "OTP" about it?

But that's just my opinion.

google / google-authenticator-libpam

Authenticator fails to login into the system with disk-full condition #13