opensearch-project / security

🔐 Secure your cluster with TLS, numerous authentication backends, data masking, audit logging as well as role-based access control on indices, documents, and fields
https://opensearch.org/docs/latest/security-plugin/index/
Apache License 2.0
199 stars 277 forks

[RFC] Implement one-time passwords for OpenSearch Security #3288

Closed stephen-crawford closed 1 year ago

stephen-crawford commented 1 year ago

This issue is a request for comment on the idea of implementing one-time passwords for OpenSearch Security. One-time passwords (OTP) are a commonly used mechanism for providing additional security to web applications. You can read more about them here: https://en.wikipedia.org/wiki/One-time_password.

As part of researching alternatives to the default credentials admin:admin, I looked into what it would take for OpenSearch to implement its own OTP mechanism. This mechanism would be offered by the Security Plugin as an additional feature for users with the added benefits of:

The most notable benefit of OTP support is the last listed. By supporting HMAC-based one-time passwords (HOTP), we can guarantee that only a single user is able to log in as the admin. HOTP acts similarly to time-based OTP but, instead of using an expiration window, uses an incrementing counter that identifies the specific token generated for login at any time. That is, every time a new OTP is generated the counter increases, and the only valid OTP is the one whose encoding not only matches the expected OTP but also carries the correct counter value. Using HOTP, we can then confirm that the OTP provided is the most recent one generated, and also invalidate it after use by incrementing a second counter.
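For reference, here is a minimal sketch of how a single HOTP value is derived from a secret and a counter, following RFC 4226. It is written in Python purely for illustration and does not reflect any existing Security Plugin code. The function name `hotp` and the 6-digit default are assumptions.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Compute an HOTP value from a shared secret and a counter (RFC 4226)."""
    # The counter is encoded as an 8-byte big-endian integer and MACed with HMAC-SHA-1.
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select a 4-byte window.
    offset = mac[-1] & 0x0F
    # Mask the top bit so the value is a non-negative 31-bit integer.
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    # Keep the lowest `digits` decimal digits, left-padded with zeros.
    return str(code % 10 ** digits).zfill(digits)
```

With the RFC 4226 test secret `b"12345678901234567890"`, `hotp(secret, 0)` returns `"755224"` and `hotp(secret, 1)` returns `"287082"`, so each counter value yields a different password.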

For a simplified example, imagine we have two counters inside the security plugin where it is creating the HOTP:

token_issued = 0
token_logins = 0

Now when the admin requests an OTP, we go through the process of generating the OTP with the secret and the current value of the token_issued counter. After creating the token, we increment the counter:

token_issued = 1
token_logins = 0

Then when the admin gets the token and attempts to use it to log in, we validate the token based on both the secret and the issued counter associated with its encoding. If the token_issued counter is not exactly 1 greater than token_logins, we reject the request. After a successful login we increment the login counter:

token_issued = 1
token_logins = 1

This process can be repeated.
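The two-counter flow above could be sketched roughly as follows. This is an illustration in Python under stated assumptions, not a proposed implementation: the class name `AdminOtpIssuer` and its methods are hypothetical, the secret handling is simplified, and how to recover when a second OTP is requested before the first is used is deliberately left open.

```python
import hashlib
import hmac
import secrets
import struct
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA-1 over the counter, then dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


class AdminOtpIssuer:
    """Hypothetical holder of the two counters described above."""

    def __init__(self, secret: Optional[bytes] = None) -> None:
        # Assumption: a random 20-byte secret is generated if none is supplied.
        self._secret = secret if secret is not None else secrets.token_bytes(20)
        self.token_issued = 0
        self.token_logins = 0

    def issue(self) -> str:
        # Generate the OTP from the current issued counter, then increment it.
        otp = hotp(self._secret, self.token_issued)
        self.token_issued += 1
        return otp

    def login(self, otp: str) -> bool:
        # Reject unless exactly one issued token is still unused.
        if self.token_issued != self.token_logins + 1:
            return False
        # The outstanding token was generated with counter value token_logins.
        expected = hotp(self._secret, self.token_logins)
        if not hmac.compare_digest(expected.encode(), otp.encode()):
            return False
        # A successful login consumes the token.
        self.token_logins += 1
        return True
```

After `issue()` followed by a successful `login(otp)`, both counters are equal again, so replaying the same OTP fails the counter check until a new token is issued.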

Looking at this setup, we can see how HOTP provides a guarantee that an OTP can only be used to log in to the cluster a single time. That means that unlike a time-based OTP, we do not need to worry about a token being used by multiple parties to access the admin account. Instead, we have a guarantee that an HOTP is only usable a single time and, assuming that the admin is the person receiving the OTP, that the user logging in is in fact the admin user.

This in turn allows us to treat the admin user as a special user even without using the admin certificate. Normally, we need to make use of the admin certificate, because we have no way of determining whether a user logging into the cluster with the admin account is actually the administrator. The admin account not only has a widely known default credential but also lacks any measures that would provide authenticity to a login. We don't really know whether a user logging in with the admin account is the true cluster administrator who should be granted the control offered by the admin certificate. Instead, we have to rely on the admin certificate to act as a verification method that signals to the cluster that the admin user is truly the admin of the cluster.

With this implementation of HOTP we circumvent this issue. Assuming that the cluster operator is the intended "true admin," we know that the person who set up the HOTP will be the intended admin. The details of setting up the HOTP system can be specified later but will generally require providing something like the user's email or a connection to an MFA provider such as DUO. Regardless, because they launch the cluster with this configuration, we can then be certain that, ignoring cases where their email or MFA app is compromised, we always have a way to reach them, and them alone.

Then, using the HOTP we pass them, they are able to access the cluster and act as the admin. Because the HOTP is only good for a single login, we also do not need to worry about attempts to use the password a second time to impersonate the admin. Instead, an attempt at impersonating the admin would require the fraudulent admin to get access to the HOTP and use it to log in before the true administrator does. Guarding against this should be left to the administrator, since we have no way of enforcing proper use practices.

Ultimately, with an HOTP system, we can be sure that the admin user is really the cluster operator, that is, the person we would expect to have the admin certificate. This would let us provide a setting where actions normally requiring the admin certificate can be permitted on the basis of HOTP use with the admin account. This is particularly valuable for cases where users want to complete operations requiring the admin cert but either operate behind an SSL terminator or do not wish to have SSL enabled.

A diagram of the synchronicity of the nodes in a cluster can be found here: https://github.com/OpenSearch-Security/OpenSearch-Security/issues/3#issuecomment-1695904740.

peternied commented 1 year ago

While I agree one-time passwords could be very useful for many scenarios around securing OpenSearch, I don't think it's viable as a replacement for super user credentials or bootstrap accounts because of how those accounts are used in practice. When a cluster is started up for the first time, there are no external services that OpenSearch is authenticated against. So how do you get the OTP to the user that is going to use it if those external services are misconfigured or unavailable?

I think in the context of managed services, where there are more guarantees about the environment and configuration, OTP could be valuable. Maybe the existing authentication interfaces could be used for OTP-type providers, or maybe we could expand on them to support the scenario.

Maybe if you were to restate the problem in terms of what isn't working for OpenSearch customer scenarios, it would be clearer why to prioritize this area. What do you think?

stephen-crawford commented 1 year ago

Hi @peternied, thanks for taking a look and leaving some feedback.

For your first point, I am not sure I totally follow. I don't think we should need OpenSearch to authenticate against anything in order to execute an OTP flow. To my understanding, we would use the implemented library to create a secret, process everything like normal, and then the admin's request for an OTP would trigger a curl request (see https://auth0.com/docs/secure/multi-factor-authentication/authenticate-using-ropg-flow-with-mfa/enroll-and-challenge-otp-authenticators). We can then use the secret provided and have them generate a code from the implementing sign-in tool, which they then provide to log in. I also don't think that we should use misconfiguration or external services being down as a reason to not do something. It may be an argument to offer multiple mechanisms (mind you, OTP would not be required), but we cannot control whether someone has entered the wrong information, etc.

For your second point, unfortunately, I don't have access to that type of information and it probably would not be appropriate to list specifics even if I did. There are some common cases (some of which you told me about :)) where this could come in useful. First, any cluster which makes use of SSL offloading/terminating would have a use for not needing the admin certificate: https://avinetworks.com/glossary/ssl-offload/. Second, generally speaking, OTP is a far superior default authentication implementation to hard-coded default credentials. We are offering the ability to provide a default password with the changes to read from the config files or env. variables, but both of these options are still susceptible to the many flaws of human-selected passwords. Having a user-specified password is better than using admin:admin, but admin:<my_daughters_birthday> etc. is still more guessable than admin:<random hash>. Finally, I think that we can offer this as a potential stepping stone towards some user federation support. Admittedly, I am unclear on the implementation details, but consider a case where we not only support OTP but also use OTP to map internal accounts to external accounts:

  1. Imagine you are an admin and have configured your admin OpenSearch account as you like. This includes setting up OTP so you can get a password that OpenSearch recognizes as unique to you logging in as the admin in this current attempt.
  2. You also have an external SAML account that you are able to attach generic attributes to. Let's say there is a field attached to the account called "associative_token" that you are able to provide a string for and that OpenSearch Security is able to read.
  3. You can then take the OTP key generated by the login request and put that in the "associative_token" attribute of the account.
  4. You then return to OpenSearch and log in with the SAML account SAML_123.
  5. OpenSearch sees the "associative_token" field is populated with a value and parses it. It then checks that entry against the outstanding OTP registry to see if it matches any of the live OTP requests.
  6. Seeing a match against the user admin, it is able to associate the SAML_123 user with the admin internal user since it knows that the SAML_123 user must be the same person who requested the OTP for the admin login.
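To make steps 3 through 6 more concrete, here is a rough Python sketch of the matching logic. Everything in it is hypothetical: the `OtpAssociationRegistry` class, its method names, and the shape of the SAML attributes (a plain dict) are assumptions for illustration only and do not correspond to any existing Security Plugin API.

```python
import hmac
from typing import Dict, Optional


class OtpAssociationRegistry:
    """Hypothetical registry of outstanding OTPs, keyed by the internal user who requested them."""

    def __init__(self) -> None:
        self._outstanding: Dict[str, str] = {}  # internal user -> live OTP
        self._links: Dict[str, str] = {}        # external user -> internal user

    def register_otp(self, internal_user: str, otp: str) -> None:
        # Called when an internal user (for example "admin") requests an OTP.
        self._outstanding[internal_user] = otp

    def associate(self, external_user: str, attributes: Dict[str, str]) -> Optional[str]:
        # Step 5: read the "associative_token" attribute from the external account.
        token = attributes.get("associative_token")
        if not token:
            return None
        # Compare it against every live OTP request.
        for internal_user, otp in list(self._outstanding.items()):
            if hmac.compare_digest(otp.encode(), token.encode()):
                # Step 6: link the accounts and consume the OTP so it cannot be reused.
                del self._outstanding[internal_user]
                self._links[external_user] = internal_user
                return internal_user
        return None

    def internal_user_for(self, external_user: str) -> Optional[str]:
        return self._links.get(external_user)
```

For example, after `register_otp("admin", otp)`, a login by SAML_123 carrying that OTP in "associative_token" would make `associate("SAML_123", attrs)` return `"admin"`, and a second external account presenting the same value would get `None`.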

Again, there are a lot of intricacies to this that would need to be ironed out, but I think this is another valuable feature that OTP could help address.

Let me know if this answers any of your questions or you have further comments. If you don't think this is worthwhile I will close the issue but figured I would bring it up.

stephen-crawford commented 1 year ago

[Triage] This is an RFC and not currently actionable. Leaving without the triage label. This should be closed in 2 weeks if there is no progress. So close on 9/28.