ory / kratos

The most scalable and customizable identity server on the market. Replace your Homegrown, Auth0, Okta, Firebase with better UX and DX. Has all the tablestakes: Passkeys, Social Sign In, Multi-Factor Auth, SMS, SAML, TOTP, and more. Written in Go, cloud native, headless, API-first. Available as a service on Ory Network and for self-hosters.
https://www.ory.sh/?utm_source=github&utm_medium=banner&utm_campaign=kratos
Apache License 2.0
11.19k stars 959 forks source link

Design Document for SAML2 Implementation as a Service Provider #1928

Closed ThibHrrd closed 1 year ago

ThibHrrd commented 2 years ago

Preflight checklist

Context and scope

The current options proposed by Kratos to authenticate a user are :

Many companies want to be able to use Single Sign On (SSO) to be able to use their own employee database as their single source of truth for authentication of users across all the services they use. This allows to easily manage newcomers in the company and departures as it instantly registers new users on every services used by the company, and remove them when they depart.

A protocol commonly used to provide this feature is SAML2. Currently Kratos does not provide any way to use SAML2 as a way to authenticate a user. Many companies already have a SAML2 ecosystem, and cannot transition easily to OpenID Connect as an alternative protocol of SSO.

Goals and non-goals

In SAML2, there are 2 actors :

The Identity Provider is the single source of truth for the identities that can able to connect. The Service Provider is any service that wishes to log in using the Identity Provider as their source of user. In an ecosystem using SAML, there is usually 1 IdP and several SPs that turn to the IdP to know which users can login.

The goal is to implement the SAML2 protocol in Kratos as a Service Provider that could be connected to an external Identity Provider. Thanks to the SAML implementation, users who already have a SAML2 ecosystem will be able to use Kratos to provide a consistent login protected session to an application. Users who already have an Identity Provider (like Active Directory) will only have to plug their Identity Provider to Kratos.

However the goal is absolutely not to be able to use Kratos as an Identity Provider. We assume that the users of this feature already use an external IDP.

The design

The go library crewjam/saml handles most of the complexity of the SAML protocol. Using this library adds an external dependency, but SAML is a complex protocol that is a project on its own, and is not that simple to implement in its entirety.

In this library, routes that needs to be protected call the RequireAccount method. When using this method, a SAML request is sent to the IdP to ensure that the user is logged in with the IdP (if he's not, it starts a login flow on the IdP). Once this call is completed, the login flow is completed on the SP side (Kratos here), and the user has 2 sessions : one with the SP, and one with the IdP.

There are 2 possibilities regarding when to use the RequireAccount method:

Option 1 has the maximum level of security with 2 sessions that are being checked every time a protected resource is accessed. It ensures that the Kratos session is always up to date with the session on the IdP. It also comes with a performance overhead, since there are now twice the session checks to do. Option 2 sacrifices a bit of security for better performance. Not making the check on every resource allows a mismatch in sessions, for example if the user has its access to the IdP revoked, he would still be able to access content protected by Kratos as long as his Kratos session does not expire.

One part that is not handled by the crewjam/saml library is the configuration of the metadata file. For SAML to work, the service provider (here Kratos) must provide an XML file which contains all the attributes that the SP need the IdP to provide when logging a user in (such as username, email, name, first name etc.). On the Kratos side, all these attributes correspond to the Identity Schema. That means that there needs to be way to automatically generate this XML SAML metadata from the identity schemas configured in Kratos.

When logging out of the IdP, a logout SAML assertion can be sent by the IdP. This needs to trigger and complete the log out flow on Kratos

About provisioning: When the SP has identity data that is different from what is received in the SAML assertion (eg. first login on the SP, name changed), the SP data needs to be updated. This is called just-in-time (JIT) provisioning. On the Kratos side, this means that whenever a SAML assertion is received, the attributes received need to be compared to what's in Kratos' database.

If there is a difference, it needs to update the DB to be in sync with the data given by the IdP. This synchronization from the IdP raises the question of what to do when a user updates their settings. If nothing is done, the settings will be modified in the Kratos DB. Next time a SAML assertion, there will be a difference with the DB, since settings were just changed, and the JIT provisioning mechanism will revert the change that was done.

Just-in-time provisioning also require the previous RequireAccount call to be called every time a page with personal data is accessed (which can be any page protected by Kratos) in order to be sure that this data is up to date with the IdP. This means that option 1 mentioned earlier has to be selected when enforcing the session check.

To transport messages between the SP and the IDP, SAML supports 3 types of bindings :

The library that we want to use only supports 2 bindings:

These two protocols are the most used so the fact that the library only supports these two is not really constraining.

APIs

In our case, we would have to add to Kratos everything needed to act as a SAML Service Provider by adding all the necessary endpoints and associated handlers :

The crewjam SAML library in Go will handle these routes.

Data storage

The SAML implementation will NOT add new database or tables. It will just use the existing data models and make sure to synchronize users from the IDP database with those of the Kratos database when adequate.

One important thing to note is that both the SP and the IDP have a database with the same users but with (probably) a different nomenclature. This also means that there is no obligation to have the two databases in the exact same state regarding the users present in them, and the attributes associated with them.

Degree of constraint

Concerning the constraints, the first one is obviously SAML. It is a very complex protocol that is not really maintained anymore. Of course, it has many advantages but also some disadvantages. We must be sure to master this protocol to avoid any security problems.

Then, there is the constraint concerning the crewjam/saml library. At the time where this document is being written, the contribution on this project are infrequent. We don't know if the library is very secure and if all known exploit are properly patched (like XXE exploit for example). Before using it, we must ensure that the library does not include CVEs or any other exploit.

Alternatives considered

A very important part of SSO is provisioning. This topic deals with how the SP's database should (or should not) match the IDP's database.

Provisioning is not obligatory, it is entirely possible to accept that there is a difference between the two database.

In our case, we want to implement this provisioning. To do this, there are two possible options :

Example of Just-in-Time provisioning : A user logs into the Identity Provider of his company and wants to access a service provider for the first time. At this moment, the user exists in the IDP database but not yet in the SP database. The IDP then sends a SAML Response to the SP. The SP receives this response and notes that the user indicated is not present in its database. It creates the associated account automatically.

Example of Real-Time provisioning : There is an external system that is constantly syncing the databases of the IdP with the SP. When a user is added to the IdP's database, it is also added the the SP's database. When the user logs in for the first time on the SP, they are already in the SP's database, and no particular action is required.

The real time provisioning require the usage of an external protocol/application. So we don't think to use this method because it is out-of-scope.

We are thinking about using the JITP (Just in time provisioning). This implies that for each message sent by the IDP, Kratos has to check that the user is present in its DB and if the information sent matches. If there is no match, the Kratos DB has to be updated accordingly.

psauvage0 commented 2 years ago

This is a first draft, if there is something is unclear don't hesitate to point it out

psauvage0 commented 2 years ago

Also there was already an issue (#275) about implementing SAML authentication, but we had to create a new one to use the design document form. Maybe we should copy paste and close, or mark one as duplicate ?

aeneasr commented 2 years ago

I did not have time yet to look over this, I am sorry. This week might also be difficult for me, as it definitely takes some thinking time. I will try my best though and encourage everyone interested to participate :)

ThibHrrd commented 2 years ago

No problem, feel free to look when you have time. I understand that it it can take a little time to understand because of the comlexity of saml :)

aeneasr commented 2 years ago

Thank you! The proposal sounds really good and is well explained. Regarding sync, maybe we could use JsonNet as we do with OIDC to select which attributes to update?

ThibHrrd commented 2 years ago

Thanks a lot! And concerning attributes, we had looked at how the attributes were handled with OIDC and thought of using the same system so it's perfect.

Thanks for your feedback :)

aeneasr commented 2 years ago

Cool! In that case, I think this proposal is pretty much complete! I think JITP is the right approach! I can also imagine to disable certain update types (e.g. updating traits) or profile updates in general if a SAML connection is established for an identity.

Does it still make sense to have 2fa or is that solved at the SAML IDP?

psauvage0 commented 2 years ago

Indeed, disabling some updates is a good idea since allowing to change settings that are synced by the IdP would result in those changes being reset as soon as the next SAML assertion is received, which makes for a poor user experience.

I'm not sure what you mean with the 2fa, do you mean disabling kratos's 2fa when using SAML as a connection method ?

aeneasr commented 2 years ago

I'm not sure what you mean with the 2fa, do you mean disabling kratos's 2fa when using SAML as a connection method ?

Yes, that was my question :) So does SAML address 2FA, or would we still offer 2FA for SAML users?

psauvage0 commented 2 years ago

I don't think there is a reason to prevent doing 2FA with SAML. 2FA could be solved on the IdP or on the SP side (or even both if you really wanted, even though I doubt that would actually increase security). Although it's an unusual use case, you could set it up so that you would need to log in with the IdP, then provide 2FA on the SP side. This allows to have some SP that require higher AAL than other SP. For example, it seems Github allows that : https://docs.github.com/en/organizations/granting-access-to-your-organization-with-saml-single-sign-on/about-two-factor-authentication-and-saml-single-sign-on

So I would say that there is no need to disable 2FA with SAML, unless it turns out that it introduces complications that are too much to handle, given how unusual the use case is.

kszafran commented 2 years ago

I believe that's also how the current OIDC implementation works. You can set up both OIDC and a second factor (e.g. TOTP) and when you log in, you'll first go through the OIDC flow, and then provide e.g. a TOTP code to get to AAL2. So unless SAML is fundamentally different, I think it should work the same.

ThibHrrd commented 2 years ago

Our technical approach

As you may have seen in this Design Document, I would like to implement SAML in Ory Kratos. We have to approval of @aeneasr so I will try to explain our approach in a more technical way. The purpose of this message is to present our idea in a more technical way and propose to interested people to help us develop all this.

First of all, we have no knowledge of the security of the SAML library we want to use. It is more than necessary to perform a security audit on this library to validate its security.

Completion Progress

We thought about 4 main tasks :

First Part

Concerning the first part, the goal is to develop the two main endpoints :

Second Part

The second part will deal with the way endpoints work. The library already implements what we want to make these endpoints work. The library allows you to create a metadata file very easily so we will need to incorporate it into Kratos to allow the endpoint /metadata to create them easily. Concerning the endpoint /acs, the Crewjam library allows to receive the SAML requests, to understand them and to treat them accordingly.

Third Part

Now that the endpoints are created, the SAML requests must be processed by Kratos. This means that the endpoint /acs must receive the SAML requests, understand them and translate them into a language that Kratos can understand. More clearly, this endpoint must allow Kratos to support SAML requests and to perform the actions associated with these requests.

It is also in this part that you must check if the session has not expired (according to the duration indicated in option). If it is the case, you have to send a SAML Request to the IDP.

Fourth part

Finally, the last part will concern the configuration. Not everyone wants to use SAML so we will have to use the YAML and Kratos configuration system to adapt it to SAML by adding new options to indicate if we want to use SAML and fill in the endpoints. The objective here is to make the final link between Kratos and SAML and thus be able to create instances of Kratos implementing SAML.

Concerning the options, here are the variables we can modify :

Conclusion

This is only a first draft and is not intended to represent the final work. It allows you to see the surface of the workload and the action plan.

Do not hesitate to tell me if you are interested in the project and if you want to contribute. All comments are welcome!

akshay196 commented 2 years ago

Hi team, I would like to work on this. Let me know in case someone already picked this up.

I have recently worked on a SAML implementation for one of my project and I used crewjam/saml library for that. The project is still in progress but I have tested basic SAML functionality and it is working fine. Thanks to @ThXb35 for the initial design which helped me to understand SAML and how we should go ahead for implementation. :)

I am new to Ory Kratos, so I will begin with Kratos doc and contribution guide. Let me know if there something else that I should read before I start.

ThibHrrd commented 2 years ago

Hello @akshay196 ! I'm glad you're interested in this project! I've already made a good progress on it but I'm a beginner in Golang and SAML so some help would be welcome.

You can find my progress here https://github.com/ThXb35/kratos/tree/saml

Don't hesitate to contact me, it would be a pleasure to work together on it!

akshay196 commented 2 years ago

@ThXb35 Okay. I will look into your code. Feel free to open draft PR once you are ready, we can collaborate there if anything is needed.

ThibHrrd commented 2 years ago

This is it @akshay196

https://github.com/ory/kratos/pull/2148

ThibHrrd commented 2 years ago

Hi @akshay196, if you want to help, I am currently trying to create a method to generate session from SAML Assertion by using Crewjam library. Don't hesitate to tell me if you want us to work on it!

akshay196 commented 2 years ago

if you want to help, I am currently trying to create a method to generate session from SAML Assertion by using Crewjam library. Don't hesitate to tell me if you want us to work on it!

@ThXb35 Sure. We are trying to solve There is also the very important problem of converting the session created by the Crewjam/saml library into a Kratos session to remain homogeneous. from Third part of your completion progress, right? I am just trying to understand where we are.

Crewjam/saml library's samlsp.SessionProvider interface is used inside samlsp.Middleware struct. So we can write our own KratosSessionProvider which satisfies samlsp.SessionProvider interface and then let's use it in instantiating Middleware. Hope it works? Any thoughts?

I still need to understand kratos/session, how it works, what it includes etc. I will take a look at it, if we are fine with above approach.

ThibHrrd commented 2 years ago

@akshay196 Yes exactly, we are trying to modify a little bit the use of the library to create a Kratos session after receiving a SAML assertion.

I'll explain you a little bit my idea : currently, in crewjam/saml, we have this method which receives a SAML assertion and create a session : https://github.com/crewjam/saml/blob/main/samlsp/middleware.go#L185. This session works with a JWT and therefore not recognized by Kratos.

The goal would be to modify it to allow to create a Kratos session.

I already had a small discussion with the developer of Ory on slack: https://ory-community.slack.com/archives/C012USDT5QQ/p1642431769008200 What he says to me is that one can be inspired by the code of creation of session of OIDC which already exists in the code because the operation approaches SAML.

It will therefore probably be necessary to create three new classes: strategy_login.go strategy_register.go strategy.go like what already exists here for OIDC: https://github.com/ory/kratos/tree/master/selfservice/strategy/oidc.

I like your idea of writing our own KratosSessionProvider that satisfies the samlsp.SessionProvider interface but I'm afraid it's more complicated to integrate

I hope my explanation is clear! :)

siennathesane commented 2 years ago

I just wanted to touch base and see how this is going :)

ThibHrrd commented 2 years ago

@mxplusb Everything is going well for the moment, we have a rather functional version and we are currently writing unit tests :)

kopancek commented 2 years ago

Hi, I see that there is an open PR on a fork ovh/kratos#3. How far is it from being merged upstream? Currently considering using ory/kratos, but this is a blocker feature for us.

psauvage0 commented 2 years ago

@kopancek We're currently waiting for a review from the Ory team on #2148. Unfortunately the PR is rather large so they may take some time to review. In the mean time, if you need it really fast, the fork is usable on the branch saml (https://github.com/ovh/kratos/tree/saml). Although you might need to dig around in the code to figure out how to set up the SAML federation.

mmellison commented 2 years ago

This is awesome work and my team is really excited for this, as it was one of the main challenges with implementing a new auth service based on Kratos. We rely heavily on SAML for our enterprise customers of our platform.

I think a crucial missing part of this design is for the multi-tenant/enterprise use-case where the authentication stack (in this case Kratos) supports dynamic IdP configuration for a sub-set of users based on affiliation. This is typically done by looking at the user's email address domain.

For example, users under the @companya.com domain name will be sent to the IdP for Company A while users with @companyb.com will be sent to the IdP for Company B. Ideally, administrators of Company A and Company B would also be able to manage SAML configuration (perhaps through the admin API) without having to update any YAML configuration files on the backend.

Curious if this has been thought about / accounted for in any follow up design or work? If not, I'm happy to look into having my team contribute here. We've been enjoying working with the Ory stack!

psauvage0 commented 2 years ago

@seglberg Currently, in what we're developing, we can only federate to one IdP. Adding the possibility to configure multiple IdP should not be too hard.

However there is a significant problem with that : Kratos does not and will not handle software multi-tenancy. This is a choice from the Ory team and they do not seem to want to change that (seeing how complicated it would be to add this feature, I understand them). This means that even though the users come from different IdP, they would end up in the same Kratos database, as if they were coming from the same IdP. Admins on this Kratos instance would be able to see every users from every IdP, as if it was coming from a single IdP.

With that limitation in mind, I don't know if the Ory team would approve of this. @aeneasr should be able to get back to you on that.

Regardless of that, the PR to add SAML federation currently spans over 60 files (and we're still missing several things) and is pretty complicated on its own. I would say that the feature you are suggesting should be the object of a new design document, after this PR is finally merged.

mmellison commented 2 years ago

Thank you for the insightful response!

This means that even though the users come from different IdP, they would end up in the same Kratos database, as if they were coming from the same IdP. Admins on this Kratos instance would be able to see every users from every IdP, as if it was coming from a single IdP.

This is the behavior we're trying to replicate, where we have a single Kratos tenant/realm which powers external authentication for our entire SaaS platform. Our users are similar to that of the GitHub.com or GitLab.com, where despite the IdP, they are still part of our single user directory, on top of which they are given access to teams, projects etc. So ultimately I don't think we're looking to support true multi-tenancy, but just the ability to federate to multiple IdPs based on the user's email address domain.

I'm eagerly watching the existing PR and will be happy to have further discussions once that lands! :smile:

theogravity commented 2 years ago

Lack of SAML support is the reason why we would be unable to use Ory and have to use an alternative auth stack instead as our customer base wants to be able to use SSO + SAML.

It's really cool to see the community coming together to work on support for it.

timdrysdale commented 2 years ago

This means that even though the users come from different IdP, they would end up in the same Kratos database, as if they were coming from the same IdP. Admins on this Kratos instance would be able to see every users from every IdP, as if it was coming from a single IdP.

This is the behavior we're trying to replicate, where we have a single Kratos tenant/realm which powers external authentication for our entire SaaS platform.

Delighted to see this SAML integration underway - well done to the team on getting the PR ready in recent days.

FWIW, in terms of multiple IdP, I have a similar use case to the above - acting as an SP to the UK University shibboleth federation. This will require interacting with a different IdP at each university, for which offloading the discovery process is probably the path of least resistance - at the cost of requiring a DiscoveryResponse endpoint. There is also seamless access as a possibility e.g. see thiss.io.

I suppose an alternative to supporting multiple SAML IdP is to have a single Ory Kratos instance with above PR that can do a single SAML IdP and use OAuth 2.0 via hydra for sharing resources between them all, but that adds a barrier to on-boarding a new institution/organisation and adds complexity around managing users across multiple Kratos instances, and seems out of step with the current one-SP-to-many-IdP model in use for SAML already.

Happy to discuss/maybe contribute within bandwidth limits if we do indeed go with Ory - just a team of one on the backend but would be good to be able to stick with a golang stack and get the other cool features in wider Ory like ACL etc.

github-actions[bot] commented 1 year ago

Hello contributors!

I am marking this issue as stale as it has not received any engagement from the community or maintainers for a year. That does not imply that the issue has no merit! If you feel strongly about this issue

Throughout its lifetime, Ory has received over 10.000 issues and PRs. To sustain that growth, we need to prioritize and focus on issues that are important to the community. A good indication of importance, and thus priority, is activity on a topic.

Unfortunately, burnout has become a topic of concern amongst open-source projects.

It can lead to severe personal and health issues as well as opening catastrophic attack vectors.

The motivation for this automation is to help prioritize issues in the backlog and not ignore, reject, or belittle anyone.

If this issue was marked as stale erroneously you can exempt it by adding the backlog label, assigning someone, or setting a milestone for it.

Thank you for your understanding and to anyone who participated in the conversation! And as written above, please do participate in the conversation if this topic is important to you!

Thank you 🙏✌️