nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

Enable external Authentication (authn) and Authorization (authz) via Extensible Auth Provider. #434

Closed petemiron closed 6 months ago

petemiron commented 7 years ago

Requirements

  1. An administrator must be able to configure an HTTP-based external auth provider.
     1.1. The external auth provider must support TLS, including specifying a certificate authority.
     1.2. The external auth provider should accept and check credentials (username and secret) for the gnatsd server.
     1.3. The external auth provider must have a configurable timeout for DNS, TCP connect, and response. These timeouts may be tracked in a single timeout or separated.
     1.4. Metrics for requests (total, successful, and failed) and at least the average response time of queries to the external auth provider must be available through monitoring endpoints.
     1.5. If no external auth provider is configured, there must be no additional impact on CONNECT performance.
  2. For an external auth, the gnatsd server must pass connect user credential information (username and password) to the external endpoint.
     2.1. The external auth provider must check authn and return a 200 with authz data (similar to the example in #428): { user: 'optional', permissions: { publish: ['foo.*'], subscribe: ['foo.*', 'bar.*'] } }
     2.2. The credentials must be checked during client CONNECT.
     2.3. The external auth provider may return a Time-to-Live (TTL) for authz data.
     2.4. If a TTL is returned, the server should respect the TTL and re-request authn for the user on any new message sent to or received from that user after TTL expiration.
     2.5. The external auth provider must provide a means for failover (e.g. DNS round-robin, or multiple addresses in the configuration).

Plugin Interface Mockup

For discussion, here is a mockup of a plugin interface that passes a context around. This pushes locking responsibility into the plugin itself. It is not complete by any stretch of the imagination.

// Simple mockup of a plugin.  In practice, uid will be some sort of
// principal struct, logger may be passed to the auth plugin, etc.
package auth

import (
    "fmt"
    "strings"
    "testing"
)

// AuthPlugin is the interface for users to implement.
type AuthPlugin interface {
    Startup()
    Shutdown()

    // GetContext is invoked every time a new client connection is established
    // The plugin can choose to return a singleton, a context from a pool, or
    // a new context.  The plugin has the responsibility of locking accordingly.
    GetContext() interface{}

    CheckPublishPermissions(context interface{}, uid, subject string) bool
    CheckSubscribePermissions(context interface{}, uid, subject string) bool
    CheckConnectPermissions(context interface{}, uid, pass string) bool
}

// MyLittleAuthPlugin is a user plugin
type MyLittleAuthPlugin struct {
    // general stuff, configuration, plugin wide stuff.
    // this could optionally be the context as well, if the
    // context was a singleton.
    singleContext bool
    context       interface{}
}

// MyLittleAuthPluginContext is a context to be passed around to the
// APIs.  If the context is a singleton, the plugin itself can be used.
type MyLittleAuthPluginContext struct {
    Username string
    Password string
    Subject  string
}

func (mlap *MyLittleAuthPlugin) Startup() {
    fmt.Printf("Starting up.")
}

func (mlap *MyLittleAuthPlugin) Shutdown() {
    fmt.Printf("Shutting down.")
}

// invoked every time a new client connection is established
func (mlap *MyLittleAuthPlugin) GetContext() interface{} {
    if mlap.singleContext {
        return mlap.context
    }

    return &MyLittleAuthPluginContext{
        Username: "colin",
        Password: "password",
        Subject:  "foo",
    }

}

func (mlap *MyLittleAuthPlugin) CheckPublishPermissions(context interface{}, uid, subject string) bool {
    ma := context.(*MyLittleAuthPluginContext)
    return strings.Compare(ma.Subject, subject) == 0
}
func (mlap *MyLittleAuthPlugin) CheckSubscribePermissions(context interface{}, uid, subject string) bool {
    ma := context.(*MyLittleAuthPluginContext)
    return strings.Compare(ma.Subject, subject) == 0
}

func (mlap *MyLittleAuthPlugin) CheckConnectPermissions(context interface{}, uid, pass string) bool {
    ma := context.(*MyLittleAuthPluginContext)
    if strings.Compare(ma.Username, uid) == 0 && strings.Compare(ma.Password, pass) == 0 {
        return true
    }
    return false
}

func TestAuthPluginAsServer(t *testing.T) {
    // Server's responsibilities
    var plugin AuthPlugin

    // On startup server sets the plugin
    plugin = &MyLittleAuthPlugin{}

    plugin.Startup()

    // At connection time, get the user context.
    uc := plugin.GetContext()
    // context is stored with the connection, and passed to relevant APIs
    if !plugin.CheckConnectPermissions(uc, "colin", "password") {
        t.Fatalf("credential check failed")
    }

    if plugin.CheckConnectPermissions(uc, "colin", "garbage") {
        t.Fatalf("credential check should have failed for a bad password")
    }

    if !plugin.CheckPublishPermissions(uc, "colin", "foo") {
        t.Fatalf("publish check failed")
    }

    if plugin.CheckSubscribePermissions(uc, "colin", "bar") {
        t.Fatalf("subscribe check should have failed for subject bar")
    }

    plugin.Shutdown()
}
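The mockup compares subjects for exact equality, while the permissions in requirement 2.1 use wildcards such as 'foo.*'. A real plugin would need token-based matching. The sketch below is one way to do it, assuming standard NATS subject semantics ('*' matches exactly one token, '>' matches one or more trailing tokens); the function names `subjectMatches` and `allowed` are illustrative and not taken from gnatsd.

```go
package main

import (
	"fmt"
	"strings"
)

// subjectMatches reports whether a literal subject matches a permission
// pattern. '*' matches exactly one token; '>' must be the last token of
// the pattern and matches one or more remaining tokens.
func subjectMatches(pattern, subject string) bool {
	pt := strings.Split(pattern, ".")
	st := strings.Split(subject, ".")
	for i, p := range pt {
		if p == ">" {
			return i == len(pt)-1 && i < len(st)
		}
		if i >= len(st) {
			return false
		}
		if p != "*" && p != st[i] {
			return false
		}
	}
	return len(pt) == len(st)
}

// allowed reports whether any pattern in the list permits the subject.
func allowed(patterns []string, subject string) bool {
	for _, p := range patterns {
		if subjectMatches(p, subject) {
			return true
		}
	}
	return false
}

func main() {
	subscribe := []string{"foo.*", "bar.*"}
	fmt.Println(allowed(subscribe, "foo.orders"))
	fmt.Println(allowed(subscribe, "baz.orders"))
}
```

Splitting on every check is the simple version; a server on the hot path would more likely precompile the patterns once per connection.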

Related Issues

#428

#429

#369

wenzheng commented 7 years ago

Hi @petemiron, do you have any plan to implement this requirement?

petemiron commented 7 years ago

Hi @wenzheng, do you have feedback on this? Many of the ideas stem from your Pull Request, but we've tried to balance performance with the flexibility in your suggestion in these requirements. If you agree and would like to modify your PR to suit these requirements, we'd happily review it. We do think this is a great idea, but our core team doesn't have the bandwidth to implement at this point. We just wanted to make sure to capture the suggestions as a set of requirements.

wenzheng commented 7 years ago

Hi @petemiron

I see our colleague @firebook commented on the previous PR #429, but yes, I think it would be possible for us to modify the PR to suit the requirements. I will talk to our team and see when we can make this happen.

x6j8x commented 7 years ago

👍 for this... This would enable us to provide NATS as a brokered service to our apps running on Cloudfoundry. In fact I could live with a very minimal implementation as in #428.

valichek commented 7 years ago

Hi, we would like to know the current status of this issue. We are going to use NATS to connect >50k devices from "outer space" and are not happy with updating/reloading config files.

@petemiron @derekcollison It's clear that performance and reliability are top priorities when reviewing an external auth feature. But what about having a user auth service internally connected to NATS with system subscriptions? Have you discussed this possibility already? Are there any blockers to implementing it? The idea is to have NATS client(s) serve as the auth provider. I picture it as additional subscriptions and/or maybe a protocol message, so the auth provider(s) can announce when ready and be registered by gnatsd.

derekcollison commented 7 years ago

I believe reloading configuration files, a WIP, will solve the majority of needs here.

x6j8x commented 7 years ago

@derekcollison At least for our use-case (automatic provisioning of nats subject subtrees to apps on Cloudfoundry & Kubernetes via a service-broker) it feels awkward and error prone to generate nats configs on multiple nodes and then to rely on config reloads.

derekcollison commented 7 years ago

I would imagine that the process would be automated, where config files are properly generated, updated, distributed securely, and the servers signalled properly to reload the configuration.

x6j8x commented 7 years ago

It's certainly doable, but something like #428 would be so much simpler and less error prone (and without timing issues - the state of the authn/authz can always authoritatively be answered by the external entity and is not "in flux" during the regeneration/reload of the config).

derekcollison commented 7 years ago

I think the idea has merit, however it is not complete. For instance, when a user is removed, or permissions updated, these cases are covered by configuration reload, but are currently not accounted for in #428. The synchronization issues are the same in both cases IMO as well as error handling and exceptions.

valichek commented 7 years ago

A removed user or updated permission could be tracked with an auth TTL (with some configured period): if the authorization is no longer available or has changed, close the connection to force the client to reconnect. If one doesn't want a TTL because of the additional traffic, there should be a way to receive a message from the auth provider and kill the connection. It could be a webhook, or a subscription if the auth provider is a NATS client.

eljefedelrodeodeljefe commented 6 years ago

Can you clarify the timeline? I can only chime in that this is much sought after.

VladimirAkopyan commented 6 years ago

something like this would be great, and I imagine quite simple: https://github.com/rabbitmq/rabbitmq-auth-backend-http

vtolstov commented 5 years ago

any progress?

derekcollison commented 5 years ago

We are moving forward with this through our Nkey and JWT work. Will keep everyone posted as best we can. Look for something before end of year.

vtolstov commented 5 years ago

Maybe you have some testing code in a branch? I'd really like to check this.

derekcollison commented 5 years ago

Here is where we are right now. We have nkey support and account isolation and sharing for a single server. Next up is adding account support to clusters, which begins this week. My target is to be done by end of week. After that is non-server-defined configuration, which will involve work that affects this issue.

h4xnoodle commented 5 years ago

FWIW we do the following in BOSH https://github.com/bosh-dep-forks/gnatsd/blob/bosh-1.3.0/auth/certificate_auth.go. We use the cert itself for auth as we need to solve a catch-22 for auth. It works well for us.

We would move to the new pluggable interface when it's completed, thanks for your work on it.

rusenask commented 5 years ago

Hello, I started following this issue a year ago, but at that time I managed to avoid requiring authorization in gnatsd. I know this can sound annoying and I apologize for that, but when could we expect this in the master branch? I'm not sure whether to start working on some wrapper for auth or wait for this (ideally I would want to use this implementation).

derekcollison commented 5 years ago

You can do custom authentication now. We do not have a call out option yet. We have added nkeys and are adding decentralized management of JWTs based on nkeys. Will be adding pulling user from x509 cert as well. Which specific problem are you looking to solve?

rusenask commented 5 years ago

I need per user control to specify which subjects it can subscribe and publish to.

derekcollison commented 5 years ago

That already exists today.

rusenask commented 5 years ago

Sorry, I forgot to mention that user management is dynamic, they can come and go. I am aware of the file-based configuration (https://www.nats.io/documentation/server/gnatsd-authorization/). Is there an API available (embedded version would work as well as my backend is written in Go and I can start the server internally) on the server which I could use to add/remove users?

derekcollison commented 5 years ago

Not yet, but one may show up soon. Also note that changes to a config file can be reloaded without a server restart using gnatsd -sl reload.

damouse commented 5 years ago

Maybe it's just me, but I read all of rusenask's comments in the context of external authentication and authorization, and the replies didn't seem to directly address that.

I've been following this issue for a while hoping I could transition to nats once there's a solution for fine-grained, dynamic run-time authorization. Is there no concrete plan to work towards this functionality, either via call-out or some internal system?

As I understand it, to make gnatsd -sl reload work for this, I'd need a system to manage client permissions with my own semantic structure, generate a format consumable by gnatsd, distribute that file to members of the cluster, then run reload on them. Is there an official way of doing this?
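For reference, the "generate a format consumable by gnatsd" step could be as small as the sketch below. It renders an `authorization` block in the style of the documented file-based configuration from an in-memory user list. The `User` type and `RenderAuthorization` are made-up names, the quoting relies on Go's `strconv.Quote` (which may not match the server's parser for unusual characters), and distributing the file and signalling each server to reload are not shown.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// User is a hypothetical record from whatever system manages permissions.
type User struct {
	Name      string
	Password  string
	Publish   []string
	Subscribe []string
}

// quoteList renders a string slice as a bracketed, double-quoted list.
func quoteList(items []string) string {
	quoted := make([]string, len(items))
	for i, s := range items {
		quoted[i] = strconv.Quote(s)
	}
	return "[" + strings.Join(quoted, ", ") + "]"
}

// RenderAuthorization returns the text of an authorization block with
// one entry per user, in the order given.
func RenderAuthorization(users []User) string {
	var b strings.Builder
	b.WriteString("authorization {\n  users = [\n")
	for _, u := range users {
		fmt.Fprintf(&b, "    {user: %s, password: %s, permissions: {publish: %s, subscribe: %s}}\n",
			strconv.Quote(u.Name), strconv.Quote(u.Password),
			quoteList(u.Publish), quoteList(u.Subscribe))
	}
	b.WriteString("  ]\n}\n")
	return b.String()
}

func main() {
	fmt.Print(RenderAuthorization([]User{
		{Name: "colin", Password: "password", Publish: []string{"foo.*"}, Subscribe: []string{"foo.*", "bar.*"}},
	}))
}
```

Plaintext passwords are used only to keep the example short.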

derekcollison commented 5 years ago

Currently there is not a way to do fine grained permissions by having the server do a callout. We will be releasing code that makes the current process work in a kubernetes environment by using the internal methodology of gnatsd as it is today. That will just be abstracted out and you will see a REST gateway to add/remove/change users and their permissions.

With Nkeys and JWTs, these are managed outside of the server and do not require server configuration changes or restarts to add or remove users or update their permissions.

rusenask commented 5 years ago

is it going to be a sidecar that keeps the file synced? :)

derekcollison commented 5 years ago

For Kubernetes, yes, most likely. It will automatically handle updating the config, syncing it across pods via secrets, and having servers reload the file automatically.

rusenask commented 5 years ago

Ah, makes sense. I initially thought of writing a similar sidecar, but it would subscribe directly to a gnatsd channel for updates, and the gateway service would publish to that special control channel. Would sidecars in your implementation create a watcher from the k8s client-go for the secrets, or would they get updates through some other mechanism?

derekcollison commented 5 years ago

Adding in @wallyqs who can shed more light on details.

wallyqs commented 5 years ago

@rusenask yes it could be done with a sidecar to update a shared secret with the NATS server and then trigger the reload in the servers. There is also some support for dynamic users with Kubernetes Service Accounts using a similar approach in the nats-operator using the service account bound token alpha feature.

rusenask commented 5 years ago

In my use case I would want to have thousands of dynamic users/tokens with specified permissions. Ideally I wouldn't want to rely on any Kubernetes features even though I am running inside it :/ Would your proposed approach still work for such a use case?

I have checked that existing PR with the remote auth and from the thread it seems like CustomClientAuthentication might be the best option for me. I have already got code for token creation/authentication in my backend that I would have used anyway. I guess time to do some experiments : )

derekcollison commented 5 years ago

I think CustomClientAuthentication may be the way to go for your use case.

derekcollison commented 4 years ago

I wanted to check in with this group since it has been a while since we launched NATS 2.0 with decentralized auth via JWTs etc. We also have account isolation for true multi-tenancy, but we also use accounts as system accounts to let nats-servers talk amongst themselves, provide analytics, stream events etc.

We have been chatting about authorization again in the context of NATS 2.0 and things we did right and use cases where we still could improve. Any input appreciated here, and happy to jump on a call if needed.

colek42 commented 4 years ago

@derekcollison We would like to use OPA to manage policy. There is currently no way to interface with an external policy engine. I can hop on a call and detail our use case if it helps.

derekcollison commented 4 years ago

@ripienaar has been looking into that so will let him chime in here.

ripienaar commented 4 years ago

Reading through the comments here I think the NATS 2.0 work does indeed address the bulk of concerns here, if not I'd be keen to hear what areas need attention.

What remains is about different ways to express the authorization rules, and OPA is an option. I like OPA, and while internally we've only bounced this around over drinks, so to speak, we thought it might be worth embedding OPA policies into the JWT, so the account owner has the option to be much more flexible than the simple allow/deny rules. Of course there is also the more traditional OPA agent that can run next to the NATS Server.

@colek42 I'd be interested in hearing more about how in your mind the OPA integration would look; it's something I've been keen to plumb in myself.

colek42 commented 4 years ago

@ripienaar OPA has a great blog post about doing this with Kafka, https://www.openpolicyagent.org/docs/latest/kafka-authorization/

We are using SPIFFE for node and workload attestation. (I have another issue to use the SVID as identity). We are using OPA/Envoy to manage policy across our restful services, we would like to use the same tool for our message based systems.

This issue is related for our use case. ref: https://github.com/nats-io/nats-server/issues/1325

ripienaar commented 4 years ago

OK, thanks

ripienaar commented 4 years ago

The problem with the OPA / Kafka model is that they call OPA on every operation. Benchmarks are hard to come by, but here is another OPA plugin for Kafka that does have some benchmarks... and wow, 170 operations/second with latencies measured in milliseconds? That includes a caching layer of what looks like 3600 seconds. This won't work for NATS.

Kafka does very different things from NATS, so that's not a NATS v Kafka thing and I am not comparing the tools - it's just a rate of request thing. We can't call a 3rd party service on every message.

Earlier in this thread there is a suggestion that we have something we call at login time, that'd be fine but that's not the OPA model - OPA is generally used to return boolean authorization outcomes.

OPA packages used inside a Go app without calling the daemon are quite fast, and we could speed things up a lot with some internal caching too, but I don't think we will realistically take on OPA as a compiled-in dependency.

derekcollison commented 4 years ago

Fast path per message checks to OPA make no sense. If we could query OPA and then have it push any changes in realtime to the servers, maybe even via NATS, that may be something to look into with their team.

ripienaar commented 4 years ago

That’s just not really how OPA works. It’s policy as code, designed to return a Boolean saying whether to allow something or not.

So it can’t send us updates - it has to evaluate each time. In the Kafka examples they cache decisions for an hour, but even that is meh since people often do time-based policies.

derekcollison commented 4 years ago

Yes I know, but at Apcera we had a push based policy system. You essentially registered for interest in a certain policy (think of this as a way to encapsulate into a subject) and changes would be pushed in realtime. Was very scalable. We could potentially work with OPA on something like this. We discussed this with them at the time as well.
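As a rough illustration of the "register interest, get changes pushed" model described above, here is an in-process sketch. It says nothing about how the Apcera system or OPA actually work; `PolicyHub`, `Watch` and `Publish` are hypothetical names, and in practice the push would travel over a NATS subject rather than a function call.

```go
package main

import (
	"fmt"
	"sync"
)

// PolicyHub lets servers register interest in a named policy and receive
// every subsequent change, instead of querying on each message.
type PolicyHub struct {
	mu       sync.Mutex
	watchers map[string][]func(perms []string)
}

// NewPolicyHub returns an empty hub.
func NewPolicyHub() *PolicyHub {
	return &PolicyHub{watchers: make(map[string][]func(perms []string))}
}

// Watch registers fn to be called whenever the named policy changes.
func (h *PolicyHub) Watch(policy string, fn func(perms []string)) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.watchers[policy] = append(h.watchers[policy], fn)
}

// Publish pushes a new permission set to every watcher of the policy and
// returns how many watchers were notified.
func (h *PolicyHub) Publish(policy string, perms []string) int {
	h.mu.Lock()
	fns := append([]func(perms []string){}, h.watchers[policy]...)
	h.mu.Unlock()
	// Callbacks run outside the lock so a watcher may call back into the hub.
	for _, fn := range fns {
		fn(perms)
	}
	return len(fns)
}

func main() {
	hub := NewPolicyHub()
	hub.Watch("orders", func(perms []string) {
		fmt.Println("orders policy now allows:", perms)
	})
	hub.Publish("orders", []string{"orders.*"})
}
```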

ripienaar commented 4 years ago

OPA can return documents, so we could use it for this, but it would be a bit of a massive downgrade from what people typically do :(

Anyway will POC something up and see

derekcollison commented 4 years ago

Let's you and I brainstorm a bit and then come up with a plan. IMO low priority but could be interesting. Then we could schedule another call with the OPA team.

danmx commented 3 years ago

any news about it?

derekcollison commented 3 years ago

Still on our list but not super high priority within the ecosystem and our user/customer base.

We will get to it, but if it's super important and critical we can try to discuss how we may be able to prioritize it over other items.

bbdb68 commented 3 years ago

Hello, I am new to NATS and find it very appealing for my application (real time 3D collaborative app over the web).

The only missing (but critical) feature is customizable authentication. As an example, we have users that require OAuth2 authentication (at login time only) and I do not see how to implement it within NATS, given that users may require a specific way to validate the OAuth token, so it requires some kind of server-side scripting/component to implement it.

For now I use wamp/crossbar which is really fine, but its static configuration of realm (equivalent of nats account) is going to be an issue.

valichek commented 3 years ago

@bbdb68 I solved this problem with a custom-written NATS relay that parses the protocol and checks tokens in the payload.

derekcollison commented 3 years ago

You can customize permissions with NATS in operator mode using claim JWTs and private nkeys (or designate the JWT to be a bearer token).

Do you own the Oauth2 domain?