nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

Client Auth API #369

Closed qrpike closed 7 years ago

qrpike commented 8 years ago

NATS seems perfect for our needs; however, having auth hard-coded at service start isn't very practical when we are adding and removing users while it's running.

Implementing some Go code to handle this is one option; another is to use an external service for authorization, whether via HTTP basic auth or something similar. Being able to set an authentication endpoint would be very handy, especially since we only allow a user to be logged in with one session.

If this is possible now please let me know, but I couldn't find it in the docs anywhere.

Thanks!

firebook commented 7 years ago

It's not possible. Our team ran into this issue too, and after reading the source code we found it difficult to extend.

We hope NATS will support it in the future.

derekcollison commented 7 years ago

So that I do not assume: what exactly are you trying to achieve? If you force client auth through TLS today, a CA essentially needs to issue a valid cert for each client before it can connect to NATS. What do you wish for in addition to this?

1N50MN14 commented 7 years ago

+1. I was just about to open a similar issue; I've been trying to "hack" my way around this for some time.

@derekcollison The use case would be time limited, token based authentication for devices similar to what Azure IoT Hub offers (nm the fancy name for MQTT: https://docs.microsoft.com/en-us/azure/iot-hub/iot-hub-devguide-security, scroll down to Security tokens). It's also common with some of the "modern" brokers I've seen lately.

In my case I'm running my own device registry and need to use JWT based authentication with the Device UUID as payload to generate a time limited token which would be authenticated at the connection level to gnats, by making a call to an HTTP endpoint to accomplish that.
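A minimal Go sketch of the time-limited device token described above, using only the standard library. It is not a full JWT implementation (a maintained JWT library would normally be used); the `claims` layout, the function names, and the shared secret are assumptions made for illustration, and nothing here is an existing gnatsd feature.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"time"
)

// claims is a hypothetical payload: the device UUID plus an expiry time.
type claims struct {
	DeviceID string `json:"device_id"`
	Expires  int64  `json:"exp"` // Unix seconds
}

var b64 = base64.RawURLEncoding

// signature returns the HMAC-SHA256 of the encoded payload.
func signature(secret []byte, payload string) []byte {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(payload))
	return mac.Sum(nil)
}

// issueToken returns "<payload>.<signature>", both base64url encoded.
func issueToken(secret []byte, deviceID string, ttlSeconds, nowUnix int64) (string, error) {
	body, err := json.Marshal(claims{DeviceID: deviceID, Expires: nowUnix + ttlSeconds})
	if err != nil {
		return "", err
	}
	payload := b64.EncodeToString(body)
	return payload + "." + b64.EncodeToString(signature(secret, payload)), nil
}

// verifyToken checks the signature first, then the expiry, and returns the device ID.
func verifyToken(secret []byte, token string, nowUnix int64) (string, error) {
	payload, sig, ok := strings.Cut(token, ".")
	if !ok {
		return "", errors.New("malformed token")
	}
	got, err := b64.DecodeString(sig)
	if err != nil || !hmac.Equal(got, signature(secret, payload)) {
		return "", errors.New("bad signature")
	}
	body, err := b64.DecodeString(payload)
	if err != nil {
		return "", errors.New("malformed payload")
	}
	var c claims
	if err := json.Unmarshal(body, &c); err != nil {
		return "", err
	}
	if nowUnix >= c.Expires {
		return "", errors.New("token expired")
	}
	return c.DeviceID, nil
}

func main() {
	secret := []byte("demo-secret") // placeholder; load from secure storage in practice
	now := time.Now().Unix()
	tok, err := issueToken(secret, "device-1234", 60, now)
	if err != nil {
		panic(err)
	}
	fmt.Println(verifyToken(secret, tok, now))     // within the 60 s window
	fmt.Println(verifyToken(secret, tok, now+120)) // after expiry
}
```

The device registry would issue the token, and whatever performs the connection-level check (for example an HTTP auth endpoint) would run the verification step.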

derekcollison commented 7 years ago

Currently my plan is to allow the configuration file to be updated and reloaded, similar to nginx, without downtime. We have been considering proposals similar to this but have not yet made a determination as to whether or not we want to move forward.

1N50MN14 commented 7 years ago

@derekcollison Thanks for the feedback, yes I've seen the hot reload proposal. This would work in most cases, but if you have a large number of devices connecting on demand a more dynamic approach is required imho. Imagine thousands of sensors connecting and disconnecting at any given time for any reason, which is the nature of many IoT projects, even more so with devices that rely on 3G/4G connectivity. Maybe this is outside the scope of gnatsd, but I don't see why it should be; if anything, gnatsd is perfect for the job.

There are currently 6 open authentication-related issues, all of which, including topic-based auth, I believe could potentially be solved by (optionally) supporting HTTP endpoints. I really believe gnatsd should optionally allow users to decide on the authentication mechanism that works best for them should the nginx/hot-reload approach not suffice.

Anyway, my 50 cents only ;)

qrpike commented 7 years ago

Agreed it would be really nice to not have to hotreload constantly.

derekcollison commented 7 years ago

My belief is that in the sensor case, you would use an intermediate CA that is configured once in the nats-server, and generate client certificates as needed that are required to properly connect to the nats-server. We need to do some work on revocation for sure, but this is possible today.

For now, and by design, the HTTP endpoints are read-only. Adding "write" capabilities to them presents some long term challenges on being able to always understand the given state of the system, which for now, the configuration file is that source of truth.

qrpike commented 7 years ago

Is it not possible to take the approach the Docker registry does, where it uses a separate service to do auth? That way you can have any number of custom auth providers.

derekcollison commented 7 years ago

We could take many different approaches. I am committed to the configuration reload, still evaluating any write state changes via HTTP or a similar API endpoint.

1N50MN14 commented 7 years ago

@derekcollison The HTTP endpoint is merely an external service, so handling writes stops being gnatsd's responsibility. The only difference is that authentication tokens are checked against HTTP response codes (200 OK / 401 Unauthorized). This would allow for any approach; gnatsd simply outsources this responsibility.
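A minimal Go sketch of the status-code check proposed above. The function name `checkToken`, the use of a Bearer `Authorization` header, and the stand-in auth service are assumptions for illustration; gnatsd has no such setting in this thread.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// checkToken asks a hypothetical external auth service whether a token is
// valid: 200 means allow, 401 means deny, anything else is treated as an error.
func checkToken(client *http.Client, authURL, token string) (bool, error) {
	req, err := http.NewRequest(http.MethodGet, authURL, nil)
	if err != nil {
		return false, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	switch resp.StatusCode {
	case http.StatusOK:
		return true, nil
	case http.StatusUnauthorized:
		return false, nil
	default:
		return false, fmt.Errorf("auth service: unexpected status %d", resp.StatusCode)
	}
}

// demo starts a stand-in auth service that accepts a single token, then
// queries it with one good and one bad token.
func demo() (good, bad bool, err error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "Bearer valid-token" {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusUnauthorized)
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 2 * time.Second} // never block a connect forever
	if good, err = checkToken(client, srv.URL, "valid-token"); err != nil {
		return
	}
	bad, err = checkToken(client, srv.URL, "stale-token")
	return
}

func main() {
	fmt.Println(demo())
}
```

Treating unexpected status codes as errors rather than as a denial keeps "the auth service is broken" distinguishable from "the token is invalid"; the caller can still fail closed in both cases.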

qrpike commented 7 years ago

Precisely!! You did better explaining it lol

qrpike commented 7 years ago

@derekcollison Any updates on consideration for this approach? Would be VERY much appreciated.

1N50MN14 commented 7 years ago

https://vernemq.com/docs/plugindevelopment/webhookplugins.html <-- perfect reference (obviously this could be in conf file)

EDIT: More details https://github.com/erlio/vmq_webhooks

derekcollison commented 7 years ago

Have not had a chance to revisit yet. But I think I understand better what the thread is suggesting.

etopian commented 7 years ago

Pretty much a must-have for this sort of product. Otherwise the number of use cases for this product is severely limited by this functionality not existing.

aeneasr commented 7 years ago

Please absolutely implement an API, config files are really tricky in cloud environments and actually against 12factor principles.

I also need to update auth* on the fly, as new users sign up.

derekcollison commented 7 years ago

Being able to update users, auth and permissions on the fly will come with config reload functionality similar to NGINX.

qrpike commented 7 years ago

@derekcollison So there will NOT be an option to use an external service as the authentication layer?

aeneasr commented 7 years ago

I don't think reloading config files is enough. After all, NATS is a messaging solution for cloud environments, and config files clearly contradict 12 factor principles, which are often required to get things working on a PaaS such as Heroku or Cloud Foundry, or even Kubernetes. It might work on a bare-metal machine where I can start 10 services and tightly couple those things together, but it's much harder in the situation just described. Additionally, it requires us to be able to parse and understand NATS config, which makes things much more complicated than they should be.

Please add API support.

etopian commented 7 years ago

Having to reload every time a new user signs up on your website is an incredibly poor user experience... and prone to failure. Nginx does not always reload. You have to reload it and pray that it comes back up. Mistakes can be made in the configuration files. A lot can go wrong. All in all an API is necessary.

derekcollison commented 7 years ago

@arekkas Config files, reloading them, and having a single source of truth for config are well understood. I am not saying we would not consider an API, but I am not sure you have thought completely through the operational aspects of an API. We will keep researching, but we are committed to config changes and atomic, no-downtime reloads via signal, like NGINX.
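A minimal Go sketch of an atomic, no-downtime reload of a user table on a signal such as SIGHUP, in the style attributed to NGINX above. The `users.conf` file name and its "name:password" line format are assumptions for illustration; this is not how gnatsd parses or reloads its configuration.

```go
package main

import (
	"bufio"
	"crypto/subtle"
	"fmt"
	"os"
	"os/signal"
	"strings"
	"sync/atomic"
	"syscall"
)

// users holds the current table as a map[string]string. It is replaced as a
// whole, so readers never observe a half-updated configuration.
var users atomic.Value

// parseUsers reads "name:password" lines, skipping blanks and # comments.
func parseUsers(text string) (map[string]string, error) {
	out := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		name, pass, ok := strings.Cut(line, ":")
		if !ok {
			return nil, fmt.Errorf("bad line %q", line)
		}
		out[name] = pass
	}
	return out, sc.Err()
}

// reload swaps in the new table only if it parses cleanly; a broken file
// leaves the previous table active.
func reload(text string) error {
	m, err := parseUsers(text)
	if err != nil {
		return err
	}
	users.Store(m)
	return nil
}

// authorized checks a credential against whichever table is current.
func authorized(name, pass string) bool {
	m, _ := users.Load().(map[string]string)
	want, ok := m[name]
	return ok && subtle.ConstantTimeCompare([]byte(want), []byte(pass)) == 1
}

func main() {
	const path = "users.conf" // hypothetical file of "name:password" lines
	load := func() {
		data, err := os.ReadFile(path)
		if err == nil {
			err = reload(string(data))
		}
		if err != nil {
			fmt.Fprintln(os.Stderr, "reload failed, keeping previous users:", err)
		}
	}
	load()

	hup := make(chan os.Signal, 1)
	signal.Notify(hup, syscall.SIGHUP)
	go func() {
		for range hup {
			load()
		}
	}()

	// A real server would now accept connections and call authorized() for
	// each one; the sketch just reports one lookup and exits.
	fmt.Println("alice allowed:", authorized("alice", "s3cret"))
}
```

Validating before swapping addresses the "reload and pray" concern raised earlier in the thread: a malformed file is reported and ignored rather than taking the server down.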

derekcollison commented 7 years ago

@etopian I am pretty sure you can easily automate things like adding new users. Mistakes happen when humans are part of the process. Eliminating them is the solution, and that does not require an API per se.

derekcollison commented 7 years ago

@qrpike Not saying that, just still researching. After all, once anything is released, the team here is tasked with supporting it more or less for life. So prudence and thinking through these is important.

aeneasr commented 7 years ago

While we're at it, a middleware as described in #410 could potentially separate concerns and give everyone the solution they are looking for.

qrpike commented 7 years ago

@derekcollison configuration via flat files makes 100% complete sense, and I totally agree, just the user management combined with service configuration is more what I'm talking about.

Having a single user/auth service lets a micro service architecture have a single point of truth for user auth & permissions. If every micro service requires a user permissions config file to be written, on every update, there are many points where that could potentially fail, not to mention engineering a system to keep this all in sync.

As far as maintaining it, I would think a remote endpoint for user auth would lighten the workload for the NATS team. If you use HTTP basic auth, or even bearer tokens, it's based on well-known standards and wouldn't change.

derekcollison commented 7 years ago

If user/auth or permissions is completely outsourced so to speak, that sounds like a reasonable alternative. Will address in early 2017. Thanks for the patience.

aeneasr commented 7 years ago

> If user/auth or permissions is completely outsourced so to speak, that sounds like a reasonable alternative.

Awesome that you are considering this!

mahmed8003 commented 7 years ago

Previously I was using https://github.com/mcollina/aedes for messaging (pub/sub). NATS attracted me because it has request/reply and it is simple. In Aedes, for authorization and authentication we have methods like these (copied from the documentation):

instance.authenticate(client, username, password, done(err, successful)) It will be called when a new client connects. Override to supply custom authentication logic.

instance.authorizePublish(client, packet, done(err)) It will be called when a client publishes a message. Override to supply custom authorization logic.

instance.authorizeSubscribe(client, pattern, done(err, pattern)) It will be called when a client subscribes to a topic. Override to supply custom authorization logic.

This could be another approach.
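For comparison, a Go sketch of what hooks shaped like the three Aedes callbacks could look like. The `Authorizer` interface and the `deviceAuthorizer` policy are hypothetical and exist only for illustration; gnatsd exposes no such interface in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// Authorizer is a hypothetical hook set modeled on the three Aedes callbacks
// quoted above; it is not an existing gnatsd interface.
type Authorizer interface {
	Authenticate(clientID, username, password string) bool
	AuthorizePublish(clientID, subject string) bool
	AuthorizeSubscribe(clientID, subject string) bool
}

// deviceAuthorizer is an example policy: known users may connect, and each
// client may only publish or subscribe under "devices.<clientID>.".
type deviceAuthorizer struct {
	passwords map[string]string
}

func (a deviceAuthorizer) Authenticate(clientID, username, password string) bool {
	want, ok := a.passwords[username]
	return ok && want == password
}

func (a deviceAuthorizer) AuthorizePublish(clientID, subject string) bool {
	return strings.HasPrefix(subject, "devices."+clientID+".")
}

func (a deviceAuthorizer) AuthorizeSubscribe(clientID, subject string) bool {
	return strings.HasPrefix(subject, "devices."+clientID+".")
}

func main() {
	var auth Authorizer = deviceAuthorizer{passwords: map[string]string{"alice": "s3cret"}}
	fmt.Println("connect:", auth.Authenticate("c1", "alice", "s3cret"))
	fmt.Println("publish own subject:", auth.AuthorizePublish("c1", "devices.c1.temp"))
	fmt.Println("publish other subject:", auth.AuthorizePublish("c1", "devices.c2.temp"))
}
```

Unlike the callback style in Aedes, these methods return synchronously, so an implementation that calls a remote service would block the caller for the duration of the lookup.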

qrpike commented 7 years ago

@derekcollison Any updates or plans for this? Trying to plan out our systems development timeline. If I knew Go better I would love to help with this, but I don't think you want me touching your nice code lol

derekcollison commented 7 years ago

I think the general identity problem could be solved with a simple API/Plugin model. I am more concerned at this point on auth, which is performance critical. I am considering, but as work finalizes on config file updates and reload, I wonder if tooling around supporting that path makes more sense.

wenzheng commented 7 years ago

@qrpike We had a similar use case in our organization for external authentication/authorization, and we managed to implement the feature based on our own requirements.

We use the NATS broker to expose our internal data to different end-user applications, so we implemented two things:

  1. OAuth token-based authentication with our own OAuth server.
  2. Since we have a lot of end users, we don't want to hold the permissions in memory, so we delegate the permission check (instead of an in-memory check) to the OAuth server as well (by verifying the consent).

Not sure if it is the same problem on your side.

qrpike commented 7 years ago

@wenzheng Would you happen to have a fork of gnatsd, or something I could start with? That's exactly what I'm trying to accomplish.

Thanks,

qrpike commented 7 years ago

@derekcollison I have created a PR with changes which allow token based auth from clients to use a remote endpoint for authentication & authorization.

https://github.com/nats-io/gnatsd/pull/428

I'm no golang pro, but it's working for us currently.

wenzheng commented 7 years ago

@qrpike, sorry for the late response; it was the Chinese Spring Festival.

I've posted #429 on this topic. You can check our implementation, which might be helpful.

derekcollison commented 7 years ago

I have asked @kozlovic and @petemiron to take a look.

kozlovic commented 7 years ago

@qrpike @wenzheng We will have a closer look, but since on the surface it seems that your PRs address the same problem, does either of you think that your PR is not needed or should be dropped in favor of the other one? It's fine if you both say no because you feel that they are complementary. I am just trying to reduce the amount of code review ;-)

qrpike commented 7 years ago

@kozlovic Mine is a very simple, single endpoint to authenticate and reply with permissions. I think https://github.com/nats-io/gnatsd/pull/429 is a little more in depth: 94 lines vs 683 lines. Which is more in line with NATS' goals is up to you guys to decide, but I think it should be one or the other.

petemiron commented 7 years ago

@qrpike @wenzheng After reviewing both PRs and discussing with the internal team, I'd like to better elaborate the requirements for the feature. If you both also agree with this, there are some specific things I'd like to make sure we cover:

  1. pluggable auth filter architecture (similar to #429). However, I do not think we should offer regex based filtering. I think a simple priority-based filter would result in reduced risk. If necessary, we might want to consider an additional field in messages to indicate which auth filter to prefer (e.g. oauth vs. oauth2 vs. file-based).
  2. metrics from remote auth integration with monitoring endpoints (succeeded, failed, latency)
  3. error handling and failover of remote endpoints
  4. config handling definition (as opposed to overloading and extending existing config in #429). I'm concerned about possible new keyword/username collision.
  5. clarification of any caching and cache eviction of authn and authz

Once we agree on the requirements, we can do a more detailed review of the two specific PRs.

Do either of you have a strong opinion on whether or not we should edit the text of this ticket with the requirements or create a new, separate issue?

@kozlovic are there any additional items you'd add for requirements clarification?
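Several of the requirements listed above (priority-ordered filters, failover when a remote endpoint errors, and success/failure counters) can be sketched together in Go. All type and function names here are hypothetical, latency metrics and caching are left out, the counters are not safe for concurrent use, and this reflects neither of the two PRs.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Decision is what a single filter says about a credential.
type Decision int

const (
	Abstain Decision = iota // filter has no opinion about this credential
	Allow
	Deny
)

// Filter is one pluggable auth backend (file-based, oauth2, ...).
type Filter struct {
	Name     string
	Priority int // lower values run first
	Check    func(token string) (Decision, error)
}

// Chain runs filters in priority order and keeps simple counters that a
// monitoring endpoint could expose.
type Chain struct {
	filters                   []Filter
	Succeeded, Failed, Errors int
}

// NewChain copies and sorts the filters so callers may pass them in any order.
func NewChain(fs ...Filter) *Chain {
	sorted := append([]Filter(nil), fs...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Priority < sorted[j].Priority })
	return &Chain{filters: sorted}
}

// Authenticate returns the first definite answer. A filter that errors is
// skipped (failover); if nobody allows the token, the chain fails closed.
func (c *Chain) Authenticate(token string) bool {
	for _, f := range c.filters {
		d, err := f.Check(token)
		if err != nil {
			c.Errors++
			continue
		}
		switch d {
		case Allow:
			c.Succeeded++
			return true
		case Deny:
			c.Failed++
			return false
		}
	}
	c.Failed++
	return false
}

// staticFilter allows one fixed token and abstains on everything else.
func staticFilter(name string, prio int, good string) Filter {
	return Filter{Name: name, Priority: prio, Check: func(token string) (Decision, error) {
		if token == good {
			return Allow, nil
		}
		return Abstain, nil
	}}
}

// unreachableFilter simulates a remote endpoint that is down.
func unreachableFilter(name string, prio int) Filter {
	return Filter{Name: name, Priority: prio, Check: func(string) (Decision, error) {
		return Abstain, errors.New(name + ": endpoint unreachable")
	}}
}

func main() {
	chain := NewChain(
		staticFilter("file", 20, "local-token"),
		unreachableFilter("oauth2", 10),
	)
	fmt.Println(chain.Authenticate("local-token"), chain.Authenticate("unknown"))
	fmt.Printf("succeeded=%d failed=%d errors=%d\n", chain.Succeeded, chain.Failed, chain.Errors)
}
```

Whether an unreachable high-priority filter should fall through to a lower-priority one, as it does here, or deny outright is exactly the kind of policy question the requirements discussion would need to settle.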

aricart commented 7 years ago

IMHO - the plugin api should be generic and without preconceptions on how it handles any of this. Otherwise it won't work as a general solution. The plugin should decide to authorize/cache/evict. As for configuration really the only one that would be required is the type and likely the path of the DLL to load. Plugin configuration should be using its own file (which may be the only other configuration value).

kozlovic commented 7 years ago

@petemiron One that I would add is that the design should minimize the performance impact for users not using the feature. It is one thing if the feature does not perform as well as one would hope, but it should not have an adverse effect on server performance when the feature is not in use.

@aricart Plugin is not yet in the picture. We would have to wait for post Go 1.8 and even longer if we want to be able to support Go 1.7.

ColinSullivan1 commented 7 years ago

I agree with @aricart , I've had some past experience with authentication/authorization plugins; the simpler the interface the better by far. Moving responsibilities for caching, monitoring, eviction, etc into the "plugin" will keep the NATS server simpler. We can provide extensive examples, even utilities to help with this, but changes to the server should be absolutely minimal, IMO.

mcqueary commented 7 years ago

@petemiron I'd vote for a separate issue to cover the requirement definition, which would allow both conversations to be linked without constraining either one.

qrpike commented 7 years ago

I also agree it should be as simple as possible. Everyone's use case will be different and we can't fit them all in the plugin. Just give the endpoint as much data as possible and let it handle it.

1N50MN14 commented 7 years ago

Agree too, there isn't much to it really. gnatsd has 3 core APIs (connect, publish and subscribe), and each could be (optionally) configurable with a one-line HTTP endpoint in the config file. That's it really; any logic, regex or not, should reside inside the endpoints. It's the user's responsibility, not gnatsd's, to make sure whatever logic sits inside those endpoints is performant.

qrpike commented 7 years ago

I'm not sure about looking up permissions on every single publish. I am publishing 30-40k messages/sec, so a remote call per message isn't realistic when latency is a factor. I'm biased toward my PR, but I prefer an endpoint call on connect which caches the returned permissions in memory. Even 10 thousand+ users would only be tens of MBs of memory, and no permission logic code would need to be changed.
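A minimal Go sketch of the connect-time caching idea above: one remote lookup per connection, then every publish is checked against memory only. `FetchFunc` stands in for the call to the remote endpoint, and all names are hypothetical; this is not the code from #428. The subject matching follows NATS wildcard rules, where `*` matches one token and a trailing `>` matches one or more tokens.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// FetchFunc stands in for the single call to a remote auth endpoint at
// connect time; it returns the subject patterns the token may publish to.
type FetchFunc func(token string) ([]string, error)

// PermCache keeps per-connection permissions in memory.
type PermCache struct {
	mu    sync.RWMutex
	fetch FetchFunc
	allow map[string][]string // connection ID -> allowed subject patterns
}

func NewPermCache(fetch FetchFunc) *PermCache {
	return &PermCache{fetch: fetch, allow: make(map[string][]string)}
}

// Connect performs the one remote lookup and caches the result.
func (c *PermCache) Connect(connID, token string) error {
	subjects, err := c.fetch(token)
	if err != nil {
		return err
	}
	c.mu.Lock()
	c.allow[connID] = subjects
	c.mu.Unlock()
	return nil
}

// Disconnect evicts the entry so memory tracks live connections only.
func (c *PermCache) Disconnect(connID string) {
	c.mu.Lock()
	delete(c.allow, connID)
	c.mu.Unlock()
}

// CanPublish is the hot path: it touches memory only, never the network.
func (c *PermCache) CanPublish(connID, subject string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	for _, pattern := range c.allow[connID] {
		if subjectMatches(pattern, subject) {
			return true
		}
	}
	return false
}

// subjectMatches applies NATS-style wildcards to dot-separated tokens.
func subjectMatches(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			return i == len(p)-1 && len(s) > i
		}
		if i >= len(s) || (tok != "*" && tok != s[i]) {
			return false
		}
	}
	return len(p) == len(s)
}

func main() {
	lookups := 0
	cache := NewPermCache(func(token string) ([]string, error) {
		lookups++ // count how often the "remote endpoint" is hit
		return []string{"sensors.42.>"}, nil
	})
	if err := cache.Connect("conn-1", "some-token"); err != nil {
		panic(err)
	}
	for i := 0; i < 3; i++ {
		fmt.Println(cache.CanPublish("conn-1", "sensors.42.temp"))
	}
	fmt.Println("remote lookups:", lookups)
}
```

The trade-off is staleness: a permission change on the auth service only takes effect when the client reconnects, unless some expiry or eviction signal is added.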

mcqueary commented 7 years ago

See also #428, #434

mcqueary commented 7 years ago

Closing this issue in favor of #434, which will be the master for this and related discussions.