orientechnologies / orientdb

OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries.
https://orientdb.dev
Apache License 2.0

OrientDB authorization model can not be used as application level authorization #2229

Closed: enisher closed this issue 9 years ago

enisher commented 10 years ago

Features such as record-level security encourage users to use OrientDB authorization as application-level authorization; in other words, to use the OUser class as the main class for application users.

However, there are some significant design issues in the binary network protocol.

I suppose users would like to use an approach similar to the following:

ODatabaseDocumentTx is not thread-safe, and in the binary network protocol each client session holds its own instance of ODatabaseDocumentTx.

Thus a session cannot be used by several connections from different sockets, because those connections can be handled by different server threads.

As a result, we cannot reuse a session in different client threads.

This means we have to open a new session for each (user, application thread) pair, which in turn means re-authorizing the user in the database for each thread. And since a single user request can be handled by several application threads, we have to store the username and password at the application layer.
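To make the scaling problem concrete, here is a minimal Java sketch of the bookkeeping an application is pushed into when a session may only be used by the thread that opened it. `UserSession` and `PerThreadSessionRegistry` are hypothetical names, not OrientDB API; the sketch only shows how sessions multiply.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an open, non-thread-safe database session
// (e.g. one ODatabaseDocumentTx authenticated as a single user).
final class UserSession {
    final String user;
    final long ownerThreadId;

    UserSession(String user, long ownerThreadId) {
        this.user = user;
        this.ownerThreadId = ownerThreadId;
    }
}

// Because a session must not cross threads, the application ends up keeping
// one session per (user, thread) pair and authenticating each one separately.
final class PerThreadSessionRegistry {
    private final Map<String, UserSession> sessions = new ConcurrentHashMap<>();

    UserSession sessionFor(String user, long threadId) {
        String key = threadId + "/" + user; // numeric thread id first, so keys are unambiguous
        // In a real client, opening a new session here needs the user's password again,
        // which is why the application has to keep credentials around.
        return sessions.computeIfAbsent(key, k -> new UserSession(user, threadId));
    }

    int openSessions() {
        return sessions.size();
    }
}
```

With 1,000 logged-in users and 8 application threads, a registry like this can grow to 8,000 sessions, each authenticated with the user's password.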

Memory consumption

Having thousands of active users may lead to thousands of open connections in the client connection manager, which could consume a lot of memory.

Possible memory leaks

If a user is not logged out and the socket is reused by another user, the first user's session may remain in the client connection manager forever. It won't be closed automatically as long as the socket is still alive and being used by the application to handle other users' sessions.

Session Id shouldn't be used as auth token

Session IDs are sequential, so it is not really safe to use them as auth tokens.
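A minimal Java sketch of the difference, assuming a plain counter as the model of a sequential session ID: anyone who holds ID n can guess n + 1, whereas 128 bits from `java.security.SecureRandom` cannot feasibly be guessed. The class name is hypothetical.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.concurrent.atomic.AtomicInteger;

final class SessionIds {
    // Model of a sequential session ID: the next value is trivially predictable.
    private static final AtomicInteger COUNTER = new AtomicInteger();
    private static final SecureRandom RANDOM = new SecureRandom();

    static int nextSequentialId() {
        return COUNTER.incrementAndGet();
    }

    // 128 random bits from a CSPRNG, encoded as URL-safe Base64 without padding (22 chars).
    static String nextRandomToken() {
        byte[] bytes = new byte[16];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }
}
```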

No native way to make users rememberable

Since OrientDB supports only user/password authorization, a "Remember me" feature can only be implemented through workarounds.

Automatic shutdown of sessions

The automatic shutdown of sessions associated with broken connections was not designed for sessions that may be used by several connections.

So a session that is closed this way may still be in use by another thread.

phpnode commented 10 years ago

Just to reiterate how severe this is: it makes it impossible to use Orient's permission model efficiently for most websites. It renders OUser, ORole and all of the class- and record-level security features inaccessible.

ruckc commented 10 years ago

How about tackling this from a different angle? I think this is actually two issues: the first is connection pooling and the second is session handling across the pool.

To address connection pooling, the application could have its own account (the typical setup) and be allowed to "switch user" to the actual user's account. This could be implemented as an additional privilege with a role restriction. In this case the application would be responsible for validating credentials initially, and it could be given a ticket for future requests. Tickets could have an infinite lifespan (tied to their session record), a maximum lifespan, or a maximum inactive lifespan. This approach could also be used to allow external authentication (to Active Directory or some other external service), with OrientDB "trusting" the application in its switch-user capacity. PostgreSQL supports a portion of this workflow through SET ROLE.
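The three ticket lifetimes mentioned above (infinite, maximum lifespan, maximum inactive lifespan) could be modelled roughly as follows. This is a hypothetical Java sketch, not an existing OrientDB class; time is passed in explicitly so the rules are easy to test.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical ticket issued once the application has validated a user's credentials.
// Either limit may be null; both null models the "infinite lifespan" option.
final class SwitchUserTicket {
    final String user;
    private final Instant issuedAt;
    private final Duration maxLifespan; // absolute limit, counted from issue time
    private final Duration maxInactive; // idle limit, counted from last use
    private Instant lastUsed;

    SwitchUserTicket(String user, Instant issuedAt, Duration maxLifespan, Duration maxInactive) {
        this.user = user;
        this.issuedAt = issuedAt;
        this.maxLifespan = maxLifespan;
        this.maxInactive = maxInactive;
        this.lastUsed = issuedAt;
    }

    // Returns true (and resets the inactivity timer) if the ticket is still valid at 'now'.
    synchronized boolean use(Instant now) {
        if (maxLifespan != null && now.isAfter(issuedAt.plus(maxLifespan))) return false;
        if (maxInactive != null && now.isAfter(lastUsed.plus(maxInactive))) return false;
        lastUsed = now;
        return true;
    }
}
```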

To handle the session-state issue, OrientDB could simply store session data in a cluster (either memory- or disk-based), with one record per user (and record-level permissions in place). A plugin could be written to purge old sessions.

phpnode commented 10 years ago

@ruckc I agree that this is two issues, but I don't really like your proposed solution: why do we need a special role and "user switching" when global sessions, à la HTTP, could easily solve the problem without granting the client app a greater degree of trust?

I think it's brittle, and a lot of overhead, if the application has to do a SET ROLE = foo before each subsequent command; we're trusting the wrong thing.

I agree we could create a new OSession class as you propose and treat it like any other cluster, except that old records are removed when their TTL expires. Developers could then use OSession to store temporary data just for that particular user.

phpnode commented 10 years ago

Copied and pasted from another issue:

Currently Orient has a 1 socket = 1 thread structure, which means that if clients want concurrency they must implement their own connection pool (and you can't reuse session IDs across multiple sockets because sessions aren't thread-safe). Wouldn't a thread pool help with this? Instead of 1 socket = 1 thread, when a socket receives a request it could take a thread from the thread pool, do its work, then return the thread to the pool.
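A thread pool on its own does not make a session thread-safe, so a sketch of this idea also needs to serialize requests that belong to the same session. The hypothetical dispatcher below runs requests on a fixed `ExecutorService` and holds a per-session `ReentrantLock` while a request executes, so different sessions run in parallel but one session never runs on two threads at once.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

final class PooledRequestDispatcher {
    private final ExecutorService workers;
    // One lock per session ID; entries are never removed in this sketch.
    private final Map<Long, ReentrantLock> sessionLocks = new ConcurrentHashMap<>();

    PooledRequestDispatcher(int threads) {
        this.workers = Executors.newFixedThreadPool(threads);
    }

    // Runs 'work' on any free worker thread while holding the session's lock.
    <T> Future<T> submit(long sessionId, Callable<T> work) {
        ReentrantLock lock = sessionLocks.computeIfAbsent(sessionId, id -> new ReentrantLock());
        return workers.submit(() -> {
            lock.lock();
            try {
                return work.call();
            } finally {
                lock.unlock();
            }
        });
    }

    void shutdown() {
        workers.shutdown();
    }
}
```

One design cost: a request waiting for its session lock occupies a pool thread. A per-session queue would avoid that, at the price of more code.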

ruckc commented 10 years ago

So, here is how other databases/software solve this type of problem:

1) PostgreSQL: use SET ROLE. Since users are roles in PostgreSQL, this could work with the application having its own credentials; it would require validating the user's credentials in the application and trusting the application to SET ROLE appropriately.
2) Unix: use su, which requires the authentication credentials of the user being switched to.
3) SQL Server: its whitepaper on the concepts even assumes that if you are doing RLS you are not using connection pools, due to the requirements driving row-level security.
4) Oracle implements Proxy Authentication, which is similar in concept to PostgreSQL's SET ROLE. It supports connection pooling by associating the proxied account with connection metadata when fetching/creating a new connection.

I think the easiest/lightest method is to use the Unix switch-user (su) model to allow an application to elevate already-opened databases to a user's granted role: essentially, extend OUser into OApplication and OIndividual, each requiring credentials. The client API should also provide a connection pool that automatically exits and persists the user's OSession when a connection returns to the pool; the connection/database should stay open, just with the OApplication as db.getUser(), until the application requests another connection for a user. Additionally, the OApplication would have no actual CRUD rights on the database other than a permission to switch user. This could also be extended with another permission (e.g. connect) so that only OApplications can create connections.
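A rough Java sketch of the pool behaviour described above, with hypothetical classes: a connection stays open, is switched to the end user when borrowed, persists that user's session state and drops back to the application identity when returned. The check that the application account actually holds a switch-user permission is left out for brevity.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical connection that stays open while the identity it acts as changes.
final class SwitchableConnection {
    final String applicationUser;
    private String effectiveUser;

    SwitchableConnection(String applicationUser) {
        this.applicationUser = applicationUser;
        this.effectiveUser = applicationUser;
    }

    String currentUser() { return effectiveUser; } // plays the role of db.getUser()
    void switchTo(String user) { effectiveUser = user; }
    void revert() { effectiveUser = applicationUser; }
}

final class SwitchUserPool {
    private final String applicationUser;
    private final Deque<SwitchableConnection> idle = new ArrayDeque<>();
    private final Map<String, String> savedSessions = new HashMap<>(); // user -> opaque session state

    SwitchUserPool(String applicationUser) {
        this.applicationUser = applicationUser;
    }

    // Reuses an idle connection (or opens one as the application) and switches it to 'user'.
    synchronized SwitchableConnection acquire(String user) {
        SwitchableConnection c = idle.isEmpty() ? new SwitchableConnection(applicationUser) : idle.pop();
        c.switchTo(user);
        return c;
    }

    // Persists the user's session state, drops back to the application identity,
    // and keeps the connection open for the next user.
    synchronized void release(SwitchableConnection c, String sessionState) {
        savedSessions.put(c.currentUser(), sessionState);
        c.revert();
        idle.push(c);
    }

    synchronized String savedSession(String user) { return savedSessions.get(user); }

    synchronized int idleCount() { return idle.size(); }
}
```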

One hindrance to the connection pool above would be scrypt/bcrypt hashing, because of the time needed to validate credentials over and over. This could potentially be mitigated by an in-memory cache of SHA-256 hashes of recently validated passwords.
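The cache idea could look roughly like this in Java. The slow verifier (bcrypt/scrypt in the comment above) is passed in as a hypothetical `BiPredicate`; only the SHA-256 part uses real JDK API (`MessageDigest`). Time-based expiry of "recently validated" entries is omitted, so this sketch only evicts explicitly.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiPredicate;

final class CredentialCache {
    private final Map<String, byte[]> recentlyValidated = new ConcurrentHashMap<>();
    private final BiPredicate<String, String> slowVerifier; // hypothetical bcrypt/scrypt check
    final AtomicInteger slowChecks = new AtomicInteger();    // counts expensive verifications

    CredentialCache(BiPredicate<String, String> slowVerifier) {
        this.slowVerifier = slowVerifier;
    }

    boolean authenticate(String user, String password) {
        byte[] fast = sha256(password);
        byte[] cached = recentlyValidated.get(user);
        if (cached != null && MessageDigest.isEqual(cached, fast)) {
            return true; // cache hit: skip the deliberately slow hash
        }
        slowChecks.incrementAndGet();
        if (!slowVerifier.test(user, password)) {
            return false;
        }
        recentlyValidated.put(user, fast);
        return true;
    }

    // Must be called on password change or lockout so the cache cannot outlive the credential.
    void evict(String user) {
        recentlyValidated.remove(user);
    }

    private static byte[] sha256(String s) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }
}
```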

To keep the model flexible enough, openDatabase(String user, String pass) could be generalized to openDatabase(Credentials creds), which would support user/password authentication as well as PKI or any other two-factor authentication.
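A minimal Java sketch of that generalisation, with all type names hypothetical: the existing user/password signature becomes a thin wrapper around `openDatabase(Credentials)`, and new credential kinds only need a new branch. Passwords are compared in plain text purely to keep the example short.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical credential types; openDatabase(Credentials) accepts any of them.
interface Credentials {}

final class PasswordCredentials implements Credentials {
    final String user;
    final String password;

    PasswordCredentials(String user, String password) {
        this.user = user;
        this.password = password;
    }
}

final class CertificateCredentials implements Credentials {
    // Assumption: subject of a client certificate whose chain was already validated (e.g. by TLS).
    final String subject;

    CertificateCredentials(String subject) {
        this.subject = subject;
    }
}

final class CredentialServer {
    private final Map<String, String> passwords; // plain text only to keep the sketch short
    private final Set<String> trustedSubjects;

    CredentialServer(Map<String, String> passwords, Set<String> trustedSubjects) {
        this.passwords = passwords;
        this.trustedSubjects = trustedSubjects;
    }

    // The existing signature becomes a thin wrapper around the general one.
    String openDatabase(String user, String pass) {
        return openDatabase(new PasswordCredentials(user, pass));
    }

    // Returns the authenticated principal or throws SecurityException.
    String openDatabase(Credentials creds) {
        if (creds instanceof PasswordCredentials) {
            PasswordCredentials p = (PasswordCredentials) creds;
            if (p.password.equals(passwords.get(p.user))) return p.user;
        } else if (creds instanceof CertificateCredentials) {
            String subject = ((CertificateCredentials) creds).subject;
            if (trustedSubjects.contains(subject)) return subject;
        }
        throw new SecurityException("authentication failed");
    }
}
```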

phpnode commented 10 years ago

@ruckc the point of OSession is to avoid the need to re-authenticate again and again: the OSession ID is a long, secret key that is enough by itself to provide authentication, so there is no need to check the password on each request.

The binary protocol already has a thing called a sessionID, required for every request, so there's no need for the application to have its own account and no need to deal with user switching (which implies that the client application can keep state). All the application needs to do is forward the correct sessionID for that user; this is how it already works. The problem is that the existing sessionIDs are sequential (easy to guess), non-global and non-reusable. If we can change that, I think we don't really need to change anything else about the security model.

lvca commented 10 years ago

@phpnode so generating a random long could be good enough for you? What else is missing?

enisher commented 10 years ago

@lvca, @phpnode I suppose concurrent request processing in a single session is also important for such a case.

phpnode commented 10 years ago

@enisher exactly, that's the only remaining issue I think

phpnode commented 10 years ago

@lvca @enisher on second thought, there's a remaining problem: AFAIK sessionIds are not shared between OrientDB servers, which makes working with server clusters awkward. I think this is another reason to go with a new OSession class, because it could leverage Orient's existing replication features and give an obvious method for manually expiring active sessions: just delete the record.

giastfader commented 10 years ago

Hi guys, if I'm still in time, I'd like to contribute to this. As far as the sessionId type is concerned, I think it shouldn't be a long. A UUID fits better than a long as a unique identifier, and can be more difficult for an attacker to guess. But I suppose in this case the impact on the protocol would be significant. I also think there should be a way to automatically invalidate all the session IDs related to a specific user when he/she changes the password. This is a typical scenario when data is accessed from multiple devices.
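A single-node Java sketch of both suggestions, assuming session IDs are version-4 UUIDs from `UUID.randomUUID()` (which the JDK documents as using a cryptographically strong generator) and that the server keeps a reverse index from user to session IDs so that a password change can invalidate every device at once. Class and method names are hypothetical, and concurrent open/invalidate races are not handled.

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

final class UuidSessionStore {
    private final Map<UUID, String> userBySession = new ConcurrentHashMap<>();
    private final Map<String, Set<UUID>> sessionsByUser = new ConcurrentHashMap<>();

    // Version-4 UUID: 122 random bits, hard to guess unlike a sequential counter.
    UUID open(String user) {
        UUID id = UUID.randomUUID();
        userBySession.put(id, user);
        sessionsByUser.computeIfAbsent(user, u -> ConcurrentHashMap.newKeySet()).add(id);
        return id;
    }

    // Returns the owning user, or null if the ID is unknown or was invalidated.
    String userFor(UUID sessionId) {
        return userBySession.get(sessionId);
    }

    // E.g. on password change: logs the user out on every device. Returns how many sessions ended.
    int invalidateAll(String user) {
        Set<UUID> ids = sessionsByUser.remove(user);
        if (ids == null) {
            return 0;
        }
        ids.forEach(userBySession::remove);
        return ids.size();
    }
}
```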

phpnode commented 10 years ago

@giastfader totally agree; in fact I meant that the session key itself should be long (i.e. unguessable), not that it should be a Long. Sorry for the confusion!

giastfader commented 10 years ago

@phpnode :) Maybe @lvca's reply led me astray.

lvca commented 10 years ago

@phpnode and @giastfader Thanks for your suggestions. I'm not sure writing the record to the database is a good idea, because we have Hazelcast under the hood and creating a clustered Map of sessions is straightforward. The question is: how many sessions could we have? Thousands or millions?

phpnode commented 10 years ago

@lvca potentially millions for the largest of sites. Without the record approach, how would we expire an existing session? How would we find the active sessions for a given user, etc.? Presumably via new SQL commands?

giastfader commented 10 years ago

@lvca however, I think you should be careful about trying to solve a problem that maybe, and I mean maybe, should not be solved by the database engine. Trying to solve it could lead you into scalability issues, and sessions (and stateful machines in general) IMHO cause the biggest concerns in scalable systems. What do you think about encrypted session tokens? Let me explain:

PRO: the db engine does not need to store anything; clients don't need to store username/password, just the tokens.
CONS: decrypting the tokens could be CPU-intensive.

WDYT?
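To make the encrypted-token idea concrete, here is a minimal Java sketch using AES-GCM from the JDK (`javax.crypto`). The cipher choice and the token layout (IV, then the encrypted expiry and user name) are assumptions, not something specified in this thread. The server keeps only the key: a token that decrypts and authenticates is trusted until its expiry.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class EncryptedTokens {
    private static final int IV_LEN = 12; // 96-bit nonce, the usual size for GCM
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();
    private final SecretKey key; // the only thing the server has to keep

    EncryptedTokens(SecretKey key) {
        this.key = key;
    }

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    // token = base64url( iv || AES-GCM( expiry (8 bytes) || user name (UTF-8) ) )
    String issue(String user, long expiresAtEpochSec) throws GeneralSecurityException {
        byte[] name = user.getBytes(StandardCharsets.UTF_8);
        byte[] plain = ByteBuffer.allocate(8 + name.length).putLong(expiresAtEpochSec).put(name).array();
        byte[] iv = new byte[IV_LEN];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] sealed = cipher.doFinal(plain);
        byte[] out = ByteBuffer.allocate(IV_LEN + sealed.length).put(iv).put(sealed).array();
        return Base64.getUrlEncoder().withoutPadding().encodeToString(out);
    }

    // Returns the user name, or null if the token is malformed, forged, tampered with or expired.
    String verify(String token, long nowEpochSec) {
        try {
            byte[] in = Base64.getUrlDecoder().decode(token);
            if (in.length <= IV_LEN) {
                return null;
            }
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_LEN));
            ByteBuffer plain = ByteBuffer.wrap(cipher.doFinal(in, IV_LEN, in.length - IV_LEN));
            if (nowEpochSec >= plain.getLong()) {
                return null;
            }
            byte[] name = new byte[plain.remaining()];
            plain.get(name);
            return new String(name, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException | IllegalArgumentException e) {
            return null; // bad Base64, wrong key, or authentication tag mismatch
        }
    }
}
```

The CPU cost mentioned in the CONS is one AES-GCM decryption per request. The trade-off is that an individual token cannot be revoked before it expires without reintroducing server-side state.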

lvca commented 10 years ago

With Hazelcast it's very easy: we have one such Map to manage and all the nodes see the same map, so all lookups work in a distributed fashion. The current background task that purges expired sessions would work the same way. WDYT?
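A local Java sketch of the map plus background purge described above. A `ConcurrentHashMap` stands in for the cluster-wide map (with Hazelcast, the map would be shared by all nodes instead); the session-ID-to-deadline layout and the injectable clock are assumptions for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Local stand-in for a cluster-wide session map: sessionId -> expiry deadline (millis).
final class ExpiringSessionMap {
    private final Map<String, Long> deadlines = new ConcurrentHashMap<>();
    private final LongSupplier clockMillis; // injectable so tests can control time

    ExpiringSessionMap(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    // Creates or extends a session.
    void touch(String sessionId, long ttlMillis) {
        deadlines.put(sessionId, clockMillis.getAsLong() + ttlMillis);
    }

    boolean isActive(String sessionId) {
        Long deadline = deadlines.get(sessionId);
        return deadline != null && deadline > clockMillis.getAsLong();
    }

    // One purge pass. remove(key, value) only deletes the entry if nobody
    // extended the session in the meantime.
    int purgeExpired() {
        long now = clockMillis.getAsLong();
        int removed = 0;
        for (Map.Entry<String, Long> e : deadlines.entrySet()) {
            if (e.getValue() <= now && deadlines.remove(e.getKey(), e.getValue())) {
                removed++;
            }
        }
        return removed;
    }

    // Runs purgeExpired() periodically on a background thread.
    ScheduledExecutorService startPurger(long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::purgeExpired, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```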

phpnode commented 10 years ago

@lvca sounds reasonable, I think, but I imagine it won't be possible to query it; e.g. there'll be no way to do the equivalent of this:

SELECT COUNT(DISTINCT(user.name)) FROM OSession;

to retrieve the number of currently online users.

lvca commented 10 years ago

@phpnode No, but we could create a "memory" cluster (non-persistent) and, in a distributed architecture, the cluster would be replicated among servers.

phpnode commented 10 years ago

@lvca this would be ideal i think!

phpnode commented 10 years ago

@lvca @enisher any updates on this? It's really blocking us :frowning:

tglman commented 10 years ago

Hi, after speaking with @phpnode: the blocking issue is just the "migration" of the session across connections/servers, so we thought of two possible easy solutions:

The first one is a randomly generated UUID, used as the key of a distributed map whose value holds the details of the session. This UUID would replace the current sessionId, but could also be used by new connections. Advantages:

Disadvantages:

The second solution is to generate a token as described by @giastfader. Advantages:

Disadvantages:

All the other features, like the ability to query active sessions, are more of a nice-to-have and not blocking right now ;)

Shall we choose one of the two?!
I actually prefer slow and scalable (token)

enisher commented 10 years ago

OK, so we don't need concurrent request processing in the same session for now; that makes it much easier.

The second proposed solution seems more sophisticated to me.

phpnode commented 10 years ago

I definitely think the first option is much easier and faster. All we'd need is to store a map of UUIDs to user IDs. Since we can reliably assume that user IDs fit in integers, this can be as little as 20 bytes per user (plus a little more if we want TTLs).
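The 20-byte figure is 16 bytes for a 128-bit UUID plus 4 bytes for a 32-bit user ID. A small Java sketch packing one entry with `java.nio.ByteBuffer` shows the layout; the class is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

final class CompactSessionEntry {
    static final int BYTES = 16 + 4; // 128-bit UUID + 32-bit user ID

    static byte[] encode(UUID session, int userId) {
        return ByteBuffer.allocate(BYTES)
                .putLong(session.getMostSignificantBits())
                .putLong(session.getLeastSignificantBits())
                .putInt(userId)
                .array();
    }

    static UUID sessionOf(byte[] entry) {
        ByteBuffer b = ByteBuffer.wrap(entry);
        return new UUID(b.getLong(), b.getLong()); // arguments are evaluated left to right
    }

    static int userIdOf(byte[] entry) {
        return ByteBuffer.wrap(entry).getInt(16);
    }
}
```

At a million sessions that is 1,000,000 × 20 B = 20 MB of raw payload, before any per-entry overhead of the map that holds it.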

Doing this doesn't really make the server any more stateful than it is at the moment: the server already keeps a map of sessionIds to userIds; we'd just be making that map shared rather than per connection.

If we go with the second option, I don't think it's possible for the server to expire a key, so we'd have no way of revoking session IDs for users who have been deactivated, etc.

mattaylor commented 10 years ago

Is this a 2.0 thing now?

young-druid commented 9 years ago

If the decision is made to use OUser as an authorization entity, it is important to add more security to its model. I suggest not just hashing the password with SHA-256 but also adding a 'salt' field to the OUser class and using it during hashing. What do you think?
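A Java sketch of salted password hashing using only the JDK. It swaps the single salted SHA-256 round suggested above for `PBKDF2WithHmacSHA256`, an iterated, salted SHA-256-based construction that is deliberately slow; the iteration count and the idea of storing the salt next to the hash in OUser are assumptions.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

final class SaltedPasswords {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int ITERATIONS = 100_000; // assumption: tune for your hardware
    private static final int KEY_BITS = 256;

    // A fresh random salt per user, stored next to the hash (e.g. in a 'salt' field).
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        RANDOM.nextBytes(salt);
        return salt;
    }

    static byte[] hash(char[] password, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        } finally {
            spec.clearPassword(); // wipe the spec's internal copy of the password
        }
    }

    // Constant-time comparison so timing does not leak how many bytes matched.
    static boolean verify(char[] password, byte[] salt, byte[] expectedHash) throws GeneralSecurityException {
        return MessageDigest.isEqual(hash(password, salt), expectedHash);
    }
}
```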

phpnode commented 9 years ago

@young-druid there's another issue for that - #2242

young-druid commented 9 years ago

Oh, right. Thanks for that. I've posted my comment there, with a link about hashing in Java.

phpnode commented 9 years ago

@lvca can this be part of 2.0? IMHO it's really essential and blocking one of Orient's most useful features.

lvca commented 9 years ago

So we'd need the following things to share client connections between threads:

To improve security we should:

To share the connections with all the server nodes, we should:

emrul commented 9 years ago

I've been looking at this particular problem and how to tackle it within OrientDB for a while. I'll contribute my ideas here.

Conceptually, I prefer the stateless 'token' approach described by @giastfader, and I think @tglman preferred it too. It means session state will never limit the number of concurrent users the database can serve.

With respect to how the token is constructed, I think the team should look at JSON Web Token (JWT): the IETF draft is here: http://tools.ietf.org/html/draft-ietf-oauth-json-web-token and it is explained nicely here: http://jwt.io. A JWT is basically a signed JSON object that is Base64url-encoded. The token does not need to be distributed via Hazelcast.
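Since a JWT is just `base64url(header).base64url(payload).base64url(signature)`, an HS256 token can be produced and checked with the JDK alone. The sketch below is a minimal illustration under that assumption (class name hypothetical, payload passed in as an already-serialized JSON string); a real implementation would also validate claims such as `exp`, `aud` and `iss`.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class Hs256Jwt {
    private static final Base64.Encoder B64URL = Base64.getUrlEncoder().withoutPadding();
    private static final String HEADER = "{\"alg\":\"HS256\",\"typ\":\"JWT\"}";

    // Produces a compact JWS: base64url(header) + "." + base64url(payload) + "." + base64url(HMAC)
    static String sign(String payloadJson, byte[] secret) throws GeneralSecurityException {
        String signingInput = B64URL.encodeToString(HEADER.getBytes(StandardCharsets.UTF_8))
                + "." + B64URL.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        return signingInput + "." + B64URL.encodeToString(hmac(signingInput, secret));
    }

    // Verifies the signature only. The server always uses HMAC-SHA256 here rather than
    // trusting the token's own "alg" header, which avoids algorithm-substitution tricks.
    static boolean verify(String token, byte[] secret) throws GeneralSecurityException {
        String[] parts = token.split("\\.", -1);
        if (parts.length != 3) return false;
        byte[] actual;
        try {
            actual = Base64.getUrlDecoder().decode(parts[2]);
        } catch (IllegalArgumentException e) {
            return false; // signature segment is not valid base64url
        }
        return MessageDigest.isEqual(hmac(parts[0] + "." + parts[1], secret), actual);
    }

    private static byte[] hmac(String data, byte[] secret) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        return mac.doFinal(data.getBytes(StandardCharsets.US_ASCII));
    }
}
```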

The user can request a token for a database using username/password authentication. The token issued by OrientDB can include additional properties to make subsequent requests performant (e.g. the database name, user roles, or anything else the Orient team requires). OrientDB can also maintain a map of random signing keys in a distributed Hazelcast map; this is optional but can improve security.

If using the OrientDB REST API, the user can pass the token as an OAuth2 Bearer token (using HTTP Basic authentication with the password value set to 'Bearer: ' + token). OrientDB only needs to validate the signature.

If using the OrientDB Java client API, then I think the DbPool map can be scoped only to the server address (currently the pool map key consists of URL [incl. dbName] + username + password). The JWT should be included in each request. I suggest this based on experience in a large, stateless, multi-tenant clustered web application environment: if you have a 'database per tenant' architecture, the current connection pool is not very useful. I'm assuming that each TCP connection can be multiplexed to serve queries for multiple databases without a negative impact on the OrientDB server.

In theory this requires minimal code changes and provides a lot of scalability. In addition, JWT/Bearer tokens are already understood by many developers familiar with RESTful services, and JWTs are fairly simple to debug too.

I welcome any feedback on this idea.

lvca commented 9 years ago

@emrul Thanks for this contribution. I found this class https://bitbucket.org/lluisfaja/javajwt/src/ba8ac3c312a6c6b5d69d6436cd21f88cd0a126aa/JavaJWT/src/com/unblau/javajwt/JWT.java?at=master from which we could derive our implementation, but the algorithm is super easy anyway.

emrul commented 9 years ago

@lvca yep, that looks like it will do the job. There are a few others too: https://github.com/auth0/java-jwt (from the maintainer of jwt.io) and http://connect2id.com/products/nimbus-jose-jwt (supports more features and Apache 2.0 licensed).

As you say, the algorithm is indeed super easy and that is why I like it :)

rajohn96 commented 9 years ago

OK, so I like the idea of the token being generated at initial connect; but how does a given token get INVALIDATED, say at a TTL, when a user's password is changed or, most importantly, when an administrative action (such as account lockout) is taken? How do those tokens "in the wild" get addressed? A sort of CRL, in PKI speak, is needed, is it not? I don't see how this is addressed in the IETF document, but probably because I'm dense :-)
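One way to answer the "CRL" question, sketched in Java under the assumption that every token carries a unique `jti` claim (as the sample token later in this thread does): keep a denylist of revoked IDs, and drop each entry once the token's own `exp` has passed, since expiry rejects it from then on. The class is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal CRL-style denylist: revoked token IDs ("jti") are kept only until the
// token would have expired anyway, so the list is bounded by the token lifetime.
final class TokenDenylist {
    private final Map<String, Long> revokedUntil = new ConcurrentHashMap<>();

    // tokenExpEpochSec is the token's own "exp" claim.
    void revoke(String jti, long tokenExpEpochSec) {
        revokedUntil.put(jti, tokenExpEpochSec);
    }

    // Checked on every request, after signature and expiry validation.
    boolean isRevoked(String jti) {
        return revokedUntil.containsKey(jti);
    }

    // Entries for tokens past their "exp" can be dropped: expiry already rejects them.
    void prune(long nowEpochSec) {
        revokedUntil.values().removeIf(exp -> exp <= nowEpochSec);
    }

    int size() {
        return revokedUntil.size();
    }
}
```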

rajohn96 commented 9 years ago

I also think that, as a complement to this, a means to manage user account information externally (such as LDAP) is in order, as there are likely many places (other than within the DB) where this information is critical. Is there already an issue tracking the concept of an "enterprise user" (Oracle's term for this same thing)?

emrul commented 9 years ago

@rajohn96 There are a number of approaches to address this (none are explicitly spelled out in the IETF document, because JWT is a data format and not a protocol). The most obvious way is to enforce the expiry ('exp' claim) of a token. CRLs might be overkill (even most enterprise SSO solutions state that tokens are valid until they expire and there's no mechanism to revoke them, so best practice is to set a reasonable expiry period). If you need to force a mass revocation of all tokens, this too is easy: change the secret used to sign tokens and all tokens instantly become invalid.
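Both mechanisms from this comment can be sketched in a few lines of Java. For brevity the "token" here is reduced to an expiry timestamp plus its HMAC-SHA256, rather than a full JWT; the 30-second clock-skew leeway and all names are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class ExpiringTokenChecker {
    private static final long LEEWAY_SEC = 30; // assumption: tolerated clock skew between nodes
    private static final SecureRandom RANDOM = new SecureRandom();
    private volatile byte[] secret = newSecret();

    // Simplified "token": the signature over the expiry; the expiry travels alongside it.
    byte[] sign(long expEpochSec) throws GeneralSecurityException {
        return hmac(Long.toString(expEpochSec), secret);
    }

    boolean accept(long expEpochSec, byte[] signature, long nowEpochSec) throws GeneralSecurityException {
        if (nowEpochSec > expEpochSec + LEEWAY_SEC) {
            return false; // enforce the 'exp' claim
        }
        return MessageDigest.isEqual(hmac(Long.toString(expEpochSec), secret), signature);
    }

    // Mass revocation: tokens signed with the previous secret no longer verify.
    void rotateSecret() {
        secret = newSecret();
    }

    private static byte[] newSecret() {
        byte[] s = new byte[32];
        RANDOM.nextBytes(s);
        return s;
    }

    private static byte[] hmac(String data, byte[] key) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(data.getBytes(StandardCharsets.US_ASCII));
    }
}
```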

With respect to external authentication, I am very glad you raised this. If OrientDB consumes JWTs and treats them as OAuth2 bearer tokens, that opens the door to having those tokens generated via external authentication mechanisms (e.g. an authentication server hooked up to LDAP that generates a JWT on successful authentication). I'm working on a project where this approach is being taken (using CloudFoundry's OAuth2 server: http://github.com/cloudfoundry/uaa and incorporating SAML as described here: http://www.codeproject.com/Articles/598581/How-to-integrate-Spring-oAuth-with-Spring-SAML).

So essentially, if user account information needed to be externalised then this approach would support that.
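The external-issuer setup works naturally with an asymmetric algorithm: the authentication server signs with its private key and the database only needs the public key, so it can verify tokens but never mint them. The Java sketch below uses RS256 (`SHA256withRSA` in the JDK); the class names are hypothetical, and in practice the issuer would be a separate service such as the UAA server mentioned above.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

// Stand-in for an external authentication server (e.g. one backed by LDAP).
final class ExternalIssuer {
    private static final Base64.Encoder B64URL = Base64.getUrlEncoder().withoutPadding();
    private final KeyPair keys;

    ExternalIssuer() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        keys = generator.generateKeyPair();
    }

    // Only this is shared with the database.
    PublicKey publicKey() {
        return keys.getPublic();
    }

    // Issues a compact JWS using RS256 (RSASSA-PKCS1-v1_5 with SHA-256).
    String issue(String payloadJson) throws GeneralSecurityException {
        String input = B64URL.encodeToString("{\"alg\":\"RS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8))
                + "." + B64URL.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(keys.getPrivate());
        signer.update(input.getBytes(StandardCharsets.US_ASCII));
        return input + "." + B64URL.encodeToString(signer.sign());
    }
}

final class DatabaseSideVerifier {
    // Verifies the signature with the issuer's public key; claims still need checking afterwards.
    static boolean verify(String token, PublicKey issuerKey) {
        int dot = token.lastIndexOf('.');
        if (dot < 0) return false;
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(issuerKey);
            verifier.update(token.substring(0, dot).getBytes(StandardCharsets.US_ASCII));
            return verifier.verify(Base64.getUrlDecoder().decode(token.substring(dot + 1)));
        } catch (GeneralSecurityException | IllegalArgumentException e) {
            return false;
        }
    }
}
```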

rajohn96 commented 9 years ago

Yes, but the key (pun intended!) with an externally generated/managed token is for Orient to acknowledge it and behave appropriately. I know that goes without saying, but ...

Yes, there are a number of ways discrete tokens could be managed; given that the T in JWT is Token, it does seem to be within the realm of that specification, but that's just me. Yes, a sledgehammer approach would be to change the signing secret, but I have in mind something with a lot less collateral damage. I don't think expiry is it, but maybe a versioning element within the token is: something that could indicate the staleness of the token?

Here's some good insight into how Oracle does the same thing; yes, I am a former Oracle employee :-)

http://docs.oracle.com/cd/B19306_01/network.102/b14266/apdvcntx.htm

emrul commented 9 years ago

I'm a former Oracle worker too, but I certainly would not recommend we follow them down this path ;) Can you quantify what is wrong with using an expiry?

Along the lines of what you're suggesting, one approach might be to store the Orient user record version in the token. The user record has to be loaded when connecting to the database, so at that point there could be a check that the user version and the token match. However, I think this approach may have certain cons that would make it difficult in a web-app-centric world.
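A Java sketch of the record-version check, with hypothetical classes and signature verification left out: any change to the user (password reset, lockout, ...) bumps the version, so every token carrying an older version is rejected the next time the record is loaded. It also shows the con hinted at above: an unrelated edit to the user record logs the user out everywhere.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical user record whose version changes on every modification.
final class UserRecord {
    final String rid;
    private final AtomicInteger version = new AtomicInteger();

    UserRecord(String rid) {
        this.rid = rid;
    }

    int version() {
        return version.get();
    }

    // Password change, account lockout, role change, ...: any edit bumps the version.
    void markChanged() {
        version.incrementAndGet();
    }
}

final class VersionedTokenCheck {
    private final Map<String, UserRecord> usersByRid = new ConcurrentHashMap<>();

    void register(UserRecord user) {
        usersByRid.put(user.rid, user);
    }

    // 'tokenRid' and 'tokenVersion' would come from a token whose signature was already verified.
    boolean accept(String tokenRid, int tokenVersion) {
        UserRecord user = usersByRid.get(tokenRid); // loaded anyway when the database is opened
        return user != null && user.version() == tokenVersion;
    }
}
```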

rajohn96 commented 9 years ago

I don't know specifically what you are concerned about WRT the Oracle model of identity management and label security, including external user management that multiple sources of information can draw on (vs. replicating it inside each). I have no issue with using an expiry; it's just that the design has to make discrete, time-independent invalidation possible (such as that driven by an administrative action like account lockout), rather than (IMO) relying on the natural expiry that is part of the spec (which, as an optional element, is also problematic).

emrul commented 9 years ago

I have nothing against Oracle identity management; I meant that I don't think building an identity management stack into a DBMS is the right approach. I agree that supporting external sources of information is beneficial, and that's what I hope can be achieved.

I understand your point about being time-independent, and I think incorporating the user's record version accommodates this. The spec makes expiry an optional claim, but if it is present it must be checked. I would encourage any token authentication implementation to include and check expiry.

rajohn96 commented 9 years ago

Agreed; so there are two issues here, one being the use of token-based authentication and the other being support for externally managed accounts.

emrul commented 9 years ago

@lvca I've put together a working implementation of token-based authentication in OrientDB. It isn't perfect and requires some work to ensure proper security. However, I wanted to share it with you so you could, if you wanted, test whether this type of mechanism is appropriate for OrientDB.

All my changes are in my local repo: https://github.com/emrul/orientdb/commit/c15c979331767e897ecc489d3c390a61766c7b51

Obtaining a token

There is a new endpoint, /token/, that returns an authentication token. In the example below there is a database named 'TestTokenAuth' with user 'emrul@emrul.com' and password 'password'. The request parameters conform to the OAuth2 specification:

curl --data "grant_type=password&username=emrul@emrul.com&password=password" http://localhost:2480/token/TestTokenAuth

This returns a JSON structure including the access_token and expires_in information. This is consistent with the OAuth2 specification but not complete:

{"@type":"d","@version":0,"access_token":"eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyUmlkIjoiIzU6MyIsImV4cCI6MTQxMDgxOTE5NiwiZGF0YWJhc2VOYW1lIjoiVGVzdFRva2VuQXV0aCIsInN1YiI6ImVtcnVsQGVtcnVsLmNvbSIsImF1ZCI6Ik9yaWVudERiIiwiaXNzIjoiT3JpZW50RGIiLCJqdGkiOiI4N2YxZjIzNy1jMjhmLTRjMjctOWM0Yy05MDQ1MjgyMjgwYTEiLCJpYXQiOjE0MTA4MTkxODZ9.GD2r5Hf1hXSE_0R4BOMjfeZ8y_kBS2ysZvngAPTjjN8","expires_in":10000}

You can copy and paste the access token into jwt.io; its decoded payload looks like this:

{
  "userRid": "#5:3",
  "exp": 1410819196,
  "databaseName": "TestTokenAuth",
  "sub": "emrul@emrul.com",
  "aud": "OrientDb",
  "iss": "OrientDb",
  "jti": "87f1f237-c28f-4c27-9c4c-9045282280a1",
  "iat": 1410819186
}

The access token isn't as I would prefer for a full implementation but contains the minimum needed to demonstrate the functionality.
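What jwt.io displays can be reproduced with a few lines of Java: split the token on '.' and base64url-decode the middle segment. Note that this only decodes the payload; it does not verify the signature. The sample token is the `access_token` from the response above; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class JwtPayloadPeek {
    // access_token from the /token/TestTokenAuth response above.
    static final String SAMPLE_TOKEN = "eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyUmlkIjoiIzU6MyIsImV4cCI6MTQxMDgxOTE5NiwiZGF0YWJhc2VOYW1lIjoiVGVzdFRva2VuQXV0aCIsInN1YiI6ImVtcnVsQGVtcnVsLmNvbSIsImF1ZCI6Ik9yaWVudERiIiwiaXNzIjoiT3JpZW50RGIiLCJqdGkiOiI4N2YxZjIzNy1jMjhmLTRjMjctOWM0Yy05MDQ1MjgyMjgwYTEiLCJpYXQiOjE0MTA4MTkxODZ9.GD2r5Hf1hXSE_0R4BOMjfeZ8y_kBS2ysZvngAPTjjN8";

    // Decodes segment 0 (header) or 1 (payload) of a compact JWT. No signature check.
    static String decodeSegment(String token, int index) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected header.payload.signature");
        }
        return new String(Base64.getUrlDecoder().decode(parts[index]), StandardCharsets.UTF_8);
    }
}
```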

Using a token for authentication

You pass the token as a bearer token in the Authorization header. For example:

curl --header "Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyUmlkIjoiIzU6MyIsImV4cCI6MTQxMDgxNzUxOCwiZGF0YWJhc2VOYW1lIjoiVGVzdFRva2VuQXV0aCIsInN1YiI6ImVtcnVsQGVtcnVsLmNvbSIsImF1ZCI6Ik9yaWVudERiIiwiaXNzIjoiT3JpZW50RGIiLCJqdGkiOiI2YWIzOTJlNy04ZWEzLTRmZmItYjhmMS1lZTlmMTVmYjY5ZDkiLCJpYXQiOjE0MTA4MTc1MDh9.LAT7MHlhihMkKwC_9dM1r3HFAfkj4BPngXzIEVxTlUg" "http://localhost:2480/query/TestTokenAuth/sql/select%20*%20from%20AClass"
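The same request can be built with the JDK's `java.net.http` client (Java 11+). This is a hedged sketch: host, port and database are taken from the curl example, the class name is hypothetical, and the multi-argument `java.net.URI` constructor percent-encodes the spaces in the SQL path segment.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class BearerQuery {
    // Equivalent of the curl call above: GET /query/<db>/sql/<command> with a Bearer token.
    static HttpRequest build(String token, String database, String sql) throws URISyntaxException {
        // This URI constructor quotes illegal characters, so "select * from AClass"
        // becomes "select%20*%20from%20AClass" in the path.
        URI uri = new URI("http", null, "localhost", 2480, "/query/" + database + "/sql/" + sql, null, null);
        return HttpRequest.newBuilder(uri)
                .header("Authorization", "Bearer " + token)
                .GET()
                .build();
    }

    // Sends the request and returns the response body (requires a running server).
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```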

Current implementation notes

Summary

Any thoughts are welcome.

lvca commented 9 years ago

@emrul You're awesome! I like this approach and maybe we could release it for 2.0! Tomorrow (if there are no urgent issues) we'll release 2.0M1, but the final 2.0 will be released on Sept 29th. I know you've already done a lot, but WDYT about continuing this implementation with the missing things you described, like:

phpnode commented 9 years ago

This looks very cool, but how will it work with the binary protocol (which currently uses integer sessionIds), and can it work over multiple sockets?

emrul commented 9 years ago

@lvca I can spare some time before the 29th but I don't think I will get it done in time. I will give it a shot though.

@phpnode Messing around with the binary protocol is beyond my capability, but my idea is that the token should be returned as part of an open() call on the client (or, if the client is trusted, the token can be generated there if it knows the signing key). The token should be included in every request thereafter, and the server should verify it on every request. Then you would only open socket connections to the OrientDB host, and each connection could service requests for different databases and users because the token would identify the database name and user name (so yes to the question you asked). The token would not need to be Base64-encoded over the binary protocol, but I don't know how such a change can be made in the binary protocol; it is probably something @lvca's team would need to do.

phpnode commented 9 years ago

@emrul sending it alongside every command seems possible, but with quite some overhead: more CPU, more bandwidth, etc. Sending just the embedded UUID would be nicest, but I guess that defeats the purpose of the token.

emrul commented 9 years ago

@phpnode yes, the downside of JWT is network overhead. CPU overhead isn't that bad: HMAC-SHA256 signature verification is quite fast on modern servers and is entirely computational (i.e. the cheapest resource in the datacenter). I think for the binary protocol all we need to send is {database name, user rid, expiry time, hash}, which shouldn't produce much network overhead. In any case, the advantage is that you gain a lot more horizontal scalability, because across a cluster of servers there will be no session state to manage and TCP connections can be securely multiplexed. Overall, I think this should be a great net benefit.
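To get a feel for the size difference, here is a hypothetical compact binary token carrying exactly the fields listed above: database name, user RID (assumed here to be a short cluster ID plus a long cluster position, as in #5:3), expiry, and an HMAC-SHA256 over those bytes. The layout is an assumption, not part of the OrientDB binary protocol.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical layout: [db name: writeUTF][cluster id: short][cluster position: long]
//                      [expiry, epoch seconds: long][HMAC-SHA256 of the preceding bytes: 32 bytes]
final class BinaryToken {
    private static final int MAC_LEN = 32;

    static byte[] encode(String db, short clusterId, long clusterPos, long expEpochSec, byte[] secret)
            throws IOException, GeneralSecurityException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(db);
        out.writeShort(clusterId);
        out.writeLong(clusterPos);
        out.writeLong(expEpochSec);
        out.flush();
        out.write(hmac(bytes.toByteArray(), secret)); // sign everything written so far
        out.flush();
        return bytes.toByteArray();
    }

    // True only if the MAC matches and the expiry has not been reached.
    static boolean verify(byte[] token, byte[] secret, long nowEpochSec)
            throws IOException, GeneralSecurityException {
        if (token.length <= MAC_LEN) return false;
        byte[] body = Arrays.copyOfRange(token, 0, token.length - MAC_LEN);
        byte[] mac = Arrays.copyOfRange(token, token.length - MAC_LEN, token.length);
        if (!MessageDigest.isEqual(hmac(body, secret), mac)) return false;
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(body));
        in.readUTF();   // database name
        in.readShort(); // cluster id
        in.readLong();  // cluster position
        return nowEpochSec < in.readLong();
    }

    private static byte[] hmac(byte[] data, byte[] secret) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        return mac.doFinal(data);
    }
}
```

For "TestTokenAuth" and #5:3 this comes to 65 bytes (15 + 2 + 8 + 8 for the fields, 32 for the MAC), compared with over 300 characters for the JWT shown earlier in this thread.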

phpnode commented 9 years ago

@emrul absolutely! This is a great solution; I just want to make sure the binary protocol is taken into consideration, because it will really benefit from this change. It's useful in the REST API too, of course, but that already supports sessions to a certain extent; the binary protocol really needs it.

emrul commented 9 years ago

Understood. For my part, I'll try to make the token format agnostic so it can support a slimmer binary format as well as JWT. I think that will give us the best hope of getting the binary driver enhanced.

rajohn96 commented 9 years ago

emrul, outstanding; great turnaround from concept to code! WRT the binary protocol: since it is not stateless (right?), why does the token need to be exchanged repeatedly if the connection is not shared? Of course there is authentication/authorization when the connection is established, but beyond the session ID, what more is needed after that?