pyeventsourcing / eventsourcing

A library for event sourcing in Python.
https://eventsourcing.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Question: How to best integrate with token based authentication for Postgres #273

Closed dom111 closed 1 week ago

dom111 commented 1 month ago

Hi there,

Firstly, thanks for the hard work of the team on this well-documented and richly-featured package!

We're currently utilising pyeventsourcing in a project and are investigating how to best implement token-based passwords for the Postgres persistence module.

Current options we've considered are:

I'd be happy to contribute any fixes you'd find useful back to the project, so any guidance on alternatives, or preferences among the above, would be great.

Thanks!

Dom

johnbywater commented 1 month ago

Hi @dom111!

Thanks for asking about this, and for your kind remarks about the library!

It's true that the library currently only supports one kind of authentication. But it would be good to support other kinds.

I don't fully understand how token based authentication works with Postgres. So I don't really know what to suggest as a solution. Is there some documentation about this online?

If you would like to have a quick meeting to go over what you are looking to achieve, that would be fun.

All good wishes,

John

johnbywater commented 1 month ago

PS I found this (but it doesn't mention "token-based"): https://www.postgresql.org/docs/current/client-authentication.html

I also found this (which does mention "token-based" but it's about pgadmin): https://www.postgresql.org/about/news/pgadmin-4-v615-released-2531/

krixon commented 1 month ago

Hi @johnbywater, I'm working with @dom111 on a project using pyeventsourcing. I second his comments about the library; it's been great to work with!

Our use case is to authenticate using AWS IAM. This essentially just involves using a short-lived token as the database password, meaning connections which take place after the token has expired will fail to authenticate.

What we ideally need is a mechanism to provide the password dynamically on each connection.

We have some options for working around this such as extending the PostgresConnectionPool, PostgresDatastore, Factory etc, or creating custom Recorder classes which proactively check and refresh the token before any operation and update the password on the pool instance. None of it feels especially neat though.

However, if, for example, we could register a function with the connection pool that provides credentials on each connection, or specify the ConnectionPool class via the Factory in the same way you can with the aggregate recorder etc., that would feel a bit cleaner.

So if there's a better way to do this that we've missed, it would be great to hear it! But if not, and you're open to a change in the library, we're happy to put together a PR, though we're keen to get some guidance on the best approach.

Thanks for your help!

dom111 commented 1 month ago

Hi John, thanks for the prompt reply!

Apologies for the confusion around the token auth. Our specific use case is IAM auth with AWS, which provides an expiring token for given credentials that is passed in as the password (https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.IAMDBAuth.Connecting.html). The main concern we had was around using the pool and having a new connection be attempted with a cached password (as the password value from env is stored on class instantiation).

I feel like some sort of password provider mechanism that could be passed in might work, but I don't know if that's in keeping with the design goals of the lib. A solution where the Postgres* classes can be dynamically set to alternative implementations might work well (a pattern used elsewhere, but presumably not for the Postgres module because it's provided as a complete solution as-is).

I'm currently looking at a few solutions as mentioned above, as well as a quick and dirty approach of overwriting the _create_connection method to set self.password to a token before calling super()._create_connection() which appears to be working although is brittle.
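For illustration, the override pattern described above can be sketched with a stand-in pool class (not the real eventsourcing PostgresConnectionPool, whose internals may differ); fetch_iam_token is a hypothetical placeholder for the AWS SDK call:

```python
def fetch_iam_token() -> str:
    # Stand-in for the real AWS call, e.g. rds_client.generate_db_auth_token(...)
    return "token-from-aws"


class BasePool:
    """Simplified stand-in for a connection pool that caches the password."""

    def __init__(self, password: str) -> None:
        self.password = password  # cached at instantiation

    def _create_connection(self) -> dict:
        # In a real pool this would open a Postgres connection using self.password.
        return {"password": self.password}


class TokenRefreshingPool(BasePool):
    """Refreshes the cached password just before each new connection."""

    def _create_connection(self) -> dict:
        self.password = fetch_iam_token()
        return super()._create_connection()
```

As Dom notes, this works but is brittle, since it relies on a private method and the internal `self.password` attribute.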

It's been great to have something robust and would be happy to help get something in place in the library to share with any future users.

Hope that doc helps clarify the use-case!

Thanks

Dom

johnbywater commented 1 month ago

@dom111 @krixon thanks for the replies!

So: from where do you get the new token? For how long does a token last? What happens when you try to connect using a token that has expired? And what happens if you make a request after a token has expired, using a connection that was created with a valid token? Would it be better to always check that a token is valid before every request, or might a token expire after the check and before the request is made? So would it be better to detect when a request fails due to a token-expired error, and then reconnect with a new token and retry the request? I'm guessing the last option would work best?

Regarding the scope of this module, the intention is to support what users need. Using IAM seems like a very valid use case. I guess we could add an environment variable, such as POSTGRES_PASSWORD_TOKEN_TOPIC, that is expected to resolve to a callable from which a token can be obtained. Setting this could then activate the error-detection-reconnect-retry behaviour? More generally speaking, more flexibility in overriding components is definitely possible. At the moment, I admit, it's a little bit awkward to introduce alternative components, such as alternative recorders, because you need to subclass the factory and define a new persistence module.

krixon commented 1 month ago

from where do you get the new token?

We call out to AWS via an SDK to obtain new tokens, pretty much as in this linked example.

for how long does a token last?

15 minutes.

what happens when you try to connect using a token that has expired?

The connection is refused as if you had supplied an incorrect password.

what happens if you make a request after a token has expired using a connection that was created with a valid token?

The token is only used to establish the connection, after which the connection can remain open beyond the token's lifetime. So essentially, every time _create_connection is called on the pool, we'd need to provide the current token.

so would it be better to detect when a request fails due to a token expired error and then reconnect with a new token and retry the request? I'm guessing the last option would work best?

Yes, this feels like the most useful approach. The alternative would be to proactively call the registered function to obtain the password before each connection (rather than waiting for it to fail), but that implies the function knows whether a new token is actually required, which may not always be the case. Trying to connect, failing, refreshing the password, trying once more, and finally giving up on a second failure seems simplest.
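The fail-refresh-retry flow described here can be sketched generically; connect, get_password, and AuthError are hypothetical stand-ins, not names from the library:

```python
class AuthError(Exception):
    """Raised when a connection attempt is refused for bad credentials."""


def connect_with_refresh(connect, get_password, password):
    """Try to connect; on an auth failure, fetch a fresh token and retry once.

    A second auth failure propagates to the caller, i.e. we give up after
    one refresh, as discussed above.
    """
    try:
        return connect(password)
    except AuthError:
        return connect(get_password())
```

The point of the single retry is that a refreshed token should always be valid; if it still fails, something other than expiry is wrong and retrying further would just mask it.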

johnbywater commented 1 month ago

I just added this:

As an alternative to setting a fixed password in POSTGRES_PASSWORD, you can set POSTGRES_GET_PASSWORD_TOPIC to indicate the topic of a function that returns passwords. This function will be called each time a new database connection is created. This variable supports using database services authenticated with Identity Access Management (IAM), sometimes referred to as token-based authentication, for which the password is a token that is changed perhaps every 15 minutes. If POSTGRES_GET_PASSWORD_TOPIC is set, the POSTGRES_PASSWORD variable is not required and will be ignored. The value of this variable should be resolvable using resolve_topic() to a Python function that expects no arguments and returns a Python str.

https://eventsourcing.readthedocs.io/en/latest/topics/persistence.html#postgresql-module
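Per the docs quoted above, the "get password" function is just a zero-argument callable returning a str. A minimal sketch might look like the following; the module path myapp.auth is a made-up example, and the boto3 call is shown only as a comment since it requires live AWS credentials:

```python
import os


def get_db_password() -> str:
    """Return the current database password (here, an IAM auth token)."""
    # In production this would call the AWS SDK, for example:
    #   rds = boto3.client("rds")
    #   return rds.generate_db_auth_token(
    #       DBHostname="db.example.com", Port=5432,
    #       DBUsername="app_user", Region="eu-west-1")
    return "eyJ-example-iam-token"  # placeholder token for illustration


# Hypothetical topic string, assuming this function lives in myapp/auth.py:
os.environ["POSTGRES_GET_PASSWORD_TOPIC"] = "myapp.auth:get_db_password"
```

The topic format follows the library's usual "module.path:attribute" convention, so resolve_topic() can import the module and look up the function.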

I've pre-released these changes as https://pypi.org/project/eventsourcing/9.3.0a1/. It's a pre-release so you may need to mention this version explicitly in order to install it.

Please note, this release uses the new psycopg3 package. Not much else has changed, except that the package is built with Poetry. The eventsourcing.postgres module has been heavily modified to work closely with psycopg3, but you shouldn't notice any difference.

However, if your project depends on psycopg separately, then have it depend on psycopg[c] for production use along with eventsourcing. Alternatively, you can depend on eventsourcing[postgres], which will do the same. For development, to avoid longer build times at the cost of slightly worse performance, you can depend on either psycopg[binary] and eventsourcing, or eventsourcing[postgres_dev]. These install options are documented here. The "extra" dependency options (eventsourcing[postgres] and eventsourcing[postgres_dev]) aren't new, but eventsourcing v9.3 depends on psycopg v3, and the install options for the psycopg package have changed.

I won't bore you here with the details of what's new in psycopg3, as you probably already know about it. However, psycopg v3 is an almost total rewrite of the psycopg package, and we are using the pipelining functionality to append events, which improves performance in many cases by avoiding network round trips. This is new code and has been tested by at least one other person :-)

Please could you try this out and let me know your thoughts? I wrote some tests to check the configuration of POSTGRES_GET_PASSWORD_TOPIC and the associated functionality for using a "get password" function. Connection attempts will continue for POSTGRES_CONNECT_TIMEOUT (default is 5s), with each attempt calling the "get password" function. So if you don't want to call out to AWS via an SDK several times a second, for example when starting up, you could implement some kind of time-based cache in your "get password" function; I guess an expiry time that is less than the connect timeout value would be sensible here. Let me know what you think?

dom111 commented 1 month ago

@johnbywater This is great and looks to be exactly what we need. Thanks for working on this, and thanks for working on it so quickly! My focus is on another part of the project presently, but I'll get onto this ASAP and validate the mechanism, hopefully this week.

I've taken note of your comments so some sort of short-term caching should work perfectly.

Thanks again!

johnbywater commented 1 month ago

@dom111 @krixon did you have any further thoughts about this?

krixon commented 1 month ago

Hi @johnbywater, apologies for the delay, we've been focused on getting an MVP released so have been a bit snowed under the last couple of weeks! We're hoping to pick this up over the next few days so will report back ASAP. Thanks for your patience!

dom111 commented 3 weeks ago

@johnbywater just an update that I'm looking into this presently. I'm hitting some weirdness after updating the lib, so I'm just looking into the details to make sure it's not just something we're doing wrong!

johnbywater commented 2 weeks ago

Hi @dom

hitting some weirdness

Did you have any more details about this that you can share? :-)

dom111 commented 2 weeks ago

Hey @johnbywater, I got to the bottom of this (it turned out to be some self-inflicted weirdness on my local dev setup with volumes in docker-compose), and I've changed most things. We had to update our alembic and sqlalchemy config to use postgresql+psycopg (rather than omitting the psycopg part altogether, or specifying psycopg2).

My last issue to work through is an end-to-end test where we can prove data is written in (preferably, at least) our staging environment.

I appreciate that we don't want to block you on any releases so we are keeping this on our board.

Thanks again for this work!

johnbywater commented 2 weeks ago

Thanks @dom111 that's very helpful.

dom111 commented 1 week ago

Hey @johnbywater,

Hope you're doing well! Just to confirm that this appears to work perfectly in my tests. Thank you again for your patience in this and apologies that it's taken a while for us to get this checked.

Please let me know if you need anything further (or have any more versions that you'd like to test); otherwise, we're excited for 9.3's release and getting this change out to production!

Thanks,

Dom

johnbywater commented 1 week ago

Aww thanks @dom111! That's really very helpful. I really appreciate the efforts you have made to test this. Thanks for letting me know it's all working!