Token-based authentication for HTTP targets

lenn4rd commented 1 year ago

We set up Snowbridge as a PoC to read from Kinesis and write to an HTTP endpoint that's some middleware for a streaming platform. This API requires authorization via an HTTP header, so we first acquire the access token, set the Authorization header in the config file via an ENV variable file, and then start Snowbridge.

The API uses short-lived tokens so we need to refresh the token regularly before it expires so the stream doesn't break. I have cobbled together a solution that piggybacks on Docker's health checks to refresh the token before it expires and restarts the process when it fails because the token expired.

It works but admittedly is hacky. We're open to contributing code and I'm wondering if you have plans for token-based authentication already. I'm thinking adding token refresh logic would maybe be too specific to our use case but having the option to request a token from credentials and pass it to the HTTP client would already make everything much easier. Curious to hear your thoughts.

stanch commented 1 year ago

Hey @lenn4rd, excited for what could be our first external contribution to Snowbridge! :)

Auth with short-lived tokens sounds like a useful feature. Do you have more details on the specific app you are streaming to, or maybe just how its auth works in general?

An approach we’ve seen in the past would be something like:

1. Generate a private/public key pair
2. Configure the external app with the public key
3. Generate a new JWT with each request (or every 5 min) signed with the private key, with a time to live of, say, 5 min

It seems you are describing a different strategy. Am I understanding it correctly?

1. Fetch a token from a separate API endpoint using static credentials
2. Cache the token for 5 minutes
3. Attach the cached token to every request

If that’s correct, our questions would be:

1. Aren’t the advantages of short-lived tokens negated by long-lived credentials?
2. Is this a general pattern for several apps, or specific to the app in question (which is where more details on the app could help)?

lenn4rd commented 1 year ago

Hey @stanch, haha, same here! Let me give more context and then we see how well it fits your product vision.

We're using a third-party product that's currently in private beta and I'm not sure how much I'm allowed to share about their API. I won't give names but I can describe the endpoints and what kind of data they expect.

The process flow you described is correct. The API implements an OAuth authentication workflow and we have a client ID and client secret which we send against an auth endpoint, along with the scope. A curl call looks like this:

curl -X POST https://thirdpartyapi.com/oauth2/token \
     -H 'Content-Type: application/x-www-form-urlencoded; charset=utf-8' \
     --data-urlencode "grant_type=client_credentials" \
     --data-urlencode "client_id=<some string>" \
     --data-urlencode "client_secret=<some string>" \
     --data-urlencode "scope=myapp:edit"

This endpoint returns an access token that's valid for 1 hour but no refresh token, i.e. we call the endpoint above again to request a fresh access token.

I think the company would be open to our feedback but I'm not sure if they'd be willing to build more authentication flows, e.g. using JWTs.

We then pass the access token to Snowbridge in an HTTP header. We configured an http target and a light transformation to massage the data into the shape the second API endpoint expects to ingest:

# in custom entrypoint
export ENDPOINT_URL=...
export HTTP_HEADERS="Authorization: Bearer <access token>"

# in config.hcl
target {
  use "http" {
    url     = env.ENDPOINT_URL
    headers = env.HTTP_HEADERS
  }
}

# in transform.js
function main(input) {
    // Expected data input by API endpoint:
    // JSON string: [{"value": {"my_input_field": JSON string}}]
    //
    // Maybe we don't need the nested JSON but it was the safest for this experiment
    var row = { my_input_field: JSON.stringify(input.Data) }

    return {
        Data: JSON.stringify([{ value: row }])
    };
}

Right now this is specific to this app and I don't see other use cases for forwarding to HTTP endpoints for us but never say never.

Most APIs I've worked with use the Authorization header, though, and usually the flow is very similar if not identical to the example above.

I agree that this flow of creating short-lived tokens by means of long-lived credentials deployed with the app could create a false sense of security but unless I'm missing something the same is true for private/public key pairs. Not an authentication expert though. 😬

Are these examples useful? Anything that's missing? Happy to share examples from the Docker container workaround if that helps.

colmsnowplow commented 1 year ago

Hey @lenn4rd, thanks for raising the issue and describing the setup. Definitely open to working together on something.

Would you be able to give me a picture of how you imagine it would be implemented? Specifically how would requesting a token from credentials work? Would that need to be specific to the platform's credentials provider (eg ssm), or is there some way it can be platform agnostic in the codebase?

(If you're not sure yet, no worries! We can figure it out together here.)

lenn4rd commented 1 year ago

Hey @colmsnowplow, sure! Happy to outline what I have in mind, your feedback is appreciated. I'll focus on our specific use case but I'm 99% certain it'll work for other APIs provided they comply with the OAuth protocol.

Our third-party service uses an OAuth 2 authentication flow with the client credentials grant type, i.e. we have a client ID and a client secret. Going through the OAuth specs, acquiring them is part of the OAuth flow, but I consider that out of scope here and assume we acquired them as step 0 during setup.

BTW I found Digital Ocean did a great job of writing a concise summary of various OAuth methods.

I'll now pass these credentials on to Snowbridge, either in the config file or via ENV variables, although I saw in another issue that you plan to abandon configuration via ENV variables. I like the idea of using SSM on AWS or its GCP equivalent, but good ol' ENV variables in a container are a good starting point.

Here's what the config could look like:

target {
  use "http" {
    url = env.ENDPOINT_URL

    # Either we nest this block in the http one or we move it up one level, i.e.
    # it becomes a sibling and is called middleware style with use "authentication", too.
    #
    # I don't know if Snowbridge supports multiple targets and will fan out the data.
    # If so, nesting would be more explicit and clearer.
    authentication {
      url           = "https://thirdpartyapi.com/oauth2/token"
      grant_type    = "client_credentials"
      client_id     = env.CLIENT_ID
      client_secret = env.CLIENT_SECRET

      # There will be some quirky APIs, so let's add an option to make them happy
      headers = ""
    }
  }
}

During boot, the Snowbridge HTTP client sees we configured an authentication block and needs to call the token endpoint regularly to request the token. The OAuth protocol defines a token refresh mechanism which I'll leave out for the first iteration.

The token endpoint usually returns how long the token will be valid. The Snowbridge HTTP client should keep track of this expiry and call this endpoint regularly to get a new access token. These calls would run in an application grooming lifecycle, similar to checkpointing maybe.

Each request to the actual endpoint for the http target will carry the access token for authentication, usually by setting an HTTP header:

Authorization: Bearer <access token>

I can see a ton of edge cases for non-standard OAuth APIs and responses, e.g. how to parse the token and its expiry from the response. Our third-party service uses access_token and expires_in in its JSON response but I don't know if that's part of the OAuth specs.

colmsnowplow commented 1 year ago

Thanks for explaining @lenn4rd,

I think I follow you now.

So the part I'm not terribly keen on is that it would require changing a pattern that's general to all targets - at the moment we create the client on boot up, and from then on all the target-specific code is self-contained and just deals with sending the data (from within a goroutine).

I'm not against making changes like that but generally would prefer not to have a target-specific deviation if it can be avoided. At a glance, this oauth2 package looks like it might just handle this all for us under the hood.

Do you think we could achieve what you're after using this package, or something similar?

I'd be open to just replacing the client we're currently using with this, provided it maintains feature parity and can be used without oauth (which looks possible). Ofc also provided it's threadsafe and behaves as we want it to. :)

lenn4rd commented 1 year ago

Apologies for the late response, @colmsnowplow. What you said about avoiding target-specific changes does make a lot of sense, plus the oauth2 package indeed looks like a good abstraction and a promising candidate. After all, the service we're using relies on two-legged OAuth, which the package supports.

I'll experiment with adding the package and adjusting the internal HTTP client as necessary. Is the thread-safety already covered by tests or what would be a good way to assess it?

colmsnowplow commented 1 year ago

No hassle @lenn4rd - I assumed you're as busy as the rest of us!

Glad to hear, I'm definitely on board for supporting OAuth2.

I think I didn't express the thought terribly well/specifically around threadsafety. Let me try to do that now:

What I'm thinking about here is specifically the behaviour of the client when it comes to token rotation. In the app, the client is created, then it's used in multiple concurrent goroutines to send data.

I haven't dived terribly deep into how this package works, and basically I meant to flag that fact - because I'd hate to suggest a package, see a PR, then realise that it's not fit for purpose after you've spent time on it. :)

A few things I'd be looking to understand here:

- When a token rotates, what happens to processes that are still using the old token? My assumption is that probably the old token still works for some period of time, and so there's no issue - but before merging the feature I'd want to validate that assumption. (Note that it's not necessarily the end of the world if those events fail because of the token rotation, they'll get retried - but I'd want to think about how we could avoid that).
- How is a token rotated? My gut says it's either triggered in the client, by some background process (and so would just happen once per expiry, and would get propagated to all processes that start from then on), or it's triggered by an attempt to use the old token after it expires (in which case we might have a race that we need to think about).

I imagine it's designed to play nicely and neither of these is likely to be a problem in practice, just making it explicit that those are the assumptions I'd want to validate.

As far as proving out/discovering/testing for them, I think I'd be satisfied with any of:

- Documentation for that package that describes it working in a way that fits our model
- A simple reproduction that confirms the behaviour fits the model
- An integration test in this app which reproduces this scenario and works as expected

But I don't mean to suggest that this is something you'd have to do for me to accept a PR - especially if it's troublesome to do. I just brought it up to ensure that I'm explicit about what I do/don't know about the package I suggested, and to flag early that in a review I might need to ask your patience while I find time to prove it out myself, if it's not sufficiently clear.

Hope this all makes sense, do follow up if this or anything else needs clarification. :)

colmsnowplow commented 2 months ago

We added OAuth2 support recently, which I think solves this one

Token-based authentication for HTTP targets #270