Why is Okta SAML so slow compared to a naive implementation?

oktadev / okta-aws-cli-assume-role

Okta AWS CLI Assume Role Tool

Apache License 2.0

338 stars 177 forks source link

Why is Okta SAML so slow compared to a naive implementation? #257

Open dunindd opened 5 years ago

dunindd commented 5 years ago

Describe the bug Calling withOkta while cookies.properties is present is 3-7x slower than a naive implementation. I have also noticed that com.okta.tools.authentication.BrowserAuthentication.login() seems to be a bit slower than com.okta.tools.saml.OktaSaml.getSamlResponseForAwsRefresh(), but I don't quite understand why. Ideally, they would be using the same code when cookies.properties is present and valid. A browser only needs to be opened if the cookie values are stale or a user needs to refresh their session.

To Reproduce

$ time okta-aws my-profile sts get-caller-identity
real    0m9.853s
user    0m6.332s
sys    0m0.470s

$ time pipenv run python2 get_sts_token.py 
real    0m1.911s
user    0m0.507s
sys    0m0.224s

Expected behavior I would expect the Okta SAML implementation to be as fast, or faster than the Python implementation.

Additional context

import boto3
import botocore
import os
import datetime
import json
import urllib2
import re
import HTMLParser
from subprocess import call

def main():
    cookes_file_path = '{0}/.okta/cookies.properties'.format(os.path.expanduser('~'))
    saml_re = re.compile(r'^\s*<input name="SAMLResponse"')
    html_parser = HTMLParser.HTMLParser()
    role_arn = 'arn:aws:iam::1234567890:role/MY_ROLE'
    principal_arn = 'arn:aws:iam::1234567890:saml-provider/Okta'
    region = 'us-east-1'

    with open(cookes_file_path, "r") as cookies_file:
        lines = cookies_file.readlines()
        opener = urllib2.build_opener()

        cookies = [l.replace('\n', '') for l in filter(lambda x: x.find('#') == -1, lines)]

        opener.addheaders.append(('Cookie', '; '.join(cookies)))

        resp = opener.open('https://company.okta.com/app/amazon_aws/abc1234567890exz/sso/saml')

        if resp.getcode() == 200:
            dom_lines = resp.readlines()
            START_STR = 'value="'
            END_STR = '"/>'
            saml_response_line = filter(lambda x: saml_re.match(x), dom_lines)[0].replace('\n','')
            saml_response = html_parser.unescape(saml_response_line[saml_response_line.find(START_STR)+len(START_STR):saml_response_line.find(END_STR)])

            bs = botocore.session.get_session({ 'profile': ( None, ['', ''], None, None ) })
            bs.set_credentials('','','')
            s = boto3.session.Session(botocore_session = bs)
            client = s.client('sts', region_name = region)

            assumed_role = client.assume_role_with_saml(RoleArn=role_arn, PrincipalArn=principal_arn, SAMLAssertion=saml_response)

            bs = botocore.session.get_session({ 'profile': ( None, ['', ''], None, None ) })
            bs.set_credentials(assumed_role['Credentials'][u'AccessKeyId'],
                               assumed_role['Credentials'][u'SecretAccessKey'],
                               assumed_role['Credentials'][u'SessionToken'])
            s = boto3.session.Session(botocore_session = bs)
            client = s.client('sts', region_name = region)

            print(client.get_caller_identity())
        else:
            call(['get-okta-sts-token', '-r MY-PROFILE'])

if __name__ == "__main__":
    main()

dunindd commented 5 years ago

Setting OKTA_BROWSER_AUTH to false yields a bit of a speed improvement, but it is still significantly slower than a naive implementation.

dunindd commented 5 years ago

I did a quick and dirty implementation in Java as well, and while it is slower than the Python version, it is still much faster than the current withOkta implementation:

$ time java -jar GetStsToken.jar
real    0m3.007s
user    0m1.761s
sys 0m0.470s

This could probably be sped up a bit using the actual AWS Java SDK rather than the AWS CLI as a subprocess, but I think it still demonstrates the point:

package org.example;

import java.io.File;
import java.io.InputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.text.MessageFormat;
import java.util.stream.Stream;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class GetStsToken {
    public static void main(String[] args) throws Exception {
        sendGet();
    }

    private static void sendGet() throws Exception {

        String url = "https://company.okta.com/app/amazon_aws/abc1234567890exz/sso/saml";

        URL obj = new URL(url);

        HttpURLConnection con = (HttpURLConnection) obj.openConnection();

        String cookes_file_path = System.getProperty("user.home") + File.separator + ".okta" + File.separator
                + "cookies.properties";

        try (Stream<String> stream = Files.lines(Paths.get(cookes_file_path))) {
            stream.forEach((String l) -> con.addRequestProperty("Cookie", l));
        }

        // optional default is GET
        con.setRequestMethod("GET");

        BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
        String inputLine;
        StringBuffer response = new StringBuffer();

        while ((inputLine = in.readLine()) != null) {
            response.append(inputLine);
        }
        in.close();

        String resp = response.toString();
        String START_STR = "<input name=\"SAMLResponse\" type=\"hidden\" value=\"";
        String END_STR = "\"/>";

        String saml_response = response.substring(resp.indexOf(START_STR));
        saml_response = saml_response.substring(START_STR.length(), saml_response.indexOf(END_STR));
        saml_response = saml_response.replaceAll("&#x2b;", "-");
        saml_response = saml_response.replaceAll("&#x3d;", "=");

        Process processRun = null;
        try {
            String aws_cli = MessageFormat.format(
                    "aws sts assume-role-with-saml --role-arn {0} --principal-arn {1} --saml-assertion {2}",
                    "arn:aws:iam::1234567890:role/MY_ROLE", "arn:aws:iam::1234567890:saml-provider/Okta",
                    saml_response);
            // processRun = Runtime.getRuntime().exec(aws_cli);

            Process p = Runtime.getRuntime().exec(aws_cli);
            final InputStream stream = p.getInputStream();
            new Thread(new Runnable() {
                public void run() {
                    BufferedReader reader = null;
                    try {
                        reader = new BufferedReader(new InputStreamReader(stream));
                        String line = null;
                        while ((line = reader.readLine()) != null) {
                            System.out.println(line);
                        }
                    } catch (Exception e) {
                        // TODO
                    } finally {
                        if (reader != null) {
                            try {
                                reader.close();
                            } catch (IOException e) {
                                // ignore
                            }
                        }
                    }
                }
            }).start();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

    }
}

AlainODea commented 5 years ago

This is interesting. I have been trying to figure out performance regressions in this for a while. I think you are on to something. Thank you.

AlainODea commented 5 years ago

It’s possible that the JSoup piece (parsing the HTML) is slow. It’s definitely heavier than indexOf and substring. I prefer using a parser and rules from SAML 2.0 itself to landmarking.

The other piece is the performance of the AWS SDK for Java. It’s possible that AssumeRoleWithSaml is faster in boto3.

I think this tool also parses the SAMLResponse prior to running AssumeRoleWithSAML. That is definitely not needed once the role ARN is known.

AlainODea commented 5 years ago

I added some timing tracking locally and got some potentially interesting results.

With OKTA_BROWSER_AUTH=false: 4.5 seconds (credentialsLoaded - start) preApp=2019-01-03T21:38:23.237Z cookiesLoaded=2019-01-03T21:38:23.280Z postApp=2019-01-03T21:38:26.216Z cookiesStored=2019-01-03T21:38:26.222Z docParsed=2019-01-03T21:38:26.273Z preSts=2019-01-03T21:38:26.292Z postSts=2019-01-03T21:38:28.157Z preInit=2019-01-03T21:38:23.092Z postInit=2019-01-03T21:38:23.235Z sessionChecked=2019-01-03T21:38:23.236Z samlResponse=2019-01-03T21:38:28.158Z start=2019-01-03T21:38:22.506Z envLoaded=2019-01-03T21:38:23.088Z credentialsLoaded=2019-01-03T21:38:28.158Z

With OKTA_BROWSER_AUTH=true: 6.6 seconds (credentialsLoaded - start) preLaunch=2019-01-03T21:21:36.540Z jfxReady=2019-01-03T21:21:38.184Z browserReady=2019-01-03T21:21:39.066Z responseReady=2019-01-03T21:21:40.701Z preSts=2019-01-03T21:21:40.710Z postSts=2019-01-03T21:21:42.516Z preInit=2019-01-03T21:21:36.430Z postInit=2019-01-03T21:21:36.527Z sessionChecked=2019-01-03T21:21:36.528Z samlResponse=2019-01-03T21:21:42.518Z start=2019-01-03T21:21:35.932Z envLoaded=2019-01-03T21:21:36.427Z credentialsLoaded=2019-01-03T21:21:42.519Z

Biggest Overheads: JavaFX appears to add 2 seconds locally. (jfxReady - preLaunch) Loading environment appears to add 0.5 seconds locally. (endLoaded - start) Everything else is sub-100ms.

Inherent costs: Okta via this code takes around 1.6 seconds locally. (postApp - preApp) STS via the Java API seems to take 1.8 seconds locally. (postSts - preSts)

I'm really surprised how expensive loading the environment is. It's much more expensive than loading cookies for example, which surprises me.

There's definitely room for improvement here.

AlainODea commented 5 years ago

The biggest improvement would be intelligent caching like mentioned in #256. It would make a current instant versus expiry check required.

I think I have exposed the slow pieces of this code. I don't have time to optimize this, but I welcome contributions. @dunindd if you are satisfied, please close this issue.

dunindd commented 5 years ago

So, I definitely appreciate that there is some startup time but even if I use a DOM Parser the naive implementation in Python is still quite quick:

$ time pipenv run python2 get_sts_token.1.py 

real    0m2.163s
user    0m0.541s
sys 0m0.190s

The changes were to replace the substring method with Beautifulsoup:

            soup = BeautifulSoup("".join(resp.readlines()), 'html.parser')

            saml_response = soup.find(attrs={'name':'SAMLResponse'})['value']

This is only 40ms slower than using the naive substring method.

@AlainODea how would improved credential caching address this issue? At best wouldn't it only help with subsequent calls to credential_process? This is something which is addressable now by reading from ~/.aws/credentials (for the actual credentials) and ~/.okta/profiles (for the expiration timestamp).

I sorted the timestamps you gave by time, so they're a bit easier to follow: OKTA_BROWSER_AUTH=False

start            =2019-01-03T21:38:22.506Z
envLoaded        =2019-01-03T21:38:23.088Z
preInit          =2019-01-03T21:38:23.092Z
postInit         =2019-01-03T21:38:23.235Z
sessionChecked   =2019-01-03T21:38:23.236Z
preApp           =2019-01-03T21:38:23.237Z
cookiesLoaded    =2019-01-03T21:38:23.280Z
postApp          =2019-01-03T21:38:26.216Z
cookiesStored    =2019-01-03T21:38:26.222Z
docParsed        =2019-01-03T21:38:26.273Z
preSts           =2019-01-03T21:38:26.292Z
postSts          =2019-01-03T21:38:28.157Z
samlResponse     =2019-01-03T21:38:28.158Z
credentialsLoaded=2019-01-03T21:38:28.158Z

OKTA_BROWSER_AUTH=True

start            =2019-01-03T21:21:35.932Z
envLoaded        =2019-01-03T21:21:36.427Z
preInit          =2019-01-03T21:21:36.430Z
postInit         =2019-01-03T21:21:36.527Z
sessionChecked   =2019-01-03T21:21:36.528Z
preLaunch        =2019-01-03T21:21:36.540Z
jfxReady         =2019-01-03T21:21:38.184Z
browserReady     =2019-01-03T21:21:39.066Z
responseReady    =2019-01-03T21:21:40.701Z
preSts           =2019-01-03T21:21:40.710Z
postSts          =2019-01-03T21:21:42.516Z
samlResponse     =2019-01-03T21:21:42.518Z
credentialsLoaded=2019-01-03T21:21:42.519Z

So it looks like in the case of Browser Auth, the bulk of the lag is in start up time. But in the non-browser version, there is still significant delay between cookiesLoaded and postApp. It's almost 3 seconds, for just a GET request.

I'm also a little confused as to why we would close the issue without solving the problem yet. Should we not leave it open until a PR resulting in a fix is merged?

AlainODea commented 5 years ago

Caching AWS credentials would mean paying the overheads only when credentials expire. It would make the amortized overhead very low. The caching could be smart enough to work across multiple role ARNs and multiple orgs which would benefit withokta use as well.

What is your use case that is impacted by the existing overhead? Would it be helped by caching?

dunindd commented 5 years ago

Well, to be honest, most of the impacted use cases which we were having have been resolved with 1.0.6, specifically when the addition of ListRoles (I was parsing the menu output of withOkta prior to this and that was excruciatingly slow). With ListRoles that issue pretty much went away and I have a custom Python script to handle credential caching (using data from ~/.aws/credentials and ~/.okta/profiles, repopulating using withOkta when necessary). That overhead is minimal with cached credentials (~1000ms) and without cached credentials we have a bit more of a delay (~12000ms).

I suppose the whole point of this Issue was exploratory. It seems strange to me that there is so much more overhead here than a naive implementation. I've been profiling the code and making small adjustments where I could without much success, though. It seems that, at a minimum, we're looking at about 1500ms for a GET request using either standard libraries or Apache's. Which is unfortunate.

I've also been doing most of my work in the ListRoles flow, so I haven't gotten to the point where I understand where the slowdowns are happening in OktaAwsCliAssumeRole (which is where this issue is originally directed), but in that time I did find something interesting:

..AwsSamlRoleUtils.getRoles(samlResponse) 739ms
..AwsSamlRoleUtils.getSigninPageDocument(samlResponse) 1301ms
..AwsSamlSigninParser.parseAccountOptions(document) 7ms
getAvailableRoles 2047ms

Rather strange that parsing the SAML response is taking that long, too. Maybe there are some gains to be had with the AssumeRole flow w.r.t. parsing as well?

But yes, I suppose caching would help mitigate the original issue.

AlainODea commented 5 years ago

I did not realize that getAvailableRoles was so expensive. I knew XML parsers were expensive but that is wild, especially for such a small document. The HTML parser runs reasonably fast (~50ms).

Thank you for the analysis here. Hopefully some optimization will reveal itself.

I’m going to do some better profiling with something more sophisticated than println. I think you are using method tracing, which looks very helpful.