Open dunindd opened 5 years ago
Setting OKTA_BROWSER_AUTH to false yields a bit of a speed improvement, but it is still significantly slower than a naive implementation.
I did a quick and dirty implementation in Java as well, and while it is slower than the Python version, it is still much faster than the current withOkta implementation:
$ time java -jar GetStsToken.jar
real 0m3.007s
user 0m1.761s
sys 0m0.470s
This could probably be sped up a bit using the actual AWS Java SDK rather than the AWS CLI as a subprocess, but I think it still demonstrates the point:
package org.example;

import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.text.MessageFormat;
import java.util.stream.Stream;

public class GetStsToken {
    public static void main(String[] args) throws Exception {
        sendGet();
    }

    private static void sendGet() throws Exception {
        String url = "https://company.okta.com/app/amazon_aws/abc1234567890exz/sso/saml";
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();

        // Reuse the session cookies already persisted by the tool.
        String cookiesFilePath = System.getProperty("user.home") + File.separator + ".okta"
                + File.separator + "cookies.properties";
        try (Stream<String> stream = Files.lines(Paths.get(cookiesFilePath))) {
            stream.forEach(l -> con.addRequestProperty("Cookie", l));
        }
        con.setRequestMethod("GET"); // optional, GET is the default

        StringBuilder response = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()))) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                response.append(inputLine);
            }
        }

        // Landmark the hidden SAMLResponse input and decode the HTML entities
        // in its value attribute (assumed here to be hex entities for '+' and '=').
        String resp = response.toString();
        String START_STR = "<input name=\"SAMLResponse\" type=\"hidden\" value=\"";
        String END_STR = "\"/>";
        String samlResponse = resp.substring(resp.indexOf(START_STR) + START_STR.length());
        samlResponse = samlResponse.substring(0, samlResponse.indexOf(END_STR));
        samlResponse = samlResponse.replace("&#x2b;", "+").replace("&#x3d;", "=");

        try {
            String awsCli = MessageFormat.format(
                    "aws sts assume-role-with-saml --role-arn {0} --principal-arn {1} --saml-assertion {2}",
                    "arn:aws:iam::1234567890:role/MY_ROLE",
                    "arn:aws:iam::1234567890:saml-provider/Okta",
                    samlResponse);
            Process p = Runtime.getRuntime().exec(awsCli);
            final InputStream stream = p.getInputStream();
            // Pump the CLI's stdout so the subprocess doesn't block on a full pipe.
            new Thread(() -> {
                try (BufferedReader reader = new BufferedReader(new InputStreamReader(stream))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        System.out.println(line);
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }).start();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
This is interesting. I have been trying to figure out performance regressions in this for a while. I think you are on to something. Thank you.
It’s possible that the JSoup piece (parsing the HTML) is slow. It’s definitely heavier than indexOf and substring. I prefer using a parser and the rules from SAML 2.0 itself over landmarking on raw strings.
The other piece is the performance of the AWS SDK for Java. It’s possible that AssumeRoleWithSaml is faster in boto3.
I think this tool also parses the SAMLResponse prior to running AssumeRoleWithSAML. That is definitely not needed once the role ARN is known.
I added some timing tracking locally and got some potentially interesting results.
With OKTA_BROWSER_AUTH=false: 4.5 seconds (credentialsLoaded - start) preApp=2019-01-03T21:38:23.237Z cookiesLoaded=2019-01-03T21:38:23.280Z postApp=2019-01-03T21:38:26.216Z cookiesStored=2019-01-03T21:38:26.222Z docParsed=2019-01-03T21:38:26.273Z preSts=2019-01-03T21:38:26.292Z postSts=2019-01-03T21:38:28.157Z preInit=2019-01-03T21:38:23.092Z postInit=2019-01-03T21:38:23.235Z sessionChecked=2019-01-03T21:38:23.236Z samlResponse=2019-01-03T21:38:28.158Z start=2019-01-03T21:38:22.506Z envLoaded=2019-01-03T21:38:23.088Z credentialsLoaded=2019-01-03T21:38:28.158Z
With OKTA_BROWSER_AUTH=true: 6.6 seconds (credentialsLoaded - start) preLaunch=2019-01-03T21:21:36.540Z jfxReady=2019-01-03T21:21:38.184Z browserReady=2019-01-03T21:21:39.066Z responseReady=2019-01-03T21:21:40.701Z preSts=2019-01-03T21:21:40.710Z postSts=2019-01-03T21:21:42.516Z preInit=2019-01-03T21:21:36.430Z postInit=2019-01-03T21:21:36.527Z sessionChecked=2019-01-03T21:21:36.528Z samlResponse=2019-01-03T21:21:42.518Z start=2019-01-03T21:21:35.932Z envLoaded=2019-01-03T21:21:36.427Z credentialsLoaded=2019-01-03T21:21:42.519Z
Biggest overheads: JavaFX appears to add 2 seconds locally (jfxReady - preLaunch). Loading the environment appears to add 0.5 seconds locally (envLoaded - start). Everything else is sub-100ms.
Inherent costs: Okta via this code takes around 1.6 seconds locally (postApp - preApp). STS via the Java API seems to take 1.8 seconds locally (postSts - preSts).
I'm really surprised how expensive loading the environment is. It's much more expensive than loading cookies for example, which surprises me.
There's definitely room for improvement here.
The biggest improvement would be intelligent caching like mentioned in #256. With caching, each call would only need to compare the current time against the credential expiry.
I think I have exposed the slow pieces of this code. I don't have time to optimize this, but I welcome contributions. @dunindd if you are satisfied, please close this issue.
So, I definitely appreciate that there is some startup time but even if I use a DOM Parser the naive implementation in Python is still quite quick:
$ time pipenv run python2 get_sts_token.1.py
real 0m2.163s
user 0m0.541s
sys 0m0.190s
The changes were to replace the substring method with BeautifulSoup:
soup = BeautifulSoup("".join(resp.readlines()), 'html.parser')
saml_response = soup.find(attrs={'name':'SAMLResponse'})['value']
This is only 40ms slower than using the naive substring method.
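For comparison, the naive landmarking approach from the Java example above can be pulled into a pure method. This is a sketch: the entity names assume Okta escapes '+' and '=' in the hidden input's value as hex HTML entities, and the sample markup in the usage below is illustrative, not a real Okta response.

```java
public class SamlExtract {
    private static final String START_STR = "<input name=\"SAMLResponse\" type=\"hidden\" value=\"";
    private static final String END_STR = "\"/>";

    // Extract the SAMLResponse value by string landmarking, then decode
    // the (assumed) HTML entities back into base64 characters.
    public static String extractSamlResponse(String html) {
        String tail = html.substring(html.indexOf(START_STR) + START_STR.length());
        String value = tail.substring(0, tail.indexOf(END_STR));
        return value.replace("&#x2b;", "+").replace("&#x3d;", "=");
    }
}
```

Given `<input name="SAMLResponse" type="hidden" value="AB&#x2b;CD&#x3d;&#x3d;"/>` this yields `AB+CD==`. A method like this makes it easy to benchmark landmarking against parser-based extraction in isolation.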
@AlainODea how would improved credential caching address this issue? At best wouldn't it only help with subsequent calls to credential_process? This is something which is addressable now by reading from ~/.aws/credentials (for the actual credentials) and ~/.okta/profiles (for the expiration timestamp).
I sorted the timestamps you gave by time, so they're a bit easier to follow:
OKTA_BROWSER_AUTH=False
start =2019-01-03T21:38:22.506Z
envLoaded =2019-01-03T21:38:23.088Z
preInit =2019-01-03T21:38:23.092Z
postInit =2019-01-03T21:38:23.235Z
sessionChecked =2019-01-03T21:38:23.236Z
preApp =2019-01-03T21:38:23.237Z
cookiesLoaded =2019-01-03T21:38:23.280Z
postApp =2019-01-03T21:38:26.216Z
cookiesStored =2019-01-03T21:38:26.222Z
docParsed =2019-01-03T21:38:26.273Z
preSts =2019-01-03T21:38:26.292Z
postSts =2019-01-03T21:38:28.157Z
samlResponse =2019-01-03T21:38:28.158Z
credentialsLoaded=2019-01-03T21:38:28.158Z
OKTA_BROWSER_AUTH=True
start =2019-01-03T21:21:35.932Z
envLoaded =2019-01-03T21:21:36.427Z
preInit =2019-01-03T21:21:36.430Z
postInit =2019-01-03T21:21:36.527Z
sessionChecked =2019-01-03T21:21:36.528Z
preLaunch =2019-01-03T21:21:36.540Z
jfxReady =2019-01-03T21:21:38.184Z
browserReady =2019-01-03T21:21:39.066Z
responseReady =2019-01-03T21:21:40.701Z
preSts =2019-01-03T21:21:40.710Z
postSts =2019-01-03T21:21:42.516Z
samlResponse =2019-01-03T21:21:42.518Z
credentialsLoaded=2019-01-03T21:21:42.519Z
So it looks like in the case of browser auth, the bulk of the lag is in startup time. But in the non-browser version, there is still a significant delay between cookiesLoaded and postApp: almost 3 seconds, for just a GET request.
I'm also a little confused as to why we would close the issue without solving the problem yet. Should we not leave it open until a PR resulting in a fix is merged?
Caching AWS credentials would mean paying the overheads only when credentials expire. It would make the amortized overhead very low. The caching could be smart enough to work across multiple role ARNs and multiple orgs, which would benefit withOkta use as well.
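The check such caching needs is small. This is a minimal sketch: the storage format of ~/.okta/profiles is not described in this thread, so the ISO-8601 expiry string here is an assumption, and `stillValid` is a hypothetical helper, not part of the tool.

```java
import java.time.Instant;

public class CredentialCache {
    // Returns true when cached credentials can still be reused, i.e. the
    // current instant is before the stored expiry. The ISO-8601 expiry
    // format is an assumption; adapt to however the profile file stores it.
    public static boolean stillValid(String expiryTimestamp, Instant now) {
        return now.isBefore(Instant.parse(expiryTimestamp));
    }
}
```

Only when this returns false would the full Okta + STS round trip (and its several seconds of overhead) need to run.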
What is your use case that is impacted by the existing overhead? Would it be helped by caching?
Well, to be honest, most of the impacted use cases we were having have been resolved with 1.0.6, specifically with the addition of ListRoles (I was parsing the menu output of withOkta prior to this, and that was excruciatingly slow). With ListRoles that issue pretty much went away, and I have a custom Python script to handle credential caching (using data from ~/.aws/credentials and ~/.okta/profiles, repopulating via withOkta when necessary). The overhead is minimal with cached credentials (~1000ms); without cached credentials there is more of a delay (~12000ms).
I suppose the whole point of this Issue was exploratory. It seems strange to me that there is so much more overhead here than a naive implementation. I've been profiling the code and making small adjustments where I could without much success, though. It seems that, at a minimum, we're looking at about 1500ms for a GET request using either standard libraries or Apache's. Which is unfortunate.
I've also been doing most of my work in the ListRoles flow, so I haven't gotten to the point where I understand where the slowdowns are happening in OktaAwsCliAssumeRole (which is where this issue is originally directed), but in that time I did find something interesting:
..AwsSamlRoleUtils.getRoles(samlResponse) 739ms
..AwsSamlRoleUtils.getSigninPageDocument(samlResponse) 1301ms
..AwsSamlSigninParser.parseAccountOptions(document) 7ms
getAvailableRoles 2047ms
Rather strange that parsing the SAML response is taking that long, too. Maybe there are some gains to be had with the AssumeRole flow w.r.t. parsing as well?
But yes, I suppose caching would help mitigate the original issue.
I did not realize that getAvailableRoles was so expensive. I knew XML parsers were expensive but that is wild, especially for such a small document. The HTML parser runs reasonably fast (~50ms).
Thank you for the analysis here. Hopefully some optimization will reveal itself.
I’m going to do some better profiling with something more sophisticated than println. I think you are using method tracing, which looks very helpful.
Describe the bug
Calling withOkta while cookies.properties is present is 3-7x slower than a naive implementation. I have also noticed that com.okta.tools.authentication.BrowserAuthentication.login() seems to be a bit slower than com.okta.tools.saml.OktaSaml.getSamlResponseForAwsRefresh(), but I don't quite understand why. Ideally, they would use the same code when cookies.properties is present and valid. A browser only needs to be opened if the cookie values are stale or a user needs to refresh their session.
To Reproduce
Expected behavior
I would expect the Okta SAML implementation to be as fast as, or faster than, the Python implementation.
Additional context