obahareth / has-my-gsuite-been-pwned

This project allows G Suite admins to quickly check their entire G Suite directory (emails) against haveibeenpwned.com
MIT License
4 stars 0 forks source link

Figure out a way around haveibeenpwned.com rate limits #1

Open obahareth opened 5 years ago

obahareth commented 5 years ago

https://haveibeenpwned.com's rate limits are a major blocker, waiting for them + an added 150ms still doesn't seem to be enough, and there's no simple way to work around them (we'd be asynchronously checking multiple entire G Suite directories, and we'd have to keep a "rate limit state" to make sure we are always using the most recent rate limit). A possible super ugly solution would be to only allow one directory to be checked at at time, therefore we don't have to worry about rate limit state.

sathwikmatsa commented 5 years ago

The API reference says that requests are limited to one per every 1500 milliseconds. That could be the issue! https://haveibeenpwned.com/API/v2#RateLimiting

obahareth commented 5 years ago

Thanks @sathwikmatsa, I've previously tried that as well as waiting for their Retry-After: value and it didn't work for me. Even if I do get past that, the major issue still remains, I would be checking multiple G Suite directories at a time (e.g. admin of site1.com and site2.com checking multiple emails at the same time), I would need to make sure every single request that's been initiated by a different user waits for both the 1500 millisecond as well as the Retry-After value of the last request across the application.

zpeters commented 5 years ago

I'm interested in looking at this with you. I solved a similar problem with Cisco's api for Webex, though it was a "single user" not multi user like you are describing. Haven't dug into the code yet. Would I need to have a gsuite account to accurately test against? My code for webex is very basic but i can share a gist of that too if it helps for inspiration?

obahareth commented 5 years ago

Hey @zpeters, thanks for wanting to help!

So the way the app works at the moment is that it just shows a button to log into G Suite, and you have to be an admin for it to fetch the directory. However if you use IEx, you could directly use the module HasMyGsuiteBeenPwned.HaveIBeenPwned's check_user_and_sleep function by passing a GSuite User struct to it, without having to deal with GSuite at all.

If it'd help out, maybe it's time I added unit tests to the code. Also, please do share the gist, it could be very helpful 😄.

zpeters commented 5 years ago

@obahareth Cool, i will check this out. Here are the gists for the method i came up with for the Webex api, might be similar to waht we can do here.

get https://gist.github.com/zpeters/b33fcced8fc666bbf9b6a10d0dc38a59

First i try a get with HTTPoison.get. I'm blindly assuming on the first get that a timeout isn't in effect. After that call i send the response to maybe_retry, if try we recursively call back to get to try again, if false we had a successful get so we exit the function

maybe_retry https://gist.github.com/zpeters/967109e73d0d1f54057baae35a77f5db

At this point we have the response from the original call. There is a field in the header called "REtry-After". We look at this to see how long to wait. In my experience with this api, i needed to add a "fudge factor" to the reported retry after. i believe this was 5 seconds or so. If retry-after wasn't in the header we just return false and continue on our way

This probably not the best approach but it seemed to work for me. I'm going to fork your code and i will work on a pull if i can figure something out as an example.

Cheers

zpeters commented 5 years ago

Digging into the code i think the best fix for this might be to make the ex_pwned library handle the retries. I'm going to try to work out a solution and see if the author would be willing to accept a pull request. Otherwise i could publish a "fixed" version

zpeters commented 5 years ago

Working on a fix. I have a pull request for ex_pwned here ( https://github.com/techgaun/ex_pwned/pull/5 )

With these changes there is literally nothing you have to do additional in your code now. In fact, once this is accepted we can remove all of the "sleep" code

obahareth commented 5 years ago

@zpeters Thanks for the awesome work! Having the sleep in ex_pwned itself does sound like a much better idea. This probably won't work with concurrent requests, right? I was hoping I could at least create a process per G Suite directory I'm checking.

zpeters commented 5 years ago

I've gotten some advice to maybe just get an error from ex_pwned and have it return the timeout value. This would put us in a better place for concurrent requests .

zpeters commented 5 years ago

New pull request to ex_pwned, when this is accepted we can work on the retry logic. https://github.com/techgaun/ex_pwned/pull/7

zpeters commented 5 years ago

I'm going to drop a few PRs for you to look over. I have one for "retry" that you probably will just want to look over. In mix.exs i am referring to the github source for ex_pwned instead of hex until a new hex package gets published. Looking forward to working on more issues with you...

obahareth commented 5 years ago

I'll take a look at them tomorrow, thanks again @zpeters!