obahareth opened 6 years ago
The API reference says that requests are limited to one per every 1500 milliseconds. That could be the issue! https://haveibeenpwned.com/API/v2#RateLimiting
Thanks @sathwikmatsa, I've previously tried that, as well as waiting for their Retry-After value, and it didn't work for me. Even if I do get past that, the major issue still remains: I would be checking multiple G Suite directories at a time (e.g. the admins of site1.com and site2.com checking multiple emails simultaneously), so I would need to make sure every request initiated by a different user waits for both the 1500 milliseconds and the Retry-After value of the last request across the whole application.
I'm interested in looking at this with you. I solved a similar problem with Cisco's API for Webex, though it was single-user rather than multi-user like you are describing. I haven't dug into the code yet. Would I need a G Suite account to accurately test against? My code for Webex is very basic, but I can share a gist of it too if it helps for inspiration.
Hey @zpeters, thanks for wanting to help!
So the way the app works at the moment is that it just shows a button to log into G Suite, and you have to be an admin for it to fetch the directory. However, if you use IEx, you can call the check_user_and_sleep function in HasMyGsuiteBeenPwned.HaveIBeenPwned directly by passing it a G Suite User struct, without having to deal with G Suite at all.
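For anyone trying this, a call from IEx could look roughly like the following. This is a hypothetical session: the exact shape and field names of the G Suite User struct are assumptions, so a plain map stands in for it here.

```elixir
# Start a session with the app loaded: iex -S mix
#
# Hypothetical stand-in for the GSuite User struct; the :primary_email
# field name is an assumption, not verified against the codebase.
iex> user = %{primary_email: "someone@site1.com"}
iex> HasMyGsuiteBeenPwned.HaveIBeenPwned.check_user_and_sleep(user)
```

This bypasses the G Suite login flow entirely, which should make it possible to test the HaveIBeenPwned rate-limit handling in isolation.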
If it'd help out, maybe it's time I added unit tests to the code. Also, please do share the gist, it could be very helpful 😄.
@obahareth Cool, I will check this out. Here are the gists for the method I came up with for the Webex API; it might be similar to what we can do here.
get
https://gist.github.com/zpeters/b33fcced8fc666bbf9b6a10d0dc38a59
First I try a GET with HTTPoison.get. On that first call I'm blindly assuming that no timeout is in effect. After the call I pass the response to maybe_retry: if it returns true, we recursively call back into get to try again; if false, the GET succeeded and we exit the function.
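The loop described above can be sketched like this. This is a paraphrase of the approach, not the actual gist contents; the module name and function shapes are assumptions.

```elixir
defmodule WebexClient do
  # Sketch of the recursive retry loop: issue the GET, hand the response
  # to maybe_retry/1 (defined separately, see the second gist), and recurse
  # if a retry is needed.
  def get(url) do
    {:ok, response} = HTTPoison.get(url)

    if maybe_retry(response) do
      # maybe_retry/1 already slept for the Retry-After period,
      # so it is safe to issue the same request again.
      get(url)
    else
      # No rate limit in effect; return the successful response.
      response
    end
  end
end
```

One caveat with this sketch: the `{:ok, response}` match will crash on a transport-level `{:error, %HTTPoison.Error{}}` result, which may or may not be acceptable depending on the caller.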
maybe_retry
https://gist.github.com/zpeters/967109e73d0d1f54057baae35a77f5db
At this point we have the response from the original call. There is a field in the header called "Retry-After". We look at this to see how long to wait. In my experience with this API, I needed to add a "fudge factor" to the reported retry-after value; I believe this was 5 seconds or so. If Retry-After wasn't in the header, we just return false and continue on our way.
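A minimal sketch of maybe_retry as described, again a paraphrase rather than the gist itself. HTTPoison exposes headers as a list of `{name, value}` tuples, and the exact header casing ("Retry-After" vs "retry-after") is an assumption that may need adjusting per API.

```elixir
defmodule WebexClient.Retry do
  # 5-second fudge factor on top of the server-reported Retry-After,
  # matching the experience described above.
  @fudge_ms 5_000

  # Returns true after sleeping if a retry is needed, false otherwise.
  def maybe_retry(%HTTPoison.Response{headers: headers}) do
    case List.keyfind(headers, "Retry-After", 0) do
      {"Retry-After", seconds} ->
        # Retry-After is reported in seconds; sleep that long plus the fudge.
        :timer.sleep(String.to_integer(seconds) * 1_000 + @fudge_ms)
        true

      nil ->
        # No Retry-After header: the call succeeded, no retry needed.
        false
    end
  end
end
```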
This is probably not the best approach, but it seemed to work for me. I'm going to fork your code and work on a pull request if I can figure something out as an example.
Cheers
Digging into the code, I think the best fix for this might be to make the ex_pwned library handle the retries. I'm going to try to work out a solution and see if the author would be willing to accept a pull request. Otherwise I could publish a "fixed" version.
Working on a fix. I have a pull request for ex_pwned here: https://github.com/techgaun/ex_pwned/pull/5
With these changes there is nothing additional you have to do in your code. In fact, once this is accepted we can remove all of the "sleep" code.
@zpeters Thanks for the awesome work! Having the sleep in ex_pwned itself does sound like a much better idea. This probably won't work with concurrent requests, though, right? I was hoping I could at least create one process per G Suite directory I'm checking.
I've gotten some advice to instead have ex_pwned return an error along with the timeout value. That would put us in a better place for handling concurrent requests.
New pull request to ex_pwned, when this is accepted we can work on the retry logic. https://github.com/techgaun/ex_pwned/pull/7
I'm going to drop a few PRs for you to look over. I have one for "retry" that you will probably just want to review. In mix.exs I am referring to the GitHub source for ex_pwned instead of Hex until a new Hex package gets published. Looking forward to working on more issues with you...
I'll take a look at them tomorrow, thanks again @zpeters!
https://haveibeenpwned.com's rate limits are a major blocker: waiting for them plus an added 150 ms still doesn't seem to be enough, and there's no simple way to work around them (we'd be asynchronously checking multiple entire G Suite directories, so we'd have to keep a "rate limit state" to make sure we are always honoring the most recent rate limit). A possible, super ugly solution would be to only allow one directory to be checked at a time, so that we don't have to worry about rate limit state.
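For reference, one way the "rate limit state" idea could look in Elixir is a single named GenServer that every directory-checking process funnels its requests through. This is a hypothetical sketch, not code from this repository or from ex_pwned; the module name and interval are assumptions.

```elixir
defmodule RateLimiter do
  @moduledoc """
  Hypothetical sketch: serialize all HaveIBeenPwned calls through one
  process so the whole application respects the most recent rate limit.
  """
  use GenServer

  # HIBP's documented minimum interval between requests.
  @min_interval_ms 1_500

  def start_link(_opts),
    do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  # Blocks the caller until it is safe to issue the next request.
  # Sleeping inside handle_call intentionally serializes all callers.
  def wait_turn, do: GenServer.call(__MODULE__, :wait_turn, :infinity)

  @impl true
  def init(_), do: {:ok, %{next_allowed_at: 0}}

  @impl true
  def handle_call(:wait_turn, _from, %{next_allowed_at: next} = state) do
    now = System.monotonic_time(:millisecond)
    if now < next, do: :timer.sleep(next - now)

    next_allowed = System.monotonic_time(:millisecond) + @min_interval_ms
    {:reply, :ok, %{state | next_allowed_at: next_allowed}}
  end
end
```

Each per-directory process would call `RateLimiter.wait_turn()` before every HIBP request; a Retry-After value could be folded into `next_allowed_at` the same way. It doesn't remove the bottleneck, but it keeps the concurrency model while guaranteeing the global rate limit is honored.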