Open atreya2011 opened 1 year ago
Thank you for this design proposal! It is well written! I do have some points here:
If the number of login attempts exceeds a specified threshold within the time period, the user will be locked out of the account.
The problem with this is that you can effectively DoS accounts and lock them by trying to repeatedly log in. To avoid this, it is recommended to set a maximum wait time (e.g. 10s) which makes brute forcing impractical, but also prevents DoSing.
If the number of failed attempts exceeds a pre-defined threshold in the config, lock the user's account by updating the active field in the identities table to false.
Which process activates the identity and how does it know whether the rate limiting has disabled it, or whether it was disabled by an administrator? I would recommend to not use the active field for this but instead use the additional table.
If the credentials are valid, insert a new record into the login_history table with the identity_id, login_time, and login_status set to 'success'.
My recommendation would be to have a table with the following layout:
CREATE TABLE selfservice_rate_limits (
id UUID PRIMARY KEY
nid UUID
identity_id UUID NOT NULL REFERENCES ...
attempts int NOT NULL
last_attempt TIMESTAMP NOT NULL
)
And then you would do UPDATE ... SET attempts=attempts+1, last_attempt=NOW(), ...
depending on how long ago last_attempt
was. This will make reading the table much less expensive because we do not need to make a range query SELECT ... FROM rate_limits WHERE identity_id = ... AND login_time LIMIT 15
but instead query a single row SELECT ... FROM selfservice_rate_limits WHERE identity_id=... LIMIT 1
.
At scale, range queries are very expensive and it's always better to try and do single selects whenever possible!
@aeneasr
Thank you for your feedback and suggestions!
The problem with this is that you can effectively DoS accounts and lock them by trying to repeatedly log in. To avoid this, it is recommended to set a maximum wait time (e.g. 10s) which makes brute forcing impractical, but also prevents DoSing.
What about a config like this?
rate_limit:
max_duration: 30s
backoff_factor: 2
max_wait_time: 10s
max_login_attempts: 5
login_attempt_window: 1m
If we introduce a max_wait_time
parameter, the logic for rate limiting would be something like the below, based on the sample configuration above. Let me know if I am going in the right direction ππΌ
max_login_attempts
within a span of 50 seconds, the next failed login attempt within the 1 minute login_attempt_window
will result in the user's account being locked out.Which process activates the identity and how does it know whether the rate limiting has disabled it, or whether it was disabled by an administrator? I would recommend to not use the active field for this but instead use the additional table.
We could use a table with a design as follows:
CREATE TABLE lockouts (
id UUID PRIMARY KEY
identity_id UUID NOT NULL REFERENCES ...
lock_time TIMESTAMP NOT NULL,
reason ENUM('rate_limiting', 'admin_action') NOT NULL
);
it may be beneficial to consider creating a separate table if the relationship between the identities
table and the lockouts
table is intended to be a one-to-many relationship. An alternative solution could be to incorporate the following columns into the selfservice_rate_limits
table, rather than creating a separate one.
locked BOOLEAN
lock_time TIMESTAMP NOT NULL,
reason ENUM('rate_limiting', 'admin_action') NOT NULL,
As for the process for unlocking the identity after the lockout, we could add an admin-only endpoint such as identities/{identity_id}/unlock
to unlock the account.
My recommendation would be to have a table with the following layout:
CREATE TABLE selfservice_rate_limits ( id UUID PRIMARY KEY nid UUID identity_id UUID NOT NULL REFERENCES ... attempts int NOT NULL last_attempt TIMESTAMP NOT NULL )
And then you would do
UPDATE ... SET attempts=attempts+1, last_attempt=NOW(), ...
depending on how long agolast_attempt
was. This will make reading the table much less expensive because we do not need to make a range querySELECT ... FROM rate_limits WHERE identity_id = ... AND login_time LIMIT 15
but instead query a single rowSELECT ... FROM selfservice_rate_limits WHERE identity_id=... LIMIT 1
.At scale, range queries are very expensive and it's always better to try and do single selects whenever possible!
Thank you for providing the table design and sample query ππΌββοΈ I was also considering the same concept.
Could you please provide some clarification regarding the following ππΌ
nid
in the selfservice_rate_limits
table?identities
table and the selfservice_rate_limits
table?
depending on how long ago
last_attemptwas.
max_wait_time
in the config to check how long ago the last_attempt
was?@aeneasr
Thank you for your feedback and suggestions!
The problem with this is that you can effectively DoS accounts and lock them by trying to repeatedly log in. To avoid this, it is recommended to set a maximum wait time (e.g. 10s) which makes brute forcing impractical, but also prevents DoSing.
What about a config like this?
rate_limit: max_duration: 30s backoff_factor: 2 max_wait_time: 10s max_login_attempts: 5 login_attempt_window: 1m
I think this is great! For the first step, I think it makes sense to just choose sensible defaults. Adding more configuration options is something we'll consider if enough people want to change this (for good reason). Maybe there's a blog article from a security expert that can help use choose correct values?
If we introduce a
max_wait_time
parameter, the logic for rate limiting would be something like the below, based on the sample configuration above. Let me know if I am going in the right direction ππΌ
- The user will have to wait for 10 seconds before trying to login after each failed login attempt.
- Assuming the user fails 5 login attempts set by
max_login_attempts
within a span of 50 seconds, the next failed login attempt within the 1 minutelogin_attempt_window
will result in the user's account being locked out.
I would recommend an linear/exponential backoff with a fixed maxmimum - so something along the lines of:
- The user is free to adjust the above parameters depending on the level of security that is desirable for them.
Which process activates the identity and how does it know whether the rate limiting has disabled it, or whether it was disabled by an administrator? I would recommend to not use the active field for this but instead use the additional table.
We could use a table with a design as follows:
Option 1
CREATE TABLE lockouts ( id UUID PRIMARY KEY identity_id UUID NOT NULL REFERENCES ... lock_time TIMESTAMP NOT NULL, reason ENUM('rate_limiting', 'admin_action') NOT NULL );
That sounds good! Keep in mind that not all databases support ENUMs.
it may be beneficial to consider creating a separate table if the relationship between the
identities
table and thelockouts
table is intended to be a one-to-many relationship. An alternative solution could be to incorporate the following columns into theselfservice_rate_limits
table, rather than creating a separate one.
I think it makes sense to keep the business logic around this in one table. This will help with efficiency!
Option 2
locked BOOLEAN lock_time TIMESTAMP NOT NULL, reason ENUM('rate_limiting', 'admin_action') NOT NULL,
Could you elaborate on this a bit more, I'm not sure in what context to put this?
As for the process for unlocking the identity after the lockout, we could add an admin-only endpoint such as
identities/{identity_id}/unlock
to unlock the account.
That's a great idea! It would be great to keep the naming consistent. So for example use "lock/unlock" everywhere, or "rate limit", or "throttling" or something else. Do you have a preference?
My recommendation would be to have a table with the following layout:
CREATE TABLE selfservice_rate_limits ( id UUID PRIMARY KEY nid UUID identity_id UUID NOT NULL REFERENCES ... attempts int NOT NULL last_attempt TIMESTAMP NOT NULL )
And then you would do
UPDATE ... SET attempts=attempts+1, last_attempt=NOW(), ...
depending on how long agolast_attempt
was. This will make reading the table much less expensive because we do not need to make a range querySELECT ... FROM rate_limits WHERE identity_id = ... AND login_time LIMIT 15
but instead query a single rowSELECT ... FROM selfservice_rate_limits WHERE identity_id=... LIMIT 1
. At scale, range queries are very expensive and it's always better to try and do single selects whenever possible!Thank you for providing the table design and sample query ππΌββοΈ I was also considering the same concept.
Could you please provide some clarification regarding the following ππΌ
- The role of
nid
in theselfservice_rate_limits
table?
This is an internal value which will help (in the future) migrate between ory self-hosted and the ory network.
- Is it a one-to-one relationship between the
identities
table and theselfservice_rate_limits
table?
I think that makes sense, I don't think we need a 1:n relation for it - 1:n relations also are range queries which are significantly slower than unique (secondary) key reads.
depending on how long ago
last_attemptwas.
- Can we use the
max_wait_time
in the config to check how long ago thelast_attempt
was?
I'd probably use fixed values for now, to not complicate the kratos config further
@atreya2011 from Slack 20230128 "What are Ory's ways of challenging suspicious logins for example: a. logins from new locations b. too many failed attempts c. etc. note from tacurran - this could include blacklisted devices, new credentials, weak credentials
These features will require some advanced data analysis, perhaps ML, and should be included in this issue.
@aeneasr
What about a config like this?
rate_limit: max_duration: 30s backoff_factor: 2 max_wait_time: 10s max_login_attempts: 5 login_attempt_window: 1m
I think this is great! For the first step, I think it makes sense to just choose sensible defaults. Adding more configuration options is something we'll consider if enough people want to change this (for good reason). Maybe there's a blog article from a security expert that can help use choose correct values?
https://developers.cloudflare.com/waf/rate-limiting-rules/best-practices/ https://developer.okta.com/docs/reference/rl-best-practices/
I found 2 articles that discuss best practices for rate limiting in Cloudflare WAF and Okta. They emphasize the importance of customizing rate-limiting settings to match the specific needs of a website or API and suggest using a combination of IP and user identity to establish unique rate limits. Both articles advise on how to monitor the effectiveness of rate limiting and make adjustments as needed to prevent DDoS attacks and other malicious traffic. Unfortunately, they don't give specific advice on default values. I will keep searching...
I would recommend an linear/exponential backoff with a fixed maxmimum - so something along the lines of:
- failed attempt user has to wait 100ms
- failed attempt user has to wait 1s
- failed attempt user has to wait 2s
- failed attempt user has to wait 4s
- failed attempt user has to wait 8s
- failed attempt user has to wait 16s
- failed attempt user has to wait 16s
- failed attempt user has to wait 16s
- failed attempt user has to wait 16s
I understand the logic you've presented. Additionally, I have come across two articles that suggest setting a lockout threshold of 10 attempts or more as best practice for account lockout in Active Directory. Based on the logic discussed, it would be advisable to set the threshold at or above 10 to prevent the risk of a malicious user causing a lockout. We could keep the default value of 0 to signify no lockout.
https://www.netwrix.com/account_lockout_best_practices.html
CREATE TABLE lockouts ( id UUID PRIMARY KEY identity_id UUID NOT NULL REFERENCES ... lock_time TIMESTAMP NOT NULL, reason ENUM('rate_limiting', 'admin_action') NOT NULL );
That sounds good! Keep in mind that not all databases support ENUMs.
Thank you! If I understand correctly, Kratos utilizes go-buffalo/pop as the underlying package for handling database-related operations. Therefore, it would be appropriate to design the table as a Go struct, correct?
I think it makes sense to keep the business logic around this in one table. This will help with efficiency!
Option 2
locked BOOLEAN lock_time TIMESTAMP NOT NULL, reason ENUM('rate_limiting', 'admin_action') NOT NULL,
Could you elaborate on this a bit more, I'm not sure in what context to put this?
I apologize for any confusion. I had suggested adding the lockout-related parameters to the selfservice_rate_limits
table, so that all the information would be in one place. I would be grateful to learn your thoughts regarding the following database design.
CREATE TABLE selfservice_rate_limits (
id UUID PRIMARY KEY
nid UUID
identity_id UUID NOT NULL REFERENCES ...
attempts int NOT NULL
last_attempt TIMESTAMP NOT NULL
locked BOOLEAN
lock_time TIMESTAMP NOT NULL,
reason varchar NOT NULL,
)
As for the process for unlocking the identity after the lockout, we could add an admin-only endpoint such as
identities/{identity_id}/unlock
to unlock the account.That's a great idea! It would be great to keep the naming consistent. So for example use "lock/unlock" everywhere, or "rate limit", or "throttling" or something else. Do you have a preference?
I don't have a particular preference for this. But having an unlock
endpoint for an admin to unlock a user account does seem clear and straightforward.
Alternatively, we could introduce an endpoint, such as /identities/{identity_id}/rate_limit
, and send a PATCH
request with the necessary parameters to unlock a user account.
- The role of
nid
in theselfservice_rate_limits
table?This is an internal value which will help (in the future) migrate between ory self-hosted and the ory network.
Thank you for the clarification!
- Can we use the
max_wait_time
in the config to check how long ago thelast_attempt
was?I'd probably use fixed values for now, to not complicate the kratos config further
I understand, that seems perfectly reasonable.
@atreya2011 from Slack 20230128 "What are Ory's ways of challenging suspicious logins for example: a. logins from new locations b. too many failed attempts c. etc. note from tacurran - this could include blacklisted devices, new credentials, weak credentials
These features will require some advanced data analysis, perhaps ML, and should be included in this issue.
@tacurran
Thank you for bringing these points to my attention ππΌ
Regarding point A, we could enhance security by combining Two-factor authentication (2FA), which is already available in Kratos, with monitoring login activity from new locations and setting up alerts / requiring a one-time code input for any unusual behaviour. However, I believe that addressing this point deserves a separate discussion (issue), as it does not directly relate to rate-limiting and account lockout discussed in this design proposal ππΌββοΈ
Regarding point B, I have submitted a design proposal in this issue to implement account lockout after a specified number of failed login attempts, as well as rate-limiting consecutive login attempts.
Would you be able to provide additional details regarding point C? Is it related to creating an endpoint to counteract point A, in order for an admin to blacklist devices/IPs and prevent malicious activities?
@aeneasr
What about a config like this?
rate_limit: max_duration: 30s backoff_factor: 2 max_wait_time: 10s max_login_attempts: 5 login_attempt_window: 1m
I think this is great! For the first step, I think it makes sense to just choose sensible defaults. Adding more configuration options is something we'll consider if enough people want to change this (for good reason). Maybe there's a blog article from a security expert that can help use choose correct values?
https://developers.cloudflare.com/waf/rate-limiting-rules/best-practices/ https://developer.okta.com/docs/reference/rl-best-practices/
I found 2 articles that discuss best practices for rate limiting in Cloudflare WAF and Okta. They emphasize the importance of customizing rate-limiting settings to match the specific needs of a website or API and suggest using a combination of IP and user identity to establish unique rate limits. Both articles advise on how to monitor the effectiveness of rate limiting and make adjustments as needed to prevent DDoS attacks and other malicious traffic. Unfortunately, they don't give specific advice on default values. I will keep searching...
I would recommend an linear/exponential backoff with a fixed maxmimum - so something along the lines of:
- failed attempt user has to wait 100ms
- failed attempt user has to wait 1s
- failed attempt user has to wait 2s
- failed attempt user has to wait 4s
- failed attempt user has to wait 8s
- failed attempt user has to wait 16s
- failed attempt user has to wait 16s
- failed attempt user has to wait 16s
- failed attempt user has to wait 16s
I understand the logic you've presented. Additionally, I have come across two articles that suggest setting a lockout threshold of 10 attempts or more as best practice for account lockout in Active Directory. Based on the logic discussed, it would be advisable to set the threshold at or above 10 to prevent the risk of a malicious user causing a lockout. We could keep the default value of 0 to signify no lockout.
That sounds good to me!
https://www.netwrix.com/account_lockout_best_practices.html
CREATE TABLE lockouts ( id UUID PRIMARY KEY identity_id UUID NOT NULL REFERENCES ... lock_time TIMESTAMP NOT NULL, reason ENUM('rate_limiting', 'admin_action') NOT NULL );
That sounds good! Keep in mind that not all databases support ENUMs.
Thank you! If I understand correctly, Kratos utilizes go-buffalo/pop as the underlying package for handling database-related operations. Therefore, it would be appropriate to design the table as a Go struct, correct?
That's correct! Please keep in mind that enum is not supported in all databases, so you'll probably need to implement the enum in Go code :)
I think it makes sense to keep the business logic around this in one table. This will help with efficiency!
Option 2
locked BOOLEAN lock_time TIMESTAMP NOT NULL, reason ENUM('rate_limiting', 'admin_action') NOT NULL,
Could you elaborate on this a bit more, I'm not sure in what context to put this?
I apologize for any confusion. I had suggested adding the lockout-related parameters to the
selfservice_rate_limits
table, so that all the information would be in one place. I would be grateful to learn your thoughts regarding the following database design.CREATE TABLE selfservice_rate_limits ( id UUID PRIMARY KEY nid UUID identity_id UUID NOT NULL REFERENCES ... attempts int NOT NULL last_attempt TIMESTAMP NOT NULL locked BOOLEAN lock_time TIMESTAMP NOT NULL, reason varchar NOT NULL, )
That looks very good to me! Please don't forget to add an index over nid, identity_id
to make look up fast :) I'm not sure if we need lock_time
(as we have last_attempt which should be equal to lock_time
) but generally this looks great!
As for the process for unlocking the identity after the lockout, we could add an admin-only endpoint such as
identities/{identity_id}/unlock
to unlock the account.That's a great idea! It would be great to keep the naming consistent. So for example use "lock/unlock" everywhere, or "rate limit", or "throttling" or something else. Do you have a preference?
I don't have a particular preference for this. But having an
unlock
endpoint for an admin to unlock a user account does seem clear and straightforward. Alternatively, we could introduce an endpoint, such as/identities/{identity_id}/rate_limit
, and send aPATCH
request with the necessary parameters to unlock a user account.
Good question! I think we have several options. From a pure REST perspective, we try to modify resources in plural:
DELETE /identities/<id>/locks
PATCH /identities/<id>/rate-limits
PATCH /identities/<id>/throttles
Alternatively we could use verbs like but that would not be fully "REST"y:
PATCH /identities/<id>/unlock
PATCH /identities/<id>/unblock
I think that "block" and "unblock" are misleading, because "blocking" a user typically involves administrative intervention (user is malicious -> block them) whereas here we throttle/limit user authentication. What do you think of the following:
PATCH /identities/<id>/rate-limits?flow=login
and a table of
CREATE TABLE selfservice_login_rate_limits () ...
or alternatively a table with the design above and an additional field to deal with rate limits in the future for recovery or verification flows:
CREATE TABLE selfservice_rate_limits (
...
flow_type ENUM('login', 'recovery', ...)
...
)
Super excited about this being implemented.
Let me know If you need any help implementing it. @jossbnd and I can help if needed β
@aeneasr
Thank you for all the feedback! Let me summarize our discussion and, if everything looks good, I'll proceed with the implementation π
Good question! I think we have several options. From a pure REST perspective, we try to modify resources in plural:
DELETE /identities/<id>/locks
PATCH /identities/<id>/rate-limits
PATCH /identities/<id>/throttles
Alternatively we could use verbs like but that would not be fully "REST"y:
PATCH /identities/<id>/unlock
PATCH /identities/<id>/unblock
I think that "block" and "unblock" are misleading, because "blocking" a user typically involves administrative intervention (user is malicious -> block them) whereas here we throttle/limit user authentication. What do you think of the following:
PATCH /identities/<id>/rate-limits?flow=login
and a table of
CREATE TABLE selfservice_login_rate_limits () ...
or alternatively a table with the design above and an additional field to deal with rate limits in the future for recovery or verification flows:
CREATE TABLE selfservice_rate_limits ( ... flow_type ENUM('login', 'recovery', ...) ... )
I really like the idea of PATCH /identities/<id>/rate-limits?flow=login
since, as you pointed out, we can use it as a base for rate-limiting the other flows as well. I have updated the table design based on your feedback, including the removal of the lock_time
field and the addition of the flow_type
field. Understood regarding implementation of enums in Go code :)
CREATE TABLE selfservice_rate_limits (
id UUID PRIMARY KEY,
nid UUID,
identity_id UUID NOT NULL REFERENCES identities,
attempts int NOT NULL,
last_attempt TIMESTAMP NOT NULL,
locked BOOLEAN,
reason ENUM('rate_limiting', 'admin_action') NOT NULL,
flow_type ENUM('login', 'recovery', ...)
);
CREATE INDEX idx_identity_id ON selfservice_rate_limits (identity_id);
CREATE INDEX idx_nid ON selfservice_rate_limits (nid);
I have also simplified the configuration related to rate limiting, based on your feedback. Here is the updated version:
rate_limit:
max_duration: 30s (default)
backoff_factor: 2 (default)
max_attempts_before_lockout: 0 (default)
Super excited about this being implemented.
Let me know If you need any help implementing it. @jossbnd and I can help if needed hand
@supercairos Thank you for reaching out! I am planning to start working on it once I get the final approval from @aeneasr. I would greatly appreciate it if you could provide feedback on my pull request once I submit it. Thank you ππΌ
Awesome, that's a great summary! A few more points (but we're close!)
I'm not quite sure if I understand what max_attempts_before_lockout
governs, could you briefly explain why we need this configuration setting?
I thought a bit more about the table. What do you think about this:
CREATE TABLE selfservice_rate_limits (
id UUID PRIMARY KEY,
nid UUID,
identity_id UUID NOT NULL REFERENCES identities,
attempts int NOT NULL,
next_possible_attempt TIMESTAMP NOT NULL
);
I don't think we need the enums right now, reason
currently has only one value (too many login attempts) and flow_type
is not yet relevant.
In business logic:
next_possible_attempt
is in the past, then login is permittedattempts
is increased by one, and next_possible_attempt
is updated to a future timestamp (e.g. in 30 seconds)next_possible_attempt
is set sufficiently to the past (to prevent clock skew issues) and attempts
is set to 0
. Alternativley the row could be deleted but that could be problematic with transaction contention and/or multi-region databases.My recommendation would be the following configuration layout:
selfservice:
flows:
login:
rate_limit:
max_duration: 30s
backoff_factor: 2
# potentially max_attempts_before_lockout
This would make the semantics very clear - this rate limit config is for login :)
@aeneasr
Thank you for the feedback!
While last_attempt
is a clear and straightforward option, using next_possible_attempt
to determine if a user is allowed to log in works well too. I am flexible and open to either approach.
The purpose of max_attempts_before_lockout
is to determine the maximum number of attempts allowed before an account lockout occurs, where login would be disallowed until manually reset by an admin.
We could prevent a malicious user from making too many failed login attempts this way.
The following articles suggest that a value of 10 or higher would be appropriate for max_attempts_before_lockout
. We could set the default value to 0, signifying no lockout, and offer the flexibility for the user to adjust it as they see fit in the config :)
https://www.manageengine.com/products/active-directory-audit/kb/best-practices/account-lockout-best-practices.html#:~:text=The%20recommended%20threshold%20is%2015%20to%2050 https://www.netwrix.com/account_lockout_best_practices.html
Please let me clarify how max_attempts_before_lockout
would work, by making a slight modification to your proposed business logic and taking into consideration your proposed method for calculating the backoff duration.
I would recommend an linear/exponential backoff with a fixed maxmimum - so something along the lines of:
failed attempt user has to wait 100ms failed attempt user has to wait 1s failed attempt user has to wait 2s failed attempt user has to wait 4s failed attempt user has to wait 8s failed attempt user has to wait 16s failed attempt user has to wait 30s failed attempt user has to wait 30s failed attempt user has to wait 30s
next_possible_attempt
is in the past, then login is permitted.attempts
is increased by one, and next_possible_attempt
is updated using an exponential backoff algorithm, with a cap at max_duration
.next_possible_attempt
is set to a time sufficiently in the past, and in the case of Go, would be set to time.Time{}
(to prevent clock skew issues) and attempts
is set to 0.max_attempts_before_lockout
, the user will be locked out and unable to login again until the admin resets the max_attempts to 0.PATCH /identities/<id>/rate-limits
and a payload of {"attempts": 0}
to unlock the user.Thank you very much for the explanation around max_attempts_before_lockout
. I think that there are several things missing to make max_attempts_before_lockout
work really well:
Since that involves a lot of work, my suggestion is to keep this feature out of scope for now and focus on the pure implementation of this feature. If an administrator wants to lock out a user, they can also set the time quite high (an hour or more) which effectively achieves the same thing. WDYT?
@aeneasr
I understood points 1 and 3.
Automated reporting for the administrator telling them that a user was locked out
We can use the courier in kratos to automatically send an email to the administrator regarding the lockout.
Convenient way to fetch all locked accounts easily
If we add an additional locked BOOL
column in the table, it will become easy to query and fetch all locked accounts.
A way for the user to contest the lock out (why should I not be able to sign in when someone else nuked my account?)
Can you confirm if my understanding regarding point 2 is correct? The idea behind point 2 is to provide a mechanism for the user to contest the lock-out if they believe it was unjustified. This could involve:
If an administrator wants to lock out a user, they can also set the time quite high (an hour or more) which effectively achieves the same thing. WDYT?
Yea this is also possible. While manual tracking by an administrator is possible, implementing a max_attempts_before_lockout
feature would ensure automatic lockout, making it more convenient for the administrator as they wouldn't need to keep track of multiple failed login attempts from a user.
Since that involves a lot of work, my suggestion is to keep this feature out of scope for now and focus on the pure implementation of this feature.
Thank you for your input! Once I have completed the implementation of the rate-limiting feature, I will take your suggestions into consideration and prepare a revised proposal for implementing the account lockout feature, including all three points you mentioned (if that is okay with you π). Shall I go ahead with the implementation of the rate-limiting feature?
Epic, I think we're fully aligned. Thank you for your patience in this process - it definitely helps align on the vision for the feature and will make implementation much easier! Let's go! :)
@aeneasr
Awesome! Likewise, I thoroughly enjoyed the brainstorming process and am grateful for your feedback and approval ππΌ I will proceed with implementing the discussed design ππΌ
@aeneasr
Awesome! Likewise, I thoroughly enjoyed the brainstorming process and am grateful for your feedback and approval ππΌ I will proceed with implementing the discussed design ππΌ
Hi, I'm kinda interested in the feature, do you guys have any ETA for shipping it?
@DParty I am in the same boat as well. My apologies for not being able to give you an exact estimate at this time. I understand that this may be disappointing, but please know that I am actively working on it. Thank you for your understanding ππΌ
Awesome you are working on this @atreya2011 - I think it'll be a very valuable feature, and the org I work for also needs it. I'm not that experienced with go but if I can help with implementing or testing anything, let me know.
@credcore-dave Thank you! If all goes well I should be able to get a PR up by end of this month :)
Hello all. I was reading the conversation going on here and I think what you suggested and refined for throttling repeated requests is a very good approach if a malicious actor is trying to get access to a specific account. Do you also have an idea what to do about credential stuffing where an attacker is using (e.g. pastebin) lists of usernames/emails and passwords to try and check if he can find accounts to take over easily? Since he will only use every username/email once there is another approach needed for this attack vector.
Hello all. I was reading the conversation going on here and I think what you suggested and refined for throttling repeated requests is a very good approach if a malicious actor is trying to get access to a specific account. Do you also have an idea what to do about credential stuffing where an attacker is using (e.g. pastebin) lists of usernames/emails and passwords to try and check if he can find accounts to take over easily? Since he will only use every username/email once there is another approach needed for this attack vector.
@Robert-Bosse I think this issue might be what you're looking for?
@Robert-Bosse
Ory Kratos uses the haveibeenpwned API to check against a db of leaked passwords, you can also host your own version of it. I guess this mostly helps against current/past leaks, not so much against future or secret leaks.
You can also enforce Multifactor AuthN with TOTP or WebAuthn.
CAPTCHA or IP blocking only helps a bit against this attack, by far the most effective method is MFA.
Ory Network also implements capabilities to fingerprint sessions to devices, that is also pretty effective.
/I think the CSRF protection also helps, at least OWASP mentions it.
Happy to continue this in a dedicated discussion, to not derail this issue further
What is the status of the implementation for this? I can't find any PR or work in progress. Would love this feature and would happily contribute if assistance is needed.
@Oscmage Work is progressing on this slowly. My apologies for the wait πββοΈ
@atreya2011 No worries at all! Sorry if it came out like that. Happy to wait/assist :) Was just curious since this was one of the features that we were comparing our current Firebase/Google Identity Platform solution with.
Similar situation here for us. Looking forward to this functionality. We've been trying to solve this problem for a couple of months but there hasn't been an elegant solution externally from Kratos. Is there anything that I could do to provide assistance, we've got a few developers that would be happy to contribute :)
Similar situation here for us. Looking forward to this functionality. We've been trying to solve this problem for a couple of months but there hasn't been an elegant solution externally from Kratos. Is there anything that I could do to provide assistance, we've got a few developers that would be happy to contribute :)
@SeanTasker Personal commitments were slowing down my progress and I am hoping to make a PR this month sometime ππΌ Do you have a fork where your team has been trying to solve this?
No problem at all! And apologies for my slow response. Our internal issue tracker was down and I wanted to check a couple of things as we originally discussed a number of options for our login page. In the end we settled on using this template https://github.com/timalanfarrow/kratos-selfservice-ui-vue3-typescript to build our own login and account creation pages. So originally we tried to solve this with a captcha, however it turned out to be insufficient as we needed to store some state server side.
We haven't forked Kratos (yet) - that's why I am here, to determine if forking will get us closer to a solution sooner. We were thinking about adding some additional hooks that would let us implement additional login restriction checks.
Let me know if we can help. We're not super familiar with the Kratos code base right now so would just need some direction to help us know where to focus our attention.
@atreya2011 any update on this functionality? Thanks!
I think should be prioritized. We had problems with this in the past days. This is so necessary. We are thinking to change Kratos for this reason. We are very scared.
@bradleyball @frederikhors Progress is plodding on this due to many personal and professional commitments. I am looking at probably the end of this year or hopefully by January 2024 to get a PR up. However, it seems like a new approach is being discussed here? https://github.com/ory/kratos/issues/3580
@atreya2011 I think this should be done from Ory Kratos staff. Is not a nice-to-have feature. This is a security MUST.
@frederikhors @aeneasr will carefully review the code before merging it to master so you donβt have to worry about who works on this issue π
No, I didn't explain myself correctly. I mean it's urgent to do. Because without it there is a BIG RISK trying a login many times until you find the correct password.
@frederikhors Yes. I second that. It is something that is urgently needed.
This is a top priority for the Ory team and we're prioritizing protections. I'll update this issue with any upcoming changes.
@kmherrmann any ETA? Even if not exact?
@kmherrmann any ETA? Even if not exact?
It would be great to hear back on this.
@kmherrmann any news for our top priority project here?
Hello, we have decided not to work on this as part of Ory Kratos. Rate limiting, credentials stuffing, IP rate limiting across multiple nodes, and DoS prevention are very difficult problems to solve (typically cat and mouse type problems) and it makes much more sense to solve them on an operational level with things like Gateway Ratelimiters, JA3 Fingerprinting, Anti Bot detection, API firewalls or services like Cloudflare or Akamai.
We have solved these things in Ory Network using a variety of tools, but that also implies that we are not implementing this in Ory Kratos itself.
Sorry for anyone who was waiting for this here! Thank you for your understanding.
@aeneasr
Thank you for the clarification! Does this mean any PR related to this the feature wonβt be accepted? Or does it mean that ORY wonβt officially work on this feature? If ORY wonβt officially work on this feature but is willing to accept a PR based on the design proposal above, I can pick up my implementation from where I left it off.
Hello, we have decided not to work on this as part of Ory Kratos. Rate limiting, credentials stuffing, IP rate limiting across multiple nodes, and DoS prevention are very difficult problems to solve (typically cat and mouse type problems) and it makes much more sense to solve them on an operational level with things like Gateway Ratelimiters, JA3 Fingerprinting, Anti Bot detection, API firewalls or services like Cloudflare or Akamai.
We have solved these things in Ory Network using a variety of tools, but that also implies that we are not implementing this in Ory Kratos itself.
Sorry for anyone who was waiting for this here! Thank you for your understanding.
Wow. This is huge.
I think we should at least add guides on docs for these serious problems.
If not, using Kratos or others is the same at this point.
We would certainly accept PRs if the code changes are not too huge and do not require significant maintenance on our end. They should also be scoped to tackle one problem only (eg account locking on repeatedly failed attempts). Anything that is IP related should be out of scope (such as credentials stuffing) in my view as that should be dealt with on a rate limiter service.
@aeneasr
Thank you for the swift response.
Thatβs good to hear. However, the original design proposal was for the rate limiting feature based on identity ID (not IP address) and then implement account lock in another PR.
Is that still okay?
@aeneasr How do we solve credential stuffing by a bot net where no IP is used more than a handful of times? We already had this kind of attack more than once.
@aeneasr How do we solve credential stuffing by a bot net where no IP is used more than a handful of times? We already had this kind of attack more than once.
What are you using right now? i mean, Cloudflare? What else?
no cloudflare. custom login solution. right now we check ratio of good to bad logins for activating a captcha if the ratio gets too low.
@aeneasr https://thehackernews.com/2024/01/microsofts-top-execs-emails-breached-in.html If you have something to check good to bad login ratio you have a chance to see this attack ;)
Preflight checklist
Context and scope
Goals and non-goals
The design
Based on the discussion here: https://github.com/ory/kratos/issues/654
Example rate-limit config
Example lockout config
APIs
No response
Data storage
Create a new table to store the login history of a specific identity. The schema is as follows:
One identity shall have many login history records One login history record belongs to one identity
Code and pseudo-code
When an identity attempts to log in:
How to track consecutive login attempts:
Degree of constraint
No response
Alternatives considered
No response