Open ppartarr opened 3 years ago
So, if I understand correctly, you are asking why does the \lambda_q^greedy take union over the guesses?
The ball(tpw)
denotes the set of all real passwords, for which tpw
is a valid typo. Now, if the attacker guesses tpw
, it will get an advantage equivalent to sum([p(rpw) for rpw in ball(tpw)])
. This is exactly what will happen for q=1
. Now extend this to q>1
, we need to take union of balls, which is done by typofixer.get_ball_union
, which you can find in this line.
Does this clarify your doubt?
Also, typofixer.check(password) | set(password)
is not correct, as password
is a string, and set(password)
will create a set with the characters from the password.
Thanks for your answer! It didn't quite clarify what I'm confused about, so allow me to rephrase :smile:
Why does lambda q greedy take the union ball instead of taking the union of the checked passwords and the password itself?
Say we have q = 1 and we are using the blacklist checker with the blacklist shown below. Let's use top 3 correctors swc-all, swc-first, rm-last. The attacker is an exact knowledge attacker and knows the password distribution. For the sake of this example let's say that after solving the greedy weighted max heap coverage problem, the attacker guesses "rockyou2".
We have typofixer.get_ball_union(["rockyou2"]) = ["Rockyou2", "ROCKYOU2", "rockyou", "rockyou2"]
but if the attacker submits rockyou2
the checked passwords will be typofixer.check(tpw) | set([tpw]) = ["Rockyou2", "ROCKYOU2", "rockyou2"]
Notice how rockyou
isn't checked under the blacklist checker because it is in the blacklist. Since it's not being checked, I'm confused about why it's probability is included in the calculation for the security loss lambda q greedy.
typofixer.check(tpw) | set([tpw])
is also how the weights are calculated https://github.com/rchatterjee/mistypography/blob/f0fb62cdc42bcd2f4e0881cdeaccfa640edd0b20/security/compute_sec_loss.ver1.py#L28
10 most frequent passwords in rockyou
123456
12345
123456789
password
iloveyou
princess
1234567
rockyou
12345678
abc123
Edit: I corrected the previous question to use set([password])
instead of the set(password)
Sorry for the late reply. Let me see if I understand your question this time. If not, I will be happy to jump in a short Zoom call sometime next week. It's been a long time since I have closely looked at the code.
I think you are right: The get_ball_union
function should use
self.check
instead of sefl.get_ball
. The check
function is cognizant
of Blacklist, etc., but the get_ball
is not.
Thanks a lot for pointing that out. I will really appreciate it if you can test and submit a pull request.
On Thu, Feb 18, 2021 at 2:36 AM Philippe Partarrieu < notifications@github.com> wrote:
Thanks for your answer! It didn't quite clarify what I'm confused about, so allow me to rephrase 😄
Why does lambda q greedy take the union ball instead of taking the union of the checked passwords and the password itself.
Say we have q = 1 and we are using the blacklist checker with the blacklist shown below. The attacker is an exact knowledge attacker and knows the password distribution. For the sake of this explanation let's say that after solving the greedy weighted max heap coverage problem, the attacker guesses "rockyou2".
We have union_ball("rockyou2") = ["Rockyou2", "ROCKYOU2", "rockyou", "rockyou2"] but if the attacker submits rockyou2 the checked passwords will be typofixer.check(tpw) | set([tpw]) = ["Rockyou2", "ROCKYOU2", "rockyou2"]
Notice how rockyou isn't checked under the blacklist checker because it is in the blacklist. Since it's not being checked, I'm confused about why it is included in the calculation for the security loss lambda q greedy.
typofixer.check(tpw) | set([tpw]) is also how the weights are calculated in the first place during the experiment https://github.com/rchatterjee/mistypography/blob/f0fb62cdc42bcd2f4e0881cdeaccfa640edd0b20/security/compute_sec_loss.ver1.py#L28
10 most frequent passwords in rockyou
123456
12345
123456789
password
iloveyou
princess
1234567
rockyou
12345678
abc123
Edit: I corrected the previous question to use set([password]) instead of the set(password)
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rchatterjee/mistypography/issues/2#issuecomment-781170346, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACCEW7XUVKIFGJA2HWYJYDS7TGQDANCNFSM4XX3LLXA .
Hi @rchatterjee,
I really like your work on typo correction! I read your 2016 paper and I've been digging through the code to try and understand it better.
I am curious about how the security loss lambda q greedy is calculated for the various checkers. After solving the best-q-guesses problem in your experiment, you sum the probability of the union ball for every password in the best greedy guesses:
https://github.com/rchatterjee/mistypography/blob/f0fb62cdc42bcd2f4e0881cdeaccfa640edd0b20/security/compute_sec_loss.ver1.py#L207
https://github.com/rchatterjee/mistypography/blob/f0fb62cdc42bcd2f4e0881cdeaccfa640edd0b20/security/compute_secloss.py#L30
I understand that the union ball would be the checked ball for the always checker but this isn't the case for the blacklist & optimal checkers. It seems to me that lambda q greedy should be calculated using the checked ball with
typofixer.check(password) | set([password])
.Looking forward to hearing back from you!