Each solution has only one possible audio challenge, leading to the possibility of a rainbow-table-type attack

batterystaples commented 6 years ago

Hi,

For any given input text, Flite produces the same output audio every time (at least when it is run with the same settings, in the same environment, and at the same version, all of which would be true on a single server). This is unlike the image captchas, where a single input text can randomly produce one of many, many challenge images. Because of this, an attacker can generate every possible audio challenge and store it for later lookup. This is especially easy because the challenges are only 4 letters long, so there are relatively few of them.
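Here is a minimal sketch of that lookup attack. It makes two assumptions. First, a hypothetical deterministic `fake_tts()` stands in for Flite. Second, the alphabet is 26 uppercase letters, although the library's real character set may differ. Because the same text always yields the same bytes, a dictionary from audio digest to solution is all an attacker needs.

```python
from __future__ import annotations

import hashlib
import itertools
import string


def fake_tts(text: str) -> bytes:
    """Hypothetical stand-in for Flite: same text in, same bytes out."""
    return hashlib.sha256(b"audio:" + text.encode()).digest() * 64


def build_table(alphabet: str, length: int) -> dict[str, str]:
    """Precompute digest -> solution for every possible challenge."""
    table = {}
    for letters in itertools.product(alphabet, repeat=length):
        solution = "".join(letters)
        table[hashlib.sha256(fake_tts(solution)).hexdigest()] = solution
    return table


def crack(table: dict[str, str], audio: bytes) -> str | None:
    """Look up a downloaded challenge; None if it was never precomputed."""
    return table.get(hashlib.sha256(audio).hexdigest())


# Size of the table for 4 letters drawn from an (assumed) 26-letter alphabet.
KEYSPACE = len(string.ascii_uppercase) ** 4
```

With 26 letters the full table has 456,976 entries, which is small enough to precompute on one machine.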

To solve this problem, there should be many different possible audio challenges for each solution, so that it becomes infeasible for an attacker to generate all possible challenges in a reasonable amount of time.

To demonstrate this issue, I have written a proof of concept targeting your test application, which you can run to confirm the issue. I include more details, and some potential ideas for solutions, in the README for that project.

Thanks,

appleorange1

mbi commented 6 years ago

Thanks for the great analysis and the POC, @appleorange1!

I've attached a pull request that, I hope, will mitigate this kind of attack. Could you please review my proposed change? Would it prevent the kind of attack you describe in your POC?

batterystaples commented 6 years ago

Thanks! Your solution is good in that it increases processing time and obscures the file length. To defeat it, I had to cycle through all 1024 possible file lengths: https://github.com/appleorange1/django-simple-captcha-cracker-poc/commit/6d440264bc2e2f69bf00cc675868bbfb68b6bea0
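The thread does not say exactly how the pull request obscures the length. Assume, purely for illustration, that it appends 0 to 1023 random bytes; `pad_randomly()` is hypothetical. An attacker holding a digest-to-solution table of unpadded audio can then try every padding length in turn:

```python
import hashlib
import os
import random


def pad_randomly(audio: bytes, max_pad: int = 1024) -> bytes:
    """Hypothetical mitigation: append 0..max_pad-1 random bytes."""
    return audio + os.urandom(random.randrange(max_pad))


def crack_padded(table: dict, audio: bytes, max_pad: int = 1024):
    """Strip each candidate padding length and look up the remaining prefix."""
    for pad in range(min(max_pad, len(audio) + 1)):
        prefix = audio[: len(audio) - pad]
        solution = table.get(hashlib.sha256(prefix).hexdigest())
        if solution is not None:
            return solution
    return None
```

At most 1024 hash computations per challenge are needed, which is why padding alone raises the cost only slightly.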

However, I think a better solution would be to add "random" background noise to the recording generated by flite. To do this, I used a program called sox (which you can install with "apt install sox" on a Debian-based system). For some reason, my audio plays in your testproject when I use "open audio", but not "play audio".

You will probably want to fix this before accepting the change (or you may wish to rewrite it altogether to avoid adding another dependency). Nonetheless, here is my solution: https://github.com/appleorange1/django-simple-captcha/commit/828acefdd39cb0102bc545a72c684dd3995fa82d

Adding random background noise should protect against the rainbow-table type of attack that I described. (It appears that there are methods of removing the background noise, but that is a different type of attack altogether.)
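The following pure-Python illustration shows the idea conceptually. It is not how sox implements `synth ... brownnoise` or `-m`. Brown noise is modelled as a clamped random walk, and mixing is simplified to sample-by-sample addition. The `step`, `limit` and `gain` values are hypothetical.

```python
import random


def brown_noise(n, step=200, limit=8000, rng=None):
    """Brownian (random-walk) noise as int16-range samples, clamped to +/-limit."""
    if rng is None:
        rng = random.Random()
    value, samples = 0, []
    for _ in range(n):
        value = max(-limit, min(limit, value + rng.randint(-step, step)))
        samples.append(value)
    return samples


def mix(speech, noise, gain=1.0):
    """Add gain-scaled noise to speech sample by sample, clipping to int16."""
    mixed = []
    for i, s in enumerate(speech):
        n = noise[i] if i < len(noise) else 0
        mixed.append(max(-32768, min(32767, s + int(gain * n))))
    return mixed
```

Each call draws fresh noise, so two renderings of the same text come out different and a precomputed digest table stops matching.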

mbi commented 6 years ago

Awesome, thank you!

I slightly changed your code to inject only 5 ms of noise at the beginning of the merged sample:

```python
# Synthesize 5 ms of brown noise at 8 kHz into arbnoisepath
subprocess.call([settings.CAPTCHA_SOX_PATH, '-r', '8000', '-n', arbnoisepath, 'synth', '0.005', 'brownnoise'])
```

A single random bit should be enough to alter the resulting hash, so I'd rather not obfuscate the sample more than necessary. (Do you think voice-to-text on the generated sample is an attack vector? If so, does the brown noise make it more difficult?)
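To illustrate the point about a single bit, this sketch flips one bit of a made-up byte payload and compares SHA-256 digests. The payload and the bit position are arbitrary stand-ins for a WAV file.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


audio = bytes(range(256)) * 100   # stand-in for a WAV payload
tweaked = flip_bit(audio, 12345)

h1 = hashlib.sha256(audio).hexdigest()
h2 = hashlib.sha256(tweaked).hexdigest()
# Count how many hex digits differ between the two digests.
print(h1 != h2, sum(a != b for a, b in zip(h1, h2)))
```

The digest changes completely, but the two payloads are still identical everywhere except one bit. A hash lookup is therefore defeated, while a similarity comparison is not.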

I also fixed the merged output to be playable in the browser:

```python
# Mix the noise with the flite output into a 16-bit PCM WAV that browsers can play
subprocess.call([settings.CAPTCHA_SOX_PATH, '-m', arbnoisepath, path, '-t', 'wavpcm', '-b', '16', mergedpath])
```

Anyway, I'll add some documentation on this feature and merge it to master, if you think my proposed changes are OK.

Cheers!

batterystaples commented 6 years ago

Good job working out how to get the browser to play the file!

Inserting only 5 ms of noise at the beginning of the merged sample is indeed enough to stop an attack based on hashes from working. However, it leaves the door open to another attack. Consider:

- A person generates and stores a single challenge for each possible key. This takes a bit of space, but not an unreasonable amount.
- They then calculate the similarity between the downloaded challenge and each stored challenge with the same file size, using a tool like this.
- In almost all cases, they will be able to guess the correct captcha.
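Here is a rough sketch of that nearest-match attack. The similarity measure is a simple positional byte-match ratio, swapped in for illustration; it is not what `lev_dist.py` computes. The template dictionary is hypothetical.

```python
from __future__ import annotations


def similarity(a: bytes, b: bytes) -> float:
    """Fraction of positions holding the same byte (equal-length inputs only)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def guess(challenge: bytes, templates: dict[str, bytes]) -> str | None:
    """Return the key of the stored template most similar to the challenge.

    Only templates with the same file size as the challenge are compared.
    """
    same_size = {k: v for k, v in templates.items() if len(v) == len(challenge)}
    if not same_size:
        return None
    return max(same_size, key=lambda k: similarity(challenge, same_size[k]))
```

A short noise prefix only changes the first few bytes, so the correct template still wins by a wide margin.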

To see what I mean by comparing similarity, you can download the above tool and then run:

```shell
$ flite -t "A, B, C, D" -o ABCD.wav
$ flite -t "Q, R, S, T" -o QRST.wav
$ sox -r 8000 -n arbnoise1.wav synth 0.005 brownnoise
$ sox -r 8000 -n arbnoise2.wav synth 0.005 brownnoise
$ sox -r 8000 -n arbnoise3.wav synth 0.005 brownnoise
$ sox -r 8000 -n arbnoise4.wav synth 0.005 brownnoise
$ sox -m arbnoise1.wav ABCD.wav -t wavpcm -b 16 ABCD1.wav
$ sox -m arbnoise2.wav ABCD.wav -t wavpcm -b 16 ABCD2.wav
$ sox -m arbnoise3.wav QRST.wav -t wavpcm -b 16 QRST1.wav
$ sox -m arbnoise4.wav QRST.wav -t wavpcm -b 16 QRST2.wav
$ ./lev_dist.py ABCD1.wav ABCD2.wav
$ ./lev_dist.py ABCD1.wav QRST1.wav
```

If you repeat the above tests with 4 seconds of brown noise instead of 0.005 seconds, you will see that the similarity comes out approximately the same for every pair of files, whether or not they share the same text.

For this reason, I would recommend that the background noise run for the full duration of the captcha, even though it makes the audio a bit harder to hear.
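Here is a simplified model of why noise duration matters. Raw sample equality is the similarity measure, and the "speech" signal and noise amplitude are invented stand-ins; none of this is what `lev_dist.py` computes. Noise confined to the first 5 ms leaves almost every sample of two renderings identical. Noise spanning the whole clip makes nearly every sample differ.

```python
import random


def add_noise(samples, noisy_count, rng):
    """Add random noise to the first noisy_count samples only."""
    out = list(samples)
    for i in range(min(noisy_count, len(out))):
        out[i] += rng.randint(-500, 500)
    return out


def fraction_equal(a, b):
    """Share of positions where two sample lists hold the same value."""
    return sum(x == y for x, y in zip(a, b)) / min(len(a), len(b))


rate = 8000                                               # samples per second, as in the sox commands
speech = [(i * 37) % 2000 - 1000 for i in range(2 * rate)]  # 2 s stand-in signal
rng = random.Random(0)

short_a = add_noise(speech, rate * 5 // 1000, rng)        # 5 ms of noise = 40 samples
short_b = add_noise(speech, rate * 5 // 1000, rng)
full_a = add_noise(speech, len(speech), rng)
full_b = add_noise(speech, len(speech), rng)

print(fraction_equal(short_a, short_b))   # close to 1.0
print(fraction_equal(full_a, full_b))     # close to 0.0
```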

However, it has just occurred to me that my pull request does not fully solve this problem either. If the challenge is 4 letters, 4 seconds of noise is fine, but if someone customises the captcha to have more letters, 4 seconds may not be enough. The brown noise should be set slightly longer than the typical duration of a challenge of the configured length.
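One way to avoid a fixed 4-second value is to measure the rendered speech and size the noise from it. The sketch below reads the duration with Python's `wave` module and builds a sox argument list that reuses the flags from the command quoted earlier in the thread. The function names and the 0.5 s margin are hypothetical.

```python
import io
import wave


def wav_duration(wav_file) -> float:
    """Length of a WAV file in seconds: frame count divided by frame rate."""
    with wave.open(wav_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def noise_command(sox_path: str, out_path: str, speech_seconds: float,
                  margin: float = 0.5) -> list:
    """sox argv for brown noise that outlasts the speech by `margin` seconds.

    The 0.5 s default margin is an arbitrary, hypothetical choice.
    """
    seconds = f"{speech_seconds + margin:.3f}"
    return [sox_path, "-r", "8000", "-n", out_path, "synth", seconds, "brownnoise"]


# Usage: a 2-second silent mono 16-bit WAV built in memory stands in for flite output.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(b"\x00\x00" * 16000)
buf.seek(0)

duration = wav_duration(buf)
cmd = noise_command("sox", "noise.wav", duration)
```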

As for voice-to-text on the generated sample: yes, it is an attack vector, although I haven't had enough time to learn how it works. I do remember seeing another captcha library that runs OCR on the images it generates before sending them to the user; if the OCR succeeds, it does not use that image as a challenge. If you know of any good voice-to-text software, the same approach would work here: try to break each challenge with it before sending it to the user.
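Here is a sketch of that reject-if-recognized idea. `render` and `recognize` are hypothetical hooks, for example flite plus sox and some speech-to-text engine; neither is part of django-simple-captcha.

```python
from typing import Callable, Optional


def generate_hard_challenge(
    solution: str,
    render: Callable[[str], bytes],
    recognize: Callable[[bytes], Optional[str]],
    max_attempts: int = 10,
) -> Optional[bytes]:
    """Re-render the audio until the recognizer fails to transcribe it.

    Returns None if every attempt was solved, so the caller can pick a new solution.
    """
    for _ in range(max_attempts):
        audio = render(solution)
        heard = recognize(audio)
        # Normalize spacing and case before comparing with the real solution.
        if heard is None or heard.replace(" ", "").upper() != solution.upper():
            return audio
    return None
```

This only filters out challenges that the chosen recognizer can solve. An attacker using a different or better engine is not necessarily stopped.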

mbi commented 6 years ago

Hmm, darn. What if we rendered a compressed format (e.g. mp3) instead of the uncompressed RIFF PCM we're using at the moment? Would that defeat the Levenshtein comparison?

EDIT: Here is the distance between two mp3 files with the same input text:

Matches: 2021 Differences: 1724

Two uncompressed WAV files with the same input yield:

Matches: 16656 Differences: 5331

I can't really tell how "close" those are. Is the mp3 difference significant enough?

mbi commented 6 years ago

Ah, never mind: I found a way to limit the gain of the brown noise file, so that the merged output is perfectly understandable but still yields a significant difference:

Matches: 883 Differences: 31162

I think I'll go with this, thanks so much for your input!
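The raw Matches/Differences counts quoted in this thread are on very different scales: the mp3 pair totals 3,745 and the WAV pair 21,987, roughly a sixth as many. Converting each pair to a match ratio makes them easier to compare. The labels below are paraphrases of the comments above. On their own, these ratios only show how similar two same-text renderings are; judging how distinguishable they are would also need a baseline ratio for two different texts, which the thread does not quote.

```python
# Matches/Differences pairs quoted above (same input text within each pair).
results = {
    "two WAV files": (16656, 5331),
    "two mp3 files": (2021, 1724),
    "merged output with gain-limited brown noise": (883, 31162),
}

for label, (matches, differences) in results.items():
    ratio = matches / (matches + differences)
    print(f"{label}: {ratio:.1%} matching")

# Prints 75.8%, 54.0% and 2.8% respectively.
```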

mbi / django-simple-captcha, issue #124