[SECURITY ISSUE] this is not a captcha

evilsocket commented 2 years ago

Hi, i'm trying to estimate the reliability of this mechanism as an actual captcha (and not just as an overcomplicated rate limiter).

So I was wondering, what would prevent a bot from automatically fetch the PoW configuration (since sitekey is accessible), solve the challenge (maybe reusing the same Rust code for simplicity and efficiency), submit the PoW, get the token and either use it for every following request or repeat the process (for bruteforcing as an example)?

I get that the difficulty can be tuned, but still, I assume a bot implemented in Rust would be faster than its js/wasm counterpart, so it's safe to say a bot would take less than a browser anyway, thus kind of bypassing the rate limiting factor.

What am I missing?

Edit: adding PoC code https://github.com/evilsocket/mcaptcha_bypass

realaravinth commented 2 years ago

Hello,

There's nothing stopping anyone from writing a script/bot to solve the PoW challenge. Official bindings to the rust library exist for python and JavaScript, so we encourage scripting, actually.

Also, mCaptcha isn't capable of differentiating between a bot and a human. I used “captcha” to simplify conveying its use cases, but I realise it is inaccurate. I'm considering switching to a different name.

That said, native code is only slightly(~2s worst case scenario) faster than the WASM implementation(Sunday afternoon experiment ran in noisy environment; kindly take it with a grain of salt). This delay is marginal and isn't enough to completely make the rate-limiting ineffective.

A full-fledged experiment to test mCaptcha's abilities to guard against DDoS attack is WIP but here is a sneak peek:

Server protected with mCaptcha: The sharp bug jagged decline in requests/second(first plot in image above) is where the variable difficulty mechanism kicks in and difficulty factor increases.
Without mCaptcha: Decline in request/second is performance degradation of the server that was being tested. Expected behaviour in DDoS scenarios.

The benchmark code uses python bindings to the rust PoW library that mCaptcha uses, so it is technically native.

evilsocket commented 2 years ago

Yes the project name and its UI are very very misleading and most people will use it assuming it's an actual captcha with bot detection ... you know ... :D

Schermata 2022-08-05 alle 12 37 57

For the rate limiting scenario, why not just using the builtin web servers capabilities ( mod_ratelimit just as one example ) so that neither the frontend nor the backend have to handle it?

realaravinth commented 2 years ago

Yes the project name and its UI are very very misleading and most people will use it assuming it's an actual captcha with bot detection ... you know ... :D

True, but services like captcha farms are available truly invalidate the human factor. So IMHO, reCAPTCHA and hCaptcha too are rate-limiters in someways. But yes, rebranding. :D

Most rate-limiting tech that I'm aware of use IP logging to accomplish their goals(mod_ratelimt is IP-based too), which doesn't work very well when the user is behind CGNAT or Tor. mCaptcha being IP-independent and stateless is very accurate, even in those environments.

edit: forgot to link captcha farm

evilsocket commented 2 years ago

mCaptcha being IP-independent and stateless is very accurate, even in those environments.

I'm not sure about this neither ... an attacker could always spawn multiple parallel requests and perform what I wrote in the issue description by requesting one token per thread ... since as you said it's not IP based, the rate limit in this case only applies to the single request but it doesn't work if multiple requests are parallelized ... I can have 1000 parallel requests and they would not take 2s each, but way less than that ... I can work on a PoC if you like, i still think IP based rated limiting is more secure and effective than this.

evilsocket commented 2 years ago

this might actually DoS the mCaptcha backend logic ... do you mind if i (responsibly) experiment with demo.mcaptcha.org?

realaravinth commented 2 years ago

I'm not sure about this neither ... an attacker could always spawn multiple parallel requests and perform what I wrote in the issue description by requesting one token per threa

The images shared above were generated using locust, a distributed load testing/DDoS library, so the test environment is parallelised and spread across multiple machines. This particular test involved simulating 500 concurrent bots. The more elaborate testing would involve ~30 machines with even more bots.

This might actually DoS the mCaptcha backend logic ... do you mind if i (responsibly) experiment with demo.mcaptcha.org?

demo.mcaptcha.org is running out of my bedroom, so I'm afraid a DoS would affect the other services that are running on the same server too.

However, I could help you set up your own instance.

evilsocket commented 2 years ago

No worries I can RTFM :D https://github.com/mCaptcha/mCaptcha#self-hosted ... i'll update this issue with my results

realaravinth commented 2 years ago

We have a Matrix chatroom, you are kindly invited to join us there :)

evilsocket commented 2 years ago

Ok I've got a PoC working

Is there an HTTP route that I can use to test the token validity, either in the demo or in the docker-compose image?

realaravinth commented 2 years ago

Yes, kindly grab the account secret from http://your-mcptcha-instance/settings and use it to

curl --location --request POST 'https://your-mcaptcha-instance/api/v1/pow/siteverify' \
--header 'Content-Type: application/json' \
--data-raw '{
    "token": "tokenReturnedBymCaptchatoCaptchaWidget---replace this",
    "key": "sitekey---replacethis",
    "secret": "accont-secret"
}'

evilsocket commented 2 years ago

Rate limiting and difficulty increase don't seem to work neither, my local instance is configured with:

Schermata 2022-08-05 alle 14 57 59

So I expect the difficulty and PoW times to increase after the first concurrent request, however the average time and difficulty indicate that there's been no change at all and that rate limiting isn't working for concurrent requests:

Schermata 2022-08-05 alle 14 58 57

You can find the PoC code here https://github.com/evilsocket/mcaptcha_bypass ... set the website url and token here https://github.com/evilsocket/mcaptcha_bypass/blob/master/src/main.rs#L16 and run with:

cargo build --release && clear && ./target/release/mcaptcha_bypass

realaravinth commented 2 years ago

Printing the difficulty factor shows that difficulty factor scaling works and there are tests to verify this.

That said, the issue is with the difficulty factor. I've discovered that a higher difficulty factor doesn't always correspond with more time being spent. Difficulty factors that are of multiples 5000 x 10^x show consistent behaviour.

For instance:

Currently, the actual difficulty factor is computed by this line. I was going to modify it to always produce consistent difficulty factors after finding consistent factors like 5000 x 10^x.

evilsocket commented 2 years ago

If difficulty factor scaling works, how come I get an average of exactly 50000? Since I'm spawning 50 threads, I should expect some threads after the first N to get a higher factor and therefore the average should be higher than 50000 ... also the fact that all verifications completed successfully with their own token in under a second (total) means there's no rate limiting at all in place.

With all due respect, you might want to take a look at your tests ...

realaravinth commented 2 years ago

Did you use the configuration I shared above?

Here's the logs from running against the config I shared:

18:59 atm@lab mcaptcha_bypass ±|master ✗|→ crr
   Compiling mcaptcha_bypass v0.1.0 (/src/atm/code/mcaptcha/mcaptcha_bypass)
warning: field is never read: `token`
  --> src/main.rs:42:5
   |
42 |     token: Option<String>,
   |     ^^^^^^^^^^^^^^^^^^^^^
   |
   = note: `#[warn(dead_code)]` on by default

warning: `mcaptcha_bypass` (bin "mcaptcha_bypass") generated 1 warning
    Finished release [optimized] target(s) in 1.06s
     Running `target/release/mcaptcha_bypass`

██ ███    ███      █████      ██████   ██████  ████████
██ ████  ████     ██   ██     ██   ██ ██    ██    ██
██ ██ ████ ██     ███████     ██████  ██    ██    ██
██ ██  ██  ██     ██   ██     ██   ██ ██    ██    ██
██ ██      ██     ██   ██     ██████   ██████     ██

spawning 50 threads ...
500000
50000
50000
500000
50000
500000
500000000
500000
500000000
500000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000
500000000

I killed the program midway before it could compute average PoW because it was taking forever.

diff:

diff --git a/src/main.rs b/src/main.rs
index 82d2d82..de6a693 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -14,7 +14,7 @@ static BANNER: &str = "
 ██ ██  ██  ██     ██   ██     ██   ██ ██    ██    ██    
 ██ ██      ██     ██   ██     ██████   ██████     ██";
 static WEBSITE: &str = "http://localhost:7000";
-static SITEKEY: &str = "9qO2b37Zy3A3oLp4VxwDyYizoRCO63Yp";
+static SITEKEY: &str = "CNvAlejZDNUa5FR77Jp3FJLrOaNl0ogw";
 static THREADS: u32 = 50;

 static TOT_SUCCESS: AtomicU32 = AtomicU32::new(0);
@@ -74,6 +74,7 @@ fn main() {
                 .json::<Config>()
                 .unwrap();

+            println!("{}", config.difficulty_factor);
             TOT_DIFFICULTY.fetch_add(config.difficulty_factor, Ordering::SeqCst);

             // let duration = first_start.elapsed();

evilsocket commented 2 years ago

with your config and your println added it's the same (by the way it doesn't take forever to compute the average, it's just a simple division, it might be that the server hanged?):

Schermata 2022-08-05 alle 15 36 32

██ ███    ███      █████      ██████   ██████  ████████ 
██ ████  ████     ██   ██     ██   ██ ██    ██    ██    
██ ██ ████ ██     ███████     ██████  ██    ██    ██    
██ ██  ██  ██     ██   ██     ██   ██ ██    ██    ██    
██ ██      ██     ██   ██     ██████   ██████     ██

spawning 50 threads ...
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000
50000

50 threads done in 602.20595ms, verifications:50 errors:0 average_difficulty:50000 average_verification_ms:340

realaravinth commented 2 years ago

I think I found the issue. mCaptcha/cache is responsible for handing visitor count scaling. I patched it for this bug but didn't update the Redis docker image that the compose file uses.

Also, inconsistencies in the build & push script in the cache CI :facepalm:

Please try with https://demo.mcaptcha.org with Ja0siYkyklOuO4l90GBT9PoNTvvhxq30.

evilsocket commented 2 years ago

yep now it seems to work and actually do difficulty scaling, as my CPU shows :D

Schermata 2022-08-05 alle 16 07 12

realaravinth commented 2 years ago

*sighs in relief

Thanks for pointing it out, I don't think I would have discovered it in a million years :D

evilsocket commented 2 years ago

after I obtain one token i guess I could still use it to perform multiple requests to other pages? this fixes the scaling on the verification process but it doesn't offer rate limiting for the whole website ... right?

realaravinth commented 2 years ago

The token is single use, and is only used to protect the page which has the captcha. It doesn't do captive portals like Cloudflare does:

evilsocket commented 2 years ago

now that I think about it: since the difficulty scaling is not IP based, an attacker could start spawning a lot of requests (without the need to provide a valid PoW) and that would increase the difficulty factor for every other user, causing their browser to hang like PoC when I got to difficulty 500000000 ... ?

evilsocket commented 2 years ago

yes I can confirm the difficulty increases wether or not I send PoW

evilsocket commented 2 years ago

check https://demo.mcaptcha.org now, what difficulty do you get at the first request? i'm spawning threads and requesting PoW configurations without doing or submitting any PoW ... if i'm right you should get 500000000 right away

realaravinth commented 2 years ago

Difficulty factor 500000000 is inadvisable and is purely for illustration purposes. But yes, that is possible. mCaptcha/survey is WIP to benchmark mCaptcha on devices in the wild and provide guidelines to choose difficulty factors that work for majority of the devices.

But even then, the webmaster will be making tradeoffs between rate-limiting and allowing devices.

Ideally, even if the PoW configuration fetching endpoint is abused like the case you point out, if the delay isn't more than 10s, the visitor will be able to pass validation by simply waiting for 10s as opposed to being banned due to IP rate-limiting or having to solve a dozen reCAPTCHA/hCaptcha. This is the benefit(and the tradeoffs) that mCaptcha offers.

realaravinth commented 2 years ago

If the underlying webservice isn't protected by mCaptcha, then the behaviour under DoS will be same too: a delay in responses. But only this time, the underlying webservice will see performance degradation and possibly crash under load. And with mCaptcha, the load doesn't bleed to the underlying webservice.

evilsocket commented 2 years ago

Ok stopping the PoC now as I can verify from browser that I get 500000000 at the first request (and the browser hangs as expected for 500000000).

I get what you're saying about the tradeoffs of this approach, however what if: (I'm so sorry I'm thinking about all these corner cases, I'm used to attack software :D)

An attacker starts requesting PoW configurations until the difficulty gets to the max
The attacker then starts sending invalid PoW (without actually doing the computation, just random PoW data)

The CPU effort for the attacker would be minimal (as no PoW is actually computed despite of the increased difficulty), but is it safe to assume that would increase the load on the server a lot? Especially because, if no PoW is computed, the rate limit doesn't work (as it relies only on the PoW computation time, which is 0 in this case), so the attacker can send A LOT of max-difficulty-fake-PoW for the server to verify ... and DoS.

realaravinth commented 2 years ago

(I'm so sorry I'm thinking about all these corner cases, I'm used to attack software :D)

Don't be, you've been wonderful so far :)

The PoW config(the challenge string and the difficulty factor; the first HTTP request to mCaptcha) should exist in the cache. If it doesn't, then the hash won't be computed.

So in such scenarios, the difficulty factor will stay at the highest level causing delays to visitors but mCaptcha server and the underlying webserivce will remain under normal loads.

evilsocket commented 2 years ago

I'm talking about /api/v1/pow/verify

evilsocket commented 2 years ago

I can perform 50 requests to /api/v1/pow/config (just to increase the difficulty factor, without computing the PoW) and then start sending tons and tons of random PoW data to /api/v1/pow/verify ... for me that would take near to 0 CPU as I'm not actually computing the PoW, but what for the verification process in the server?

realaravinth commented 2 years ago

I see it now. There is one catch: the attacker will have valid PoW challenges, so they'll pass the initial validation(i.e., check if the challenge exists) and move on the PoW validation check, where the server will compute the hash.

* thinking

evilsocket commented 2 years ago

yep

realaravinth commented 2 years ago

I can run multiple instances of mCaptcha to distribute the load but still doesn't solve the problem: the attacker will be able to burn resources for free

evilsocket commented 2 years ago

yep

evilsocket commented 2 years ago

and assuming you load balance the instances, the attacker could keep doing that until all the instances are fully loaded

realaravinth commented 2 years ago

Right. I get that.

So here are the constraints:

it should be cheap to run mCaptcha
Privacy-friendly(that means, no IP[0])
It should be expensive for the attacker to mount an attack on both the underlying web service and mCaptcha

I'll have to think about this :sweat_smile:

evilsocket commented 2 years ago

I believe, unfortunately, the only solution to this problem is introducing some IP based rate limiting ... otherwise, it doesn't work as a captcha and the difficulty increase can be subverted to DoS the server ... not good XD

By the way i'm trying to find the implementation of verify_pow but I got stuck at this https://github.com/mCaptcha/mCaptcha/blob/3d9056e9689ac808bcb8de1a50c79aeb2809c599/src/data.rs#L86 ... can you point me to the actual implementation so we can get an idea of how CPU intensive that actually is for the server?

realaravinth commented 2 years ago

Verification is one hash computation. I have to re-write that bit, but it expands to a call to https://github.com/mCaptcha/libmcaptcha/blob/374dcc936ad5d030517be73a4d939cbe245fd9ac/src/system.rs#L109

500k verification calls can easily impose +0.5s delay, depending upon the server hardware and the attacker can achieve it for 10M HTTP calls, which isn't much.

evilsocket commented 2 years ago

one could increase those 0.5s by sending bigger target strings (since the attacker generates random strings of whatever size) to be hashed here https://github.com/mCaptcha/pow_sha256/blob/36c48597b24eb8ea3cdb47f9676f44e1f5c22627/src/lib.rs#L99 ... I don't see any limit on the size of the PoW string here https://github.com/mCaptcha/mCaptcha/blob/3d9056e9689ac808bcb8de1a50c79aeb2809c599/src/api/v1/pow/verify_pow.rs#L39 nor here https://github.com/mCaptcha/pow_sha256/blob/36c48597b24eb8ea3cdb47f9676f44e1f5c22627/src/lib.rs#L86-L117 ... setting a reasonable max size there would at least avoid this issue (check #[validate(max_length = N)] here https://docs.rs/serde_valid/latest/serde_valid/ ) ... it's something 🤷🏻

realaravinth commented 2 years ago

setting a reasonable max size there would at least avoid this issue

Acknowledged.

IP rate-limiting and responding with HTTP 429 isn't good enough but so is an insecure endpoint. So I have the following idea, that I think will work:

IP address based scheduling for PoW validation

Scheduling PoW validations based on IP address will run PoW validations from IP addresses in rotation. Multiple validation requests from the same IP address will be queued and executed when the same IP address is next scheduled.

This way, the IP addresses sending too many validation requests will only be executed freely, i.e, without penalties through delay, only when there are no requests from other IP addresses --- highly unlikely for even small deployments.

Blanket bans with HTTP 429 give no chance for execution, whereas a queued execution model will eventually execute validation. Which I think will offer better usability in Tor and VPNs than IP rate-limiting.

But using this mechanism will effectively rule out IP addresses on Tor. If a critical web service is protected by mCaptcha and if some oppressive regime wants to disallow access to that website, they can do by sending requests through Tor, out of all exit nodes.

This solution is less than ideal than having a magical non-DoS-able verification endpoint. I will continue to investigate ways to find better mechanisms that will work with the constraints stated above but for now, this should work.

I'll be implementing it next week, if no further loopholes are found.

evilsocket commented 2 years ago

it sounds doable, however it depends on how the work queue is implemented ... if each queue is by-ip, i'd add a limit to the number of queued requests per IP, otherwise one could just keep filling the queue and either consume server memory or block other queues (other clients)

realaravinth commented 2 years ago

if each queue is by-ip, i'd add a limit to the number of queued requests per IP

That's the idea. The max limit will be configurable.

Queues already exist within mCaptcha, they use the Actor model which scales very well and use a leaky-bucket algorithm which will ensure that the queue is constantly trimmed.

realaravinth commented 2 years ago

libmcaptcha v0.2.2 implements queued validation with max queue length configuration. Closing.

Thanks for the help! Next time kindly use private channels to disclose security issues. Instructions are available here.

mCaptcha / mCaptcha

[SECURITY ISSUE] this is not a captcha #37

IP address based scheduling for PoW validation