matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0
19.73k stars 2.63k forks

Find an open source alternative to Google Recaptcha for our website #13905

Open mattab opened 5 years ago

mattab commented 5 years ago

Currently we're using Google Recaptcha on pages with a form, which leaks lots of data to Google.

For example on this page: https://matomo.org/contact/

-> It would be fantastic to find & use an open source, decentralised alternative to Google recaptcha on our Matomo.org website.

If anyone knows an alternative to Recaptcha that works, please let us know

fdellwing commented 5 years ago

There are a lot of captcha libraries, but none of them provide the features that reCaptcha does.

Findus23 commented 5 years ago

@fdellwing The only feature we need is not getting overwhelmed with spam :slightly_smiling_face:

Bonus points if it is accessibility-friendly.

fdellwing commented 5 years ago

As I said, I know no captcha that is nearly as user friendly as reCaptcha. So the best approach would be to take some random image captcha (there are MANY) and just put a self-made database on top that recognises returning users.

Findus23 commented 5 years ago

> As I said, I know no captcha that is nearly as user friendly as reCaptcha

I really have to disagree. I regularly spend multiple minutes getting angrier and angrier as I am clicking through page after page arguing whether something can be considered a storefront when the captcha switches into extra-slow mode where every image takes a 5-second transition to load. (I am not using a VPN or anything similar, just a regular internet connection)

I think a captcha doesn't need to be complex to stop most bots (after all, while Recaptcha is hard to circumvent, it only costs 0.2 cent to pay someone to solve it for you); it just needs to be different enough that it stops automated bots programmed to target popular WordPress forms.

I even think that a simple input field asking to enter the name of the open source project you are trying to contact (that maybe also allows common variants) would stop nearly all automated spam. And the remaining ones I think (from what I see on the forum) are actual people pasting spam texts into the forms and those are not blockable via captchas. @tsteur, would it be possible to add something like this to the forms without too much work?
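To make the quiz-field idea concrete, here is a minimal sketch in Python of what the server-side check could look like. It is not a WordPress plugin and not anything matomo.org uses; the function names and the list of accepted variants are assumptions for illustration only.

```python
import unicodedata

# Hypothetical list of accepted answers (an assumption, adjust as needed).
# "piwik" is included because it is Matomo's former name.
ACCEPTED_ANSWERS = {"matomo", "piwik", "matomo analytics"}


def normalize(answer: str) -> str:
    """Strip accents, ignore case and collapse whitespace before comparing."""
    decomposed = unicodedata.normalize("NFKD", answer)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.casefold().split())


def passes_quiz(answer: str) -> bool:
    """Return True if the visitor typed one of the accepted project names."""
    return normalize(answer) in ACCEPTED_ANSWERS
```

Normalizing before comparing is what allows the "common variants": different capitalisation or stray spaces should not lock out a real person.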

tsteur commented 5 years ago

As long as there is a wordpress plugin for it that should be fine. We wouldn't want to build anything ourselves. The plugin would ideally hook into random places where needed and support gravity forms etc.

Findus23 commented 5 years ago

https://wordpress.org/plugins/humancaptcha/ seems to be pretty much what I described, but the plugin looks odd and only seems to integrate with comments. Apart from that I could only find https://wordpress.org/plugins/humancaptcha/ which seamlessly integrates into login, registration, lost password, comments, bbPress and Contact Form 7.

I have never used Gravity Forms before, but it seems to have many features, and maybe one can make a required input field with the quiz feature. Not sure if it can be combined with the normal contact form.

tsteur commented 5 years ago

Did a quick search for "captcha gravity" maybe https://wordpress.org/plugins/nomorecaptchas/ or https://wordpress.org/plugins/cleantalk-spam-protect/ would help? cleantalk also seems to support woocommerce. not really sure how good they are though.

I reckon something where people need to enter "Matomo" might be too complicated sometimes for some humans (it seems easy but may not always be clear what to enter) and at the same time someone wanting to spam us could easily achieve it.

Findus23 commented 5 years ago

> https://wordpress.org/plugins/nomorecaptchas/ or https://wordpress.org/plugins/cleantalk-spam-protect/

Both plugins work by sending the visitor behaviour data to the services' servers and analyzing it there. So I guess they are no better than ReCAPTCHA.

It's odd that there isn't a well-maintained open source plugin that just does basic local analysis.

> someone wanting to spam us could easily achieve it.

Targeted attackers will probably always be able to afford the 0.2 cent it costs to reliably circumvent all types of captcha.

mt-dave commented 5 years ago

I would think an alternative to reCAPTCHA will be some kind of service: something that solves the traditional reCAPTCHA issues such as GDPR and accessibility and still provides a no-captcha experience.

I came across some solutions and here is a quick summary.

Captcha providers can broadly be divided into 2 categories:

Captcha Service Providers: This option works well for mission-critical enterprises looking for protection against constantly evolving spam and bot threats. Some of the industry players in captcha services are:

* RECAPTCHA: Free and one of the most widely used captcha services across the globe. They have recently launched reCAPTCHA v3, which generates a risk score based on user behavior on the site, Google cookies, traffic history etc. GDPR has been a major concern considering what information it stores and uses for other Google products like Google Ads.
* MTCaptcha: Captcha service that is more focused on enterprise needs. Provides a NoCaptcha alternative to reCAPTCHA, captcha account management, GDPR compliance and availability across the globe (China included). Limited in low-friction captcha capabilities.
* Solve Media captcha: Ad-driven captcha that uses advertisements to generate the captchas to be solved. GDPR compliant, beautiful and customizable. It may not be a good idea to show advertisements on an enterprise site.

Captcha Library Providers: There are a lot of players in the captcha library space. If you are willing to set up and manage the code yourself, some of the options are:

* BotDetect CAPTCHA: Most widely used captcha library, available in multiple languages. They license the library, which then needs to be implemented and managed.
* KeyCAPTCHA - Innovative Anti-Spam Solution: Plugin-driven captcha covering a wide range of CMS systems. Mostly for CMS-driven sites; needs self-hosting and management. Permutations for captcha generation are limited.

Findus23 commented 5 years ago

I just came across https://www.phpcaptcha.org/ which seems to be the only local open source captcha solution that has a wordpress plugin: https://wordpress.org/plugins/securimage-wp/

But I don’t know how well it supports the forms used on matomo.org

Crypto-Loot commented 5 years ago

Hi there, We offer a PoW (proof-of-work) based captcha system where a user must verify a captcha via mining a cryptocurrency for several seconds before proceeding to confirm the token. You may find more at our website: https://crypto-loot.org (will have to login to see the demo/code)

We are also doing a rebrand shortly along with a potential partner to help bring web mining into the white light for the industry.

Please feel free to let us know if you would like to work with us! support@crypto-loot.org

mattab commented 5 years ago

Privacy concerns of this tool are real, see https://www.fastcompany.com/90369697/googles-new-recaptcha-has-a-dark-side

joekarns commented 5 years ago

One non-google product you could use to better protect your login page (or any page of the site) would be using the free version of Cloudflare. I use "Page Rules", then configure only my login page with the form on it to be in "under attack" mode in Cloudflare. By doing so, it scans any/all users who try to access that page of the site. It's not a perfect solution but it should cut out most of the pure bots hitting that page. Hope that helps.

Findus23 commented 5 years ago

@joekarns Using Cloudflare might be even worse as it

joekarns commented 5 years ago

Yes, fair points.

mattab commented 4 years ago

We're still actively looking for an alternative to Google recaptcha!

If you have any hints, we'd love to hear them!

ara4n commented 4 years ago

we are too, over at https://github.com/vector-im/riot-web/issues/3606 (in the interests of sharing any discoveries). (Riot also uses matomo for its analytics, fwiw :)

raneq commented 4 years ago

What about:

For the record:

It feels like the interest in light and effective captchas has dropped a lot. Thank you for not surrendering on this.

Findus23 commented 4 years ago

@raneq I am really not sure if those Captchas that just use GD to print a random string to an image and sprinkle a few dots or lines above it are really helpful.

For captcha-code-authentication specifically there seem to be multiple reviews mentioning that removing the captcha from the form circumvents it.

mattab commented 4 years ago

Btw we could also self-host the Google reCAPTCHA and proxy requests. This would help people from China at least, and may limit some of the privacy implications? Using this: https://github.com/google/recaptcha

Findus23 commented 4 years ago

> self-hosted the google recaptcha and proxy requests

That would solve the issue for Chinese users, but it might make privacy even worse as it would be harder to block and might open new privacy law issues as users can't opt out anymore.

riki8760 commented 4 years ago

I'm also looking for a good captcha to use that protects a user's privacy. One solution that doesn't work for me but might be ok for you is: https://www.hcaptcha.com/

Users are labeling data for free with hCaptcha and we don't know what is being done with the labeled data. As a result I'm not using it.

tsteur commented 4 years ago

I just came across https://www.hcaptcha.com/ as well. It looks quite interesting and there is a WordPress plugin https://wordpress.org/plugins/hcaptcha-for-forms-and-more/

I suppose it's at least better than Google but didn't look into any terms or privacy policy.

Findus23 commented 4 years ago

Things I noticed with hcaptcha:

Weird quotes from the privacy policy:

Some of the information you provide us may constitute sensitive data as defined in the GDPR (also referred to as special categories of personal data), including identification of your race or ethnicity on government-issued identification documents.

please be aware that your personal data will be transferred to, processed, and stored in the United States. Data protection laws in the U.S. may be different from those in your country of residence. You consent to the transfer of your information, including personal information, to the U.S. as set forth in this Privacy Policy by visiting our site or using our service.

(I don't think that's how consent works)

So I think the major benefits to reCAPTCHA are:

riki8760 commented 4 years ago

@Findus23 -- thank you very much for this great analysis. If I find any good open source solutions that protect people's privacy (or end up creating my own Captcha) I will be sure to post it.

ghost commented 4 years ago

It's funny to read @Findus23 (good) analysis knowing that Cloudflare just started using hCaptcha...

But hCaptcha is as easily solvable as reCaptcha by services like anti-captcha.com (human automated solving), which support both of them (and many others). It takes less than 30 seconds to solve a hCaptcha/reCaptcha with their lib/API, for 0,0022€ per captcha... Do not even try picture-based captchas, they are even easier. The fact is Google is doing NOTHING to block these services, so I asked hCaptcha and here is their answer:

Short answer is Google has never bothered to try and stop those users, but we break the captcha services on a regular basis. We have a variety of strategies, but fundamentally if a human being is answering the question through anti-captcha then we'll detect that they're human. You end up in an arms race to detect that it's specifically a captcha service user, and they end up in an arms race trying to defeat your detection. This also means you can't just publish your detection results to everyone, otherwise their time-to-defeat will be much lower.

But hey, anti-captcha manages to bypass them successfully (last check: today) 🤷‍♂

So far I have not found any captcha which could not be solved by services like anti-captcha, or by public libraries, but I am very interested in finding one, so I will watch this topic!

Jookia commented 4 years ago

Please don't use hCAPTCHA or other inaccessible CAPTCHAs.

Findus23 commented 4 years ago

@HawkLiking Honestly, as much as I am here complaining about most solutions, being solvable with human automated solving methods isn't really an issue for me. The point of a CAPTCHA is to tell computers and humans apart (the CHA part), and a person paid to solve a CAPTCHA for someone else is definitely a human. Solving this issue is even more complex, maybe impossible, and out of scope of finding a ReCAPTCHA alternative.

ghost commented 4 years ago

@Jookia why is hCaptcha "inaccessible"?

Jookia commented 4 years ago

Blind people can't use it without signing up to the service. Deafblind people can't use it either.

Tirion77 commented 4 years ago

I've got a solution that respects user privacy and removes bots like no other. Nobody owns the data at the end, unless the user decides to manually capture their data and then use it. It is a little experimental and will require some configuration and effort to implement.

ghost commented 4 years ago

> I've got a solution that respects user privacy and removes bots like no other. Nobody owns the data at the end, unless the user decides to manually capture their data and then use it. It is a little experimental and will require some configuration and effort to implement.

@Tirion77 Ok, and what is this solution? I am very curious!

yolknet commented 4 years ago

* [Captcha code](https://github.com/wp-plugins/captcha-code-authentication) ?? It's up to date and looks clean to me. I don't know if it's effective though.

The contact form at the bottom of the page has a Google reCAPTCHA (v2). They don't trust their own work anymore I guess :-)

jcalfee commented 4 years ago

So far, BotDetect CAPTCHA seems like the way to go for me. We have Node as a back-end though. I'm asking them if they are working on something for that. I don't trust the government, so I really like how they document the reCaptcha concerns. I wish it were an image slide captcha but I can't be too picky at this point.

Tirion77 commented 4 years ago

Apologies for the late reply, everyone. I wasn't sure if I should share it because the solution is highly experimental as I said, and it only recently came out with something that made me confident enough to start sharing it. Please look into the Idena network -- https://idena.io/. It is a decentralized blockchain solution that is able to derive digital identities that are valid for approx. 2 weeks, based on a captcha puzzle that the whole network executes at the same time (this happens approx. every 2 weeks). Users of that network can then use that identity to log in to websites by connecting their account to a wallet. It is still very early in development, but the identity and the sign-in are there already as of this week. This is definitely not a solution for the general population at this point, but your regulars might be interested in this over doing a captcha every time they want to post/buy/etc. Note that this involves 0 investment into its token, and the solution could be used solely based on the digital identity without having to worry about insane cryptocurrency value swings.

I'd like to reiterate that this is super new and early, and it could really change over the next year -- or completely disappear. That said, the network has been growing 15% every 2 weeks or so, and it seems the devs are competent.

All code etc. is open source and on their GitHub. As a privacy geek, this piqued my interest.

Jookia commented 4 years ago

It only works for people with eyes.

Tirion77 commented 4 years ago

You are absolutely right. For now it is like that, although the developers are aware and are hoping to address this too. From their site:

Again, this is at a super early stage, so research and look into it at your own risk.

Jookia commented 4 years ago

I don't want to be a downer but is it really worth bringing it up if it's unstable experimental technology that you can't even use now without an invite and dedicated computer with the app?

ghost commented 4 years ago

Interesting. I tested your flip challenge here https://flips.idena.io/?pass=idena.io but I gave up (bored) after 3 challenges; these "stories" are maybe too complicated..

Jookia commented 4 years ago

Wow, I tried one of those flip challenges and got one that implied a person shot a home intruder and the intruder was dead in a body bag. :\

Edit: I later got one that straight up showed actual dead people? It had a watermark for a Russian website.

Findus23 commented 4 years ago

We are getting a bit off-topic, but for completeness' sake I again want to give extensive feedback about this solution:

So I honestly can't take this seriously as even an attempt of something that can be considered a CAPTCHA.

ghost commented 4 years ago

> @Findus23 -- thank you very much for this great analysis. If I find any good open source solutions that protect people's privacy (or end up creating my own Captcha) I will be sure to post it.

https://github.com/produck/svg-captcha

supervisitor commented 4 years ago

Why annoy the user? Why not keep the bot busy? Well, I have often used the principles of "negative captchas" and am much more satisfied with them than with the integration of captchas. (Read this: https://github.com/subwindow/negative-captcha)

tsteur commented 4 years ago

Interesting

Findus23 commented 4 years ago

@supervisitor The main issue I see with the idea of honeypots is that there are times when people act like bots. E.g. a browser extension that auto-fills forms (e.g. a password manager), a user with a screen reader who has no awareness of "this input is 2000px left of the screen and therefore not the one I should write my comment in". Nevertheless, I think this is better than most suggestions here in this issue as there is no privacy issue, no third party involved, but there are still accessibility and usability issues.

supervisitor commented 4 years ago

@Findus23 I have solved the problem of autofill by tools or password managers by using unique dull field names. This works quite well, for example: "pike_soup" or "LatschariSquare_Chief"; no tool fills something like this with data. For the screen readers (no experience with them) you can try "please_do_not_enter_anything"... there is a human behind it! ;o)

supervisitor commented 4 years ago

... sorry, "LatschariSquare_Chief" was exactly the negative example, because sometimes a street name was entered in the field with this name. But with some testing and knowledge like this, you can do it with a few lines of code instead of adding one more script and using external resources.
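For readers who have not seen the technique, a honeypot check really can be a few lines. This is a minimal sketch in Python, not the code of the negative-captcha library linked above; the field name is borrowed from the example in this thread and the function name is an assumption.

```python
from typing import Mapping

# Hypothetical honeypot field name. The field is visually hidden in the form,
# so a real visitor is expected to leave it empty.
HONEYPOT_FIELD = "pike_soup"


def looks_like_bot(
    form: Mapping[str, str], honeypot_field: str = HONEYPOT_FIELD
) -> bool:
    """Return True when the hidden honeypot field was filled in.

    A missing field is treated like an empty one, so this check alone
    does not catch bots that skip the field entirely.
    """
    return form.get(honeypot_field, "").strip() != ""
```

A submission flagged this way can be dropped silently, so the bot gets no hint about which field gave it away. The accessibility caveat raised above still applies: the hidden field needs a label that tells screen reader users to leave it empty.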

gzuidhof commented 4 years ago

Hey all, I hope it's ok to post my own alternative here, I created FriendlyCaptcha to fill this gap. There is a demo here. The client side code and algorithms are fully open source (MIT), the SaaS wrapper around it is not (yet). If Matomo is keen on self-hosting I'm happy to discuss that.

As far as I know it doesn't have any accessibility or inclusivity issues that any cognitive skill based captcha will have.

Happy to answer any questions about it!

Robin-Wils commented 4 years ago

My device somehow always gets through that captcha. I don't even have to solve a puzzle, weird, but cool.

gzuidhof commented 4 years ago

@Robin-Wils Thanks :) It is kind of the point though! Normal captchas should have a task that is easy for all humans, but difficult for machines. Those tasks probably don't really exist anymore because of improvements in machine learning, a task is either very tricky (especially for those with less than perfect vision or technical skills), or also trivial for a machine. Google started adding noise to images to try to beat ML models, but it makes the task even more tricky.. It's an arms race in which the user loses, here's an example I got yesterday:

[Image: reCAPTCHA example]

Google's reCAPTCHA can be consistently solved in under a second by a machine (or you can even pay a service <$0.001 to do it), while FriendlyCaptcha takes a few seconds to solve on a powerful machine. The cost for an attacker is similar or higher, but real users don't get punished as much (in terms of privacy, accessibility, effort).

Maybe what the world wants is a captcha with an image labeling task that is not run by Google even though it doesn't add more guarantees that the user is actually human.. I suppose that is not that hard to add, but for now I'm trying the proof-of-work approach!
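For anyone unfamiliar with the proof-of-work approach, here is a rough hashcash-style sketch in Python. It is not FriendlyCaptcha's actual algorithm; the use of SHA-256, the 8-byte nonce encoding, and all function names are assumptions chosen for illustration. The idea: the server hands out a random challenge, the browser searches for a nonce whose hash has enough leading zero bits, and the server verifies it with a single hash.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count the number of zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def _digest(challenge: bytes, nonce: int) -> bytes:
    # Assumed encoding: challenge followed by the nonce as 8 big-endian bytes.
    return hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce that meets the difficulty."""
    for nonce in itertools.count():
        if leading_zero_bits(_digest(challenge, nonce)) >= difficulty_bits:
            return nonce


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash is enough to check the submitted nonce."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty_bits
```

Each extra difficulty bit doubles the expected work for the solver while verification stays a single hash, which is what makes the cost asymmetric. A real deployment would also need the server to issue each challenge only once and expire it, which this sketch leaves out.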