pkp / pkp-lib

The library used by PKP's applications OJS, OMP and OPS, open source software for scholarly publishing.
https://pkp.sfu.ca
GNU General Public License v3.0

reCaptcha not working from China #2993

Closed ajnyga closed 4 years ago

ajnyga commented 6 years ago

Hi,

I do not have first-hand knowledge of this, but one of our journals is reporting that an author from China is unable to register with OJS because the reCaptcha is not visible.

This is probably due to local internet policies and has been noticed elsewhere as well: https://github.com/stellar/stellar-client/issues/1162

Should OJS have a secondary system for preventing spam that would not have the limitations of reCaptcha?

mfelczak commented 6 years ago

Confirming that some of our hosted clients have also run into this issue.

asmecher commented 6 years ago

@mfelczak, do you have any preference from that client? I suspect we'd need to look at a built-in service again, though I really don't like maintaining our own. I suspect there's a third-party library we could use.

mfelczak commented 6 years ago

Hi @asmecher, I've checked with the client and they couldn't provide any alternatives. Digging around a bit, BotDetect Captcha looks promising for a possible 3rd-party integration: https://captcha.com/php-captcha.html

asmecher commented 6 years ago

https://github.com/gregwar/captcha also appears to be heavily used, and it's in Composer with very few dependencies...

jmacgreg commented 6 years ago

Hi all, see also https://pad.foebud.org/google-alternatives for some alternatives. The Recaptcha-specific section is quoted below:

http://textcaptcha.com/

https://www.scorchsoft.com/blog/recaptcha-alternative-honeypot-spam-prevention/ honeypot forms

checking IP addresses against RBLs

https://captcha.com/ - BotDetect, a CAPTCHA implementation

https://akismet.com/ Akismet, known spammer API -> https://de.wikipedia.org/wiki/Akismet#Datenschutzprobleme_in_Deutschland,_%C3%96sterreich_und_der_Schweiz

https://www.drupal.org/project/botcha Botcha "anti-captcha" (the technique used is very easy to implement elsewhere but also very effective)

GrazingScientist commented 6 years ago

Although we are not in China, my public institute in Germany also needs a reCaptcha alternative, so we would be very thankful for an out-of-the-box plugin.

carzamora commented 6 years ago

I want to propose a very simple idea, which (I think) can be implemented even with reCaptcha enabled: a honeypot like this one: http://tidyrepo.com/registration-honeypot/

taken from the link above:

Registration Honeypot works by adding a hidden text field to your registration form labeled “Only fill in if you are not human.” Users will not even be able to see the text field, since it is hidden, let alone fill it out, so registration process will go unaffected. Spambots, on the other hand, will fill it out automatically, and will be kicked out of the registration process and redirected to an error page that will not let them continue.
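
As a rough illustration of the check described in that quote, here is a minimal sketch in Python (not OJS's PHP, and not the plugin's actual code). The field name `website_url` and the function name are hypothetical; the only idea taken from the quote is that a hidden field left empty by humans gets filled in by bots.

```python
# Hypothetical name for the hidden field; a real plugin picks its own.
HONEYPOT_FIELD = "website_url"


def is_probable_bot(form_data: dict) -> bool:
    """Return True when the hidden honeypot field was filled in.

    Human users never see the field, so it should arrive empty or absent.
    """
    value = str(form_data.get(HONEYPOT_FIELD, ""))
    return bool(value.strip())


if __name__ == "__main__":
    human = {"username": "author1", HONEYPOT_FIELD: ""}
    bot = {"username": "spam", HONEYPOT_FIELD: "http://example.com"}
    print(is_probable_bot(human), is_probable_bot(bot))
```

A registration handler would reject the submission when this returns True, and proceed normally otherwise.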

mt-dave commented 5 years ago

I came across MTCaptcha. This captcha service works in China and also has no-captcha capabilities. It seems like a good reCaptcha alternative.

jonasraoni commented 5 years ago

Hi guys, resurrecting this old issue =]

So, I vote for the https://github.com/gregwar/captcha!

p.s.: There's another popular captcha written in PHP (https://github.com/mewebstudio/captcha), but it was built for Laravel and has more dependencies.


About the implementation, two things that came into my mind...

asmecher commented 5 years ago

I think these distorted-letters-based CAPTCHA tests are widely considered to be easily crackable, and neither library includes any accessible alternatives. I wonder whether there's not a wholly different approach with decent support that's possible to run locally (or via a defined proxy) -- scanning the suggestions e.g. on https://www.w3.org/TR/turingtest/. @NateWr, any thoughts on this? How about e.g. @jmacgreg on accessibility?

As much as I don't like relying entirely on Google for this, I suspect proxying the scripts would be more effective than a distorted-letters implementation, and would include accessibility support.

(We used to have our own homemade distorted-letters generator, but I happily got rid of it a few years back.)

jmacgreg commented 5 years ago

I won't have much useful to say on this, but @israelcefrin will!

jonasraoni commented 5 years ago

@asmecher about proxying, the OJS instance itself might have limited external access, so I think we'll need a fallback anyway.

I personally like random logic challenges like: mark the first and last checkboxes, cat/snake refer to animal or tool, leave the field empty if you're not a robot, write the result of 2*1 and tricks to deceive the bot (fake/duplicated fields), but they are weak...

Is OJS a direct target of spammers, or are we just trying to defend against generic bots?

Let's see what the other guys have to say, I might research alternatives later as well =]

NateWr commented 5 years ago

The link that Alec shared is good at outlining the pros and cons of different approaches. I don't think that any technique in the "Interactive Stand-Alone Approaches" category is going to be accessible. That includes captcha, logic games, etc.

Of the non-interactive approaches, I think a honeypot is our best bet. And we should probably consider using this as a standard part of the application, not just a plugin that gets added on. Honeypots are very effective at defending against generic bots that aren't specifically targeting OJS, and will likely be sufficient to cut down on the majority of spam problems faced.

For OJS instances that need to be hardened -- usually because they are a direct target, as jonas mentioned -- we should consider the "multi-party" approaches in the document that Alec linked. These include Google's ReCaptcha for journals that aren't concerned about accessibility. But there are also third-party services like Akismet, which supports non-WordPress uses. I think the forum thread that Alec linked before had a plugin with Akismet support.

israelcefrin commented 5 years ago

Hi all, just a thought on accessibility. Currently Google ReCaptcha is concerned with this issue (accessible forms) and they even have a section explaining how accessible their solution is. However, tests with real users with disabilities have shown that it is not fully accessible/usable and it becomes a barrier for users with different devices to pass the "captcha" test.

I agree with @NateWr:

Of the non-interactive approaches, I think a honeypot is our best bet.

And Pitt has those 2 plugins from that post that @asmecher shared at the meeting: https://github.com/ulsdevteam/pkp-akismet https://github.com/ulsdevteam/pkp-formHoneypot

I've talked to Clinton and he told me that they could contribute both of these plugins to the OJS plugin gallery.

jonasraoni commented 5 years ago

Oh, I didn't know the term honeypot, but that's what I meant with "tricks to deceive the bot" 😁

I'm just curious about how inaccessible is a logic puzzle, something with a question and a simple answer seems to be as complex as filling the form itself for me 🤔 I'll try to find some research about it when I get home to satisfy my curiosity.

I just read the honeypot plugin's description and it seems to be ok assuming that we just need to get rid of generic bots.

israelcefrin commented 5 years ago

I'm just curious about how inaccessible is a logic puzzle, something with a question and a simple answer seems to be as complex as filling the form itself for me

According to the W3C document shared by @asmecher, it relies on the Understandable principle of accessibility. To make a logic puzzle accessible we need to work with language, learning and cognitive issues to solve it. It is not impossible, but to make it work, a comprehensive round of testing with different users is recommended.

A honeypot approach wouldn't add any extra workload for users filling out a form, only for bots.

jonasraoni commented 5 years ago

@israelcefrin I just read the W3C link... Looks like after all these years everybody is still in the same boat (while the bots are almost beating us).

I like the honeypot solution, and if it's not enough, we can extend it (e.g. monitor if a given user is triggering too many actions in a small period of time).
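
To make the "too many actions in a small period of time" idea concrete, here is a rough sliding-window sketch in Python. Everything in it (the class name, the limits, the in-memory storage) is a hypothetical illustration, not something that exists in OJS or the honeypot plugin.

```python
from collections import defaultdict, deque


class ActionRateLimiter:
    """Allow at most `max_actions` per user within `window_seconds`."""

    def __init__(self, max_actions: int = 5, window_seconds: float = 60.0):
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        # Timestamps of recently allowed actions, keyed by user identifier.
        self._events = defaultdict(deque)

    def allow(self, user_id: str, now: float) -> bool:
        events = self._events[user_id]
        # Discard timestamps that have fallen out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_actions:
            return False
        events.append(now)
        return True
```

A caller would pass the current time (for example `time.monotonic()`) as `now` and treat a `False` result as a signal to block or challenge the request. A real deployment would need shared storage rather than a per-process dictionary.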

NateWr commented 5 years ago

To make a logic puzzle accessible we need to work with language, learning and cognitive issues

Yeah, the main issue is coming up with something that works across all the different languages/cultures that our product is used in.

I like the honeypot solution, and if it's not enough, we can extend it

:100: It's far easier to build something smarter on a case-by-case basis, for the 1-2% of cases when the honeypot isn't enough, than to try to devise a single solution that works everywhere.

jonasraoni commented 5 years ago

With the release of the honeypot plugin into the gallery I guess this issue can be closed, right?

NateWr commented 5 years ago

@mfelczak are you happy on the PS side for this issue to be closed?

mfelczak commented 5 years ago

Thanks @NateWr and @jonasraoni. Yes, this should suffice -- we'll test with a few hosted journals.

jnugent commented 4 years ago

Hi folks,

We've had requests from hosted journals who have users in China that are unable to sign up due to google.com being unavailable in China. Our workaround thus far has been to edit classes/form/validation/FormValidatorReCaptcha.inc.php and classes/template/PKPTemplateManager.inc.php and switch the www.google.com references to www.recaptcha.net which works in China. If this was permanently added to pkp-lib the only other modification would be to switch enable_cdn to off. The recaptcha.net domain is owned by Google so it probably isn't going anywhere. I can provide links to journals that are using the alternative URL with no problems if it is helpful. (And apologies, @asmecher, for the dupe post earlier)
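
For readers who want to see the substitution spelled out, here is a minimal Python sketch of the hostname swap described above. It is not the code from the two pkp-lib files; the function name is hypothetical, and the example URLs are the public reCAPTCHA script and verification endpoints, assumed here for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

GOOGLE_HOST = "www.google.com"
RECAPTCHA_NET_HOST = "www.recaptcha.net"


def use_recaptcha_net(url: str) -> str:
    """Point a www.google.com reCAPTCHA URL at www.recaptcha.net.

    URLs on other hosts, or outside the /recaptcha/ path, are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc == GOOGLE_HOST and parts.path.startswith("/recaptcha/"):
        parts = parts._replace(netloc=RECAPTCHA_NET_HOST)
    return urlunsplit(parts)


if __name__ == "__main__":
    print(use_recaptcha_net("https://www.google.com/recaptcha/api.js"))
    print(use_recaptcha_net("https://www.google.com/recaptcha/api/siteverify"))
```

The same swap has to be applied in both places: the script URL loaded by the browser and the verification URL called by the server.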

asmecher commented 4 years ago

@jnugent (and others), is there any downside in just using www.recaptcha.net as proposed above?

mfelczak commented 4 years ago

A couple notes here to add to Jason's summary above. Adding support for recaptcha.net via a new toggle in config.inc.php or even as the default to replace the existing google.com implementation would expand the anti-spam toolset available to OJS users. There will be journals who can't subscribe to Akismet or who still want a reCaptcha solution that works for all visitors. At the moment the only free alternative to the default reCaptcha is the honeypot plugin.

NateWr commented 4 years ago

Is there a downside to just switching the hostname to recaptcha.net? Would any existing installs be affected? Perhaps if servers have whitelisted domains that are permitted to make external requests...

asmecher commented 4 years ago

:+1: OK, I fully support this proposal. @jnugent, could you open a PR for it?

jnugent commented 4 years ago

I will! @asmecher what did we decide on? Changing the URLs in the various classes, or adding a config.inc.php option to allow people to toggle between the two?

asmecher commented 4 years ago

I think just universally using the recaptcha.net domain is best/simplest!

asmecher commented 4 years ago

Implemented at https://github.com/pkp/pkp-lib/pull/6114!