spacebase / spacebasenz

Website for SpaceBase New Zealand
http://dev-spacebasenz.pantheonsite.io/
GNU General Public License v2.0

Honeypot Defeated - Needs update/fix or swap for captchas #300

Closed treasuretron closed 4 years ago

treasuretron commented 4 years ago

In GitLab by @richbodo on Dec 11, 2018, 12:25

Summary:

We are clearly getting several fake user accounts generated per day, all with the same form of randomly generated username. We either have to fix the honeypot to defeat these attacks or return to captchas.

URL:

https://spacebase.co/admin/people

Expected:

No repeated patterns of usernames. (And, frankly, no new users. We haven't told anyone about the site, yet.)

Observed:

Repeated patterns of usernames and lots of new users:

[Screenshot: Screen_Shot_2018-12-11_at_9.12.50_AM]

treasuretron commented 4 years ago

In GitLab by @thomasmurphy on Dec 11, 2018, 16:49

I suspect this is because your sign-up form is overridden at the theme level, so the honeypot fields aren't actually in the DOM. We will investigate today. For the time being I've set account registrations to "administrator approval is required". Log of related links...

https://www.drupal.org/project/honeypot/issues/2811189

https://www.drupal.org/project/honeypot/issues/3018243

https://www.drupal.org/project/honeypot/issues/3009566

https://www.drupal.org/project/honeypot/issues/3002022

treasuretron commented 4 years ago

In GitLab by @thomasmurphy on Dec 11, 2018, 17:43

On inspecting the page output, the honeypot form is after the submit button, so I'll apply the patch from https://www.drupal.org/project/honeypot/issues/2811189
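To illustrate the kind of check described above, here is a minimal sketch that parses rendered page HTML and reports whether a honeypot field is missing, placed before the submit button, or placed after it. The field name `url` and the helper name `honeypot_position` are assumptions for illustration, not taken from the site's code.

```python
from html.parser import HTMLParser


class FormFieldOrder(HTMLParser):
    """Record named <input>/<button> elements in document order."""

    def __init__(self):
        super().__init__()
        self.fields = []  # list of (name, type) tuples

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "button"):
            a = dict(attrs)
            # A <button> without a type attribute submits the form by default.
            default_type = "submit" if tag == "button" else "text"
            self.fields.append((a.get("name"), a.get("type", default_type)))


def honeypot_position(html, honeypot_name="url"):
    """Return 'missing', 'before-submit' or 'after-submit' (hypothetical helper)."""
    parser = FormFieldOrder()
    parser.feed(html)
    names = [name for name, _ in parser.fields]
    if honeypot_name not in names:
        return "missing"
    submit_idx = next(
        (i for i, (_, kind) in enumerate(parser.fields) if kind == "submit"), None
    )
    if submit_idx is not None and names.index(honeypot_name) > submit_idx:
        return "after-submit"
    return "before-submit"
```

Running this against the saved output of the registration page would distinguish "field not in the DOM" from "field rendered after the submit button".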

It's probably also worth installing "password_policy" to troll the bots further; it's ready for assessment: https://www.drupal.org/project/password_policy/issues/2286053

treasuretron commented 4 years ago

In GitLab by @thomasmurphy on Dec 11, 2018, 19:31

assigned to @jayelless

treasuretron commented 4 years ago

In GitLab by @thomasmurphy on Dec 11, 2018, 19:31

unassigned @thomasmurphy

treasuretron commented 4 years ago

In GitLab by @jayelless on Dec 11, 2018, 22:03

I have added two patches to the honeypot module to improve its performance against spam bots. I have also installed the password_policy module along with a basic default policy to ensure that all user-generated account passwords on the site meet minimum standards:

  1. Length: at least 7 chars
  2. Char types: any three of Uppercase; Lowercase; Digits; Special chars
  3. Consecutive: max of 3 consecutive identical chars

Password expiry is not set at this stage.
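As a rough illustration of the three rules above (not the password_policy module's own code), a standalone check might look like the following. The function name `meets_policy` is hypothetical, and "special char" is assumed to mean any non-alphanumeric character.

```python
import re


def meets_policy(password: str) -> bool:
    """Check a password against the three rules listed above (illustrative only)."""
    # Rule 1: at least 7 characters.
    if len(password) < 7:
        return False
    # Rule 2: at least three of the four character classes.
    classes = [
        any(c.isupper() for c in password),
        any(c.islower() for c in password),
        any(c.isdigit() for c in password),
        any(not c.isalnum() for c in password),  # assumed definition of "special"
    ]
    if sum(classes) < 3:
        return False
    # Rule 3: no run of four or more identical characters.
    if re.search(r"(.)\1{3,}", password):
        return False
    return True
```

For example, `meets_policy("Abc123!")` passes all three rules, while `"abcdefg"` fails the character-type rule.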

treasuretron commented 4 years ago

In GitLab by @jayelless on Dec 11, 2018, 22:05

mentioned in merge request !246

treasuretron commented 4 years ago

In GitLab by @richbodo on Dec 11, 2018, 22:38

marked this issue as related to #293

treasuretron commented 4 years ago

In GitLab by @richbodo on Dec 11, 2018, 23:54

@jayelless

O.k. this is an interesting issue because it coincides with some other stuff we are working on, and there is a lot to impart, really. You can skip this comment and just send me a calendar invite for a face-to-face meeting instead, but here goes...

Regarding the anti-spam solution:

We should implement the solution you suggest.

Deciding on this one is easy because if we still have spam after we clear this out, we have not solved the problem. A month from now, if we haven't seen a whole bunch of spam, we will know the patches and config worked.

We should not implement recaptchas right now, but one thing about recaptchas - they almost always work. The D8 recaptcha-thingy module needs a patch, I get that, but if worst comes to worst and we're clearing out spam a lot - we can't afford to put our time into that and it's not a good problem to have - then paying to switch to recaptcha would be well worth it to us and probably a net good for D8.

I just vaguely scanned the honeypot docs and I see it's one of those hidden-field-on-the-form type solutions against spam. I guess those work (haven't tried them). Seems like spam bots would evolve around that, but maybe not. Anyway, as you suggest, let's patch the honeypot module and get that merged and pushed to production! :)

I also see that honeypot can implement a time delay, which I think is a killer feature - kind of like charging the spammer in CPU time that they could otherwise be using attacking other sites - are there any settings on the honeypot time delay?
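For readers unfamiliar with the technique, the two ideas mentioned here (a hidden field that humans leave empty, plus a minimum time between rendering and submitting the form) can be sketched as below. This is a generic illustration, not the Drupal honeypot module's implementation; the field name `url`, the `looks_like_bot` helper, and the 5-second default are all assumptions.

```python
import time


def looks_like_bot(form_data, rendered_at, now=None,
                   honeypot_name="url", min_seconds=5.0):
    """Return True if a submission trips either honeypot check (illustrative).

    form_data:   dict of submitted field values
    rendered_at: Unix timestamp recorded when the form was served
    """
    now = time.time() if now is None else now
    # Check 1: the hidden field should be empty for real users.
    if form_data.get(honeypot_name, "").strip():
        return True
    # Check 2: a form submitted faster than min_seconds is treated as automated.
    return (now - rendered_at) < min_seconds
```

A bot that fills every field fails the first check; a bot that posts immediately fails the second, so it has to wait out the delay for every account it tries to create.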

Em, Eric and I will want to clean out the spam (not too hard because it's new), verify all the organizations in the system, and pull down a CSV file of orgs. After a couple more bugs are fixed (hopefully including this one), we will communicate with the organization owners and ask them to create accounts and take ownership of their organizations.

However...we might want to arrange a call as soon as possible to walk through the code that creates an organization csv as a cron job. I'm talking about a cron job in php that the system uses to generate a mailing list of our organization owners from the drupal db.

That cron is probably just busted by a fraction of a smidge - it used to work. If it's not hard to fix #293, then we should fix that issue ( #293 ) and push it with the fix to this issue ( #294 ) to production. It didn't use the apache solr index or anything weird like that. It's just a plain php file, don't remember where it is but I can help find it.

I'll just cover the background of that cron job so you know what I mean, but the original issue was #149 (https://gitlab.com/spacebase/spacebase/issues/149)

Interesting factoid: The directory data is actually valuable. When Em and Eric came to NZ, they were told by more than one person that there were only a few organizations to track in the space industry in NZ - they made personal contact with over a hundred, and added all their basic profile data to the directory, something no one had seen before.

So we use the data, and the cron job doesn't seem like a big deal, but it's kind of a very nice to have.

We download those daily csv files of org data to communicate with our org owners and to help us make lists of people we would like to contact who have not taken over their orgs. With the public info-only version (remove email address column, and you have only public org profile data from that spreadsheet) we compare the csv to previous snapshots and other spreadsheets, and make infographics for presentations and papers showing our analysis of the space industry in NZ to people who want to create positive impact there.
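The two operations described above (dropping the email column to get a public version, and comparing against an earlier snapshot) could be sketched roughly as follows. The column names `name` and `email` and both helper names are assumptions for illustration; the real export's layout is not shown in this thread.

```python
import csv
import io


def public_rows(csv_text, private_columns=("email",)):
    """Return the CSV rows as dicts with private columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {k: v for k, v in row.items() if k not in private_columns}
        for row in reader
    ]


def new_orgs(current_csv, previous_csv, key="name"):
    """List orgs present in the current snapshot but not in the previous one."""
    previous = {row[key] for row in csv.DictReader(io.StringIO(previous_csv))}
    return [
        row[key]
        for row in csv.DictReader(io.StringIO(current_csv))
        if row[key] not in previous
    ]


# Made-up sample data, only to show the shape of the input.
previous_snapshot = "name,email\nAcme Rockets,a@example.com\n"
current_snapshot = (
    "name,email\nAcme Rockets,a@example.com\nKiwi Sats,k@example.com\n"
)
print(public_rows(current_snapshot))
print(new_orgs(current_snapshot, previous_snapshot))
```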

I guess we could ssh in or pull a db copy and then make an SQL query once in a while, or go through the bug-a-dev communications cycle, but Em and Eric and probably future admins of this software, who won't have devs, won't prefer to do that. I think #293 is one of the admin things a system like this should have anyway.

We basically know people working at most of the orgs in the directory already, but we have not bugged them to take ownership of their organizations because the platform wasn't ready. Since search is fixed, and we are well on our way to fixing more bugs, we are looking to contact the org owners ASAP.

Regarding the password_policy module:

We should probably implement this. Not a priority but sounds like low hanging fruit.

I wouldn't think password complexity would have any effect on bots, as defeating that countermeasure doesn't cost anything when creating accounts, which is what spammers do (just default to a very complex random string for the password), but maybe there are some really stupid bots out there.

I wonder, actually, about password complexity relative to just length as a thing for user security. What you want to do is have an estimator that uses rainbow tables live in your login form, something like this: https://www.betterbuys.com/estimating-password-cracking-times/ and then your user gets a good idea and learns how to make a secure password, and you can just forbid passwords below a certain brute-force-rainbow time. But the estimator has to be super solid. Anyway, over beers some other time.
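To make the length-versus-complexity point concrete, here is a back-of-the-envelope estimator. It is plain exhaustive-search arithmetic (search space divided by guess rate), not a rainbow-table model and not the method used by the linked page; the guess rate, the ASCII-only character classes, and the function names are all assumptions.

```python
import string


def charset_size(password):
    """Size of the smallest ASCII character pool covering the password."""
    size = 0
    if any(c in string.ascii_lowercase for c in password):
        size += 26
    if any(c in string.ascii_uppercase for c in password):
        size += 26
    if any(c in string.digits for c in password):
        size += 10
    if any(c in string.punctuation for c in password):
        size += len(string.punctuation)
    return size


def brute_force_seconds(password, guesses_per_second=1e10):
    """Worst-case time to try every candidate of this length and pool."""
    return charset_size(password) ** len(password) / guesses_per_second


# A 12-character lowercase password has a larger search space than a
# 7-character password that mixes all four character classes.
print(brute_force_seconds("a" * 12) > brute_force_seconds("aB3!xyz"))
```

Under these assumptions, adding length grows the search space faster than adding character classes, which is the trade-off raised above. A real estimator would also need to account for dictionary words and reused passwords.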

treasuretron commented 4 years ago

In GitLab by @thomasmurphy on Dec 12, 2018, 19:30

closed via merge request !246

treasuretron commented 4 years ago

In GitLab by @thomasmurphy on Dec 12, 2018, 19:30

mentioned in commit 2cf8b07684a0711c5af4b85e71f4afc13bcd4073

treasuretron commented 4 years ago

In GitLab by @thomasmurphy on Dec 12, 2018, 20:20

mentioned in merge request !247

treasuretron commented 4 years ago

In GitLab by @thomasmurphy on Dec 12, 2018, 20:24

Two points in reply @richbodo

  1. are there any settings on the honeypot time delay? Yes, it's currently set to 5 seconds minimum, but we can just change that at /admin/config/content/honeypot
  2. The answer to the org contacts issue seems to me and James to be to add contact details for the organisations into the organisation group entities, that way you can retrieve all your details in one view, just by visiting a URL, with a csv export, if necessary.

treasuretron commented 4 years ago

In GitLab by @thomasmurphy on Dec 13, 2018, 00:21

mentioned in commit 4db9f691558f72d8a35c3f35590c1e35cc1c5944

treasuretron commented 4 years ago

In GitLab by @richbodo on Dec 13, 2018, 14:28

@thomasmurphy Good to know the honeypot config settings. Thanks.

Regarding number 2, we're punting on the org download.

We have old spreadsheets that we can compare to the org list - there are a half dozen actual org owners today who are old alpha testers and I think we might have a couple more since then that we can suss out from the current csv.

I think I get what you are saying, though. However, it's not necessary to require an org owner to copy their contact info over to the profiles of their orgs for our benefit. There are more conventional solutions that we can handle at normal priority without adding work for users, so we'll design based on the analytics needs of spacebase admins, and file issues when we have strong agreement on design.

This is done for now subject to QA.

treasuretron commented 4 years ago

In GitLab by @thomasmurphy on Dec 13, 2018, 22:48

mentioned in merge request !250

treasuretron commented 4 years ago

In GitLab by @thomasmurphy on Dec 16, 2018, 22:06

mentioned in commit 5812ff57c26a5a3c5ed8b08ac48d5641a11271f7

treasuretron commented 4 years ago

In GitLab by @jayelless on Dec 17, 2018, 16:19

Hi Rich. What forms on the website are subject to the most abuse by spammers? We can look to ensure these are adequately protected. Are there any of these forms (other than the "create account" form) that are available to users who have not logged in?

The password_policy module will not stop spam, but it is a security measure to ensure that user account passwords meet a minimum standard and so are not easily broken by spammers (who could then use a legitimate account).

treasuretron commented 4 years ago

In GitLab by @richbodo on Dec 17, 2018, 18:48

Account creation was the big one, Tom.

There are a number of forms for contacting us, almost all of them are forms that fill out a custom subject line so we know how to respond, and at what priority - and most of them send to feedback@spacebase.co, but there may be some that send to info@, or privacy@.

We get a few spam messages into feedback@ every once in a while, possibly through the contact form, but not much. Outside of account creation, we haven't seen an effective push by bots to spam the site.

-Rich