mysociety / alaveteli

Provide a Freedom of Information request system for your jurisdiction
https://alaveteli.org

Detect personal correspondence and deter misuse of services to send such correspondence #6001

Open RichardTaylor opened 3 years ago

RichardTaylor commented 3 years ago

Review personal correspondence sent via the service and see if it has detectable characteristics which can be used to identify it automatically and prompt an action, such as a warning or even preventing the correspondence from being sent.

This could be done by matching phrases, e.g. "my application" or "my driving licence", or by detecting the inclusion of a social security number.
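
A minimal sketch of what such matching might look like in plain Ruby. Everything here is hypothetical: the constant and method names do not exist in Alaveteli, and the identifier pattern assumes that "social security number" means a UK National Insurance number, using a deliberately simplified format (two letters, six digits, a suffix letter A to D).

```ruby
# Hypothetical sketch: none of these names exist in Alaveteli.

# Phrases that tend to indicate personal correspondence (examples from this issue).
PERSONAL_PHRASES = [
  /\bmy application\b/i,
  /\bmy driving licen[cs]e\b/i
].freeze

# Simplified, assumed pattern for a UK National Insurance number:
# two letters, six digits (optionally spaced in pairs), one suffix letter A-D.
NI_NUMBER = /\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b/i

# Returns the substrings of `text` that matched any pattern (empty if none).
def personal_correspondence_matches(text)
  patterns = PERSONAL_PHRASES + [NI_NUMBER]
  patterns.filter_map { |pattern| text[pattern] }
end
```

A non-empty result could be used to trigger a warning rather than to block sending outright, since patterns like these will produce false positives.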

itsaphel commented 3 years ago

@RichardTaylor Is this one suitable for a volunteer to work on?

garethrees commented 3 years ago

> Is this one suitable for a volunteer to work on?

We do have a little bit of prior art here, where we detect length of text and show a warning when the text starts getting long.

It feels like a more generalised "request content warning" could be extracted from this, where we detect both phrases and length. We'll have to consider how we store the phrases that get identified. Are they general phrases, or are they per-authority, like what we did as a pre-text-entry step for a few troublesome bodies?
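
To make the idea concrete, here is a rough sketch of a combined check that handles both length and phrases, with general and per-authority phrase lists. The class name, option names, and the use of a plain string key for the authority are all assumptions for illustration, not existing Alaveteli code.

```ruby
# Hypothetical sketch; class and option names are illustrative only.
class RequestContentWarning
  # A single finding: kind is :length or :phrase, detail is the length or phrase.
  Notice = Struct.new(:kind, :detail)

  def initialize(max_length:, general_phrases: [], authority_phrases: {})
    @max_length = max_length
    @general_phrases = general_phrases
    @authority_phrases = authority_phrases # e.g. { "dvla" => ["my driving licence"] }
  end

  # Returns an Array of Notice structs for the given request text.
  def check(text, authority: nil)
    notices = []
    notices << Notice.new(:length, text.length) if text.length > @max_length

    # General phrases always apply; per-authority phrases only for that body.
    phrases = @general_phrases + @authority_phrases.fetch(authority, [])
    phrases.each do |phrase|
      notices << Notice.new(:phrase, phrase) if text.downcase.include?(phrase.downcase)
    end
    notices
  end
end
```

Keeping the two phrase lists as constructor arguments leaves open the question of where they are stored.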

I think the first step here is shaping this issue from an idea into a loose plan of what it is we think will be a good solution for both WhatDoTheyKnow and the international reusers of Alaveteli.

itsaphel commented 3 years ago

My experience volunteering in other large communities is that building the regexes is best a separate task from technically implementing a check. It's probably better if these can be configured by the WDYK admins (or admins of other installs) and then they can tweak as appropriate (adding/removing phrases and tweaking for false positives).

An external example might be the AbuseFilter extension on Wikipedia; the regexes themselves are managed by volunteers of the installation.

Not sure how this idea will translate into the WDYK system (e.g. it depends on how technical the admins of the largest installs are). Thoughts?

garethrees commented 3 years ago

> that building the regexes is best a separate task from technically implementing a check

Yeah, 100% agree that we want to decouple the mechanism from the specific phrases to match.

> depends on how technical the admins of the largest installs are…

There's a huge range from basic computer literacy all the way up to expert programmers!

We do allow regex to be entered for censor rules, but I don't think they're generally well understood.

> It's probably better if these can be configured by the WDYK admins (or admins of other installs)…

We have a couple of different existing patterns for configuring things like this.

  1. Providing an accessor so that themes can set their own values.
  2. Allowing more complex configuration to be set through data files.
  3. Creating admin interfaces so that site admins can manage the configuration without requiring developer input.

It's not a universal truth, but we often tend to start features out at stage 1 – providing accessors – and if we find we're needing to edit them frequently we then consider whether an admin interface is worth adding (and maintaining).
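
As a rough illustration of stage 1, an accessor that a theme could override might look something like the following. The `ContentWarningPatterns` module is hypothetical and does not exist in Alaveteli, and where a theme would actually set the value is also an assumption.

```ruby
# Hypothetical module; Alaveteli does not define ContentWarningPatterns.
module ContentWarningPatterns
  class << self
    # Themes can replace or extend this list at load time.
    attr_writer :patterns

    # Defaults to an empty list, so core ships with no phrases of its own.
    def patterns
      @patterns ||= []
    end

    # True if any configured pattern matches the text.
    def match?(text)
      patterns.any? { |pattern| pattern.match?(text) }
    end
  end
end

# In a theme's initialisation code (the exact location is an assumption):
ContentWarningPatterns.patterns += [/\bmy application\b/i]
```

This keeps the matching mechanism in core while each install supplies its own phrases, and the same accessor could later be backed by a data file or an admin interface without changing callers.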

I think the more unresolved question here is how the check mechanism works – once we understand that we can figure out where to pull the phrases from.

RichardTaylor commented 1 year ago

There's a proposal to generalise this to apply to other types of misuse at https://github.com/mysociety/alaveteli/issues/6402#issuecomment-1321240891