Open RichardTaylor opened 8 years ago
In May 2018 a member of the WhatDoTheyKnow team attended the Scottish Information Commissioner's conference.
They fed back:
the most problematic requests from a FOI officer's perspective are long, rambling ones, often on irrelevant matters, with some poorly-formulated request in the middle somewhere. They say that when they get a request in bullet points, they are really happy, cos it helps concentrate the request into bits of information they actually want.
One made an interesting suggestion: that if the request text goes over a certain length, Alaveteli could put up a confirmation page pointing out that requests should be as specific and concise as possible, and only include material required in order for the public body to identify and produce the information requested. I.e. our "focussed" help text. It should allow the request to be sent, however, because some requests do need to be that length.
Another option would be to require admin approval before a user can make a request over x words long.
This might increase admin workload a little in authorisations; but might help prevent misuse of the service.
The first step is identifying long requests, so we may as well do the initial idea first. Requiring admin approval requires #75.
I think the first thing we should do here is write a script to get a baseline on outgoing correspondence length. Probably average, median, 95th and 99th percentile, like New Relic.
Then, an easy first step would be to have some Javascript that monitors the length of correspondence and adds a warning at "long" and "very long" thresholds.
I haven't thought about the exact messages yet, but we can figure that out when we come to implement.
After we've added these we should revisit in 6 months to check against the baseline data.
I don't think we should be worried about a non-JS version of this, as it's not vital to functionality.
I don't think we should prevent long requests altogether, as there could well be legitimate reasons for being detailed.
I don't think we should seek approval, as we don't really have the ability to hold requests as pending (though we could store them as drafts) and it would increase admin workload.
It hasn't been mentioned here, but I don't think we should worry yet about a reputation score that allows greater length, though it could be a next step.
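As a rough illustration of the "long" / "very long" warning idea, here is a minimal sketch of the threshold logic. It is written in Ruby to match the rest of this thread rather than the Javascript proposed above, and the method name `length_warning` and both threshold values are hypothetical placeholders, not anything that exists in Alaveteli. Real thresholds would presumably come from the baseline stats.

```ruby
# Hypothetical thresholds, in characters. Placeholders only; real values
# would be chosen from the baseline data.
LONG_THRESHOLD = 2_000
VERY_LONG_THRESHOLD = 5_000

# Returns :very_long, :long, or nil when no warning is needed.
# The message is never blocked; this only decides which warning to show.
def length_warning(body, long: LONG_THRESHOLD, very_long: VERY_LONG_THRESHOLD)
  length = body.to_s.strip.length

  if length >= very_long
    :very_long
  elsif length >= long
    :long
  end
end

length_warning('Please send me the minutes of the May meeting.') # => nil
length_warning('a' * 2_500) # => :long
length_warning('a' * 6_000) # => :very_long
```

Returning a symbol rather than a message keeps the wording of the warnings separate, since the exact messages haven't been decided.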
Stats:
# https://github.com/bkoski/array_stats
Float.class_eval do
  # Returns true if a float has a fractional part; i.e. <tt>f != f.to_i</tt>
  def fractional_part?
    fractional_part != 0.0
  end

  # Returns the fractional part of a float. For example, <tt>(6.67).fractional_part == 0.67</tt>
  def fractional_part
    (self - self.truncate).abs
  end
end

Array.class_eval do
  # Returns the sum of all elements in the array; 0 if array is empty
  def total_sum
    self.inject(0) { |sum, sample| sum + sample }
  end

  # Returns the mean of all elements in array; nil if array is empty.
  # Note: this is integer division when every element is an Integer.
  def mean
    if self.length == 0
      nil
    else
      self.total_sum / self.length
    end
  end

  # Returns the median for the array; nil if array is empty
  def median
    percentile(50)
  end

  # https://github.com/bkoski/array_stats/blob/6cc1ba4a6cd2903d6c632589713c73db7cd7cd8b/lib/array_stats/array_stats.rb
  # Returns the p-th percentile, interpolating linearly between neighbouring
  # samples; nil if array is empty
  def percentile p
    return nil if self.length == 0

    sorted_array = self.sort
    rank = (p.to_f / 100) * (self.length + 1)

    if rank.truncate > 0 && rank.truncate < self.length
      sample_0 = sorted_array[rank.truncate - 1]
      sample_1 = sorted_array[rank.truncate]
      (rank.fractional_part * (sample_1 - sample_0)) + sample_0
    elsif rank.truncate == 0
      sorted_array.first.to_f
    elsif rank.truncate >= self.length
      # >= rather than == so that percentile(100) returns the largest sample
      sorted_array.last.to_f
    end
  end
end
correspondence_counts = OutgoingMessage.pluck(:body).map(&:length).sort
correspondence_counts.mean
# => 800
correspondence_counts.median
# => 506.0
correspondence_counts.percentile(95)
# => 2257.0
correspondence_counts.percentile(99)
# => 4856.0
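For anyone wanting to reproduce these numbers without reopening `Array` and `Float`, here is a minimal sketch of the same (n + 1) rank interpolation as plain functions. The names `percentile` (as a top-level function) and `length_summary` are hypothetical, the sample lengths are made up, and unlike the `mean` above this one uses float division.

```ruby
# Same (n + 1) rank interpolation as the Array#percentile patch above,
# written as a standalone function. Returns nil for an empty array.
def percentile(values, p)
  return nil if values.empty?

  sorted = values.sort
  rank = (p.to_f / 100) * (sorted.length + 1)
  index = rank.truncate

  return sorted.first.to_f if index < 1
  return sorted.last.to_f if index >= sorted.length

  lower = sorted[index - 1]
  upper = sorted[index]
  lower + (rank - index) * (upper - lower)
end

# Hypothetical helper: the four baseline figures in one hash.
def length_summary(lengths)
  {
    mean: lengths.empty? ? nil : lengths.sum.to_f / lengths.length,
    median: percentile(lengths, 50),
    p95: percentile(lengths, 95),
    p99: percentile(lengths, 99)
  }
end

sample = [120, 300, 450, 506, 800, 2_300, 4_900] # made-up body lengths
summary = length_summary(sample)
summary[:median] # => 506.0
summary[:p95]    # => 4900.0
```

With so few samples the 95th and 99th percentiles both collapse to the largest value; on the real data set they would be interpolated between neighbouring lengths.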
Reopening as we only dealt with new requests in https://github.com/mysociety/alaveteli/pull/4987.
We'll let this sit for a few months and review whether it's had an effect. If it has, then we can apply it to followups too.
Running the stats again today (same time period - 1 year, with a bit of minor overlap for ease):
correspondence_counts =
OutgoingMessage.where(created_at: Time.parse('2018-01-01')..Time.parse('2019-01-01')).pluck(:body).map(&:length).sort
correspondence_counts.mean
# => 894
correspondence_counts.median
# => 581.0
correspondence_counts.percentile(95)
# => 2414.0
correspondence_counts.percentile(99)
# => 4857.0
correspondence_counts =
OutgoingMessage.where(created_at: Time.parse('2019-01-01')..Time.parse('2020-01-01')).pluck(:body).map(&:length).sort
correspondence_counts.mean
# => 882
correspondence_counts.median
# => 603.0
correspondence_counts.percentile(95)
# => 2275.0
correspondence_counts.percentile(99)
# => 4935.0
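Looks like there has been a little bit of a reduction in the mean (894 to 882) and the 95th percentile (2414 to 2275), though the median (581 to 603) and the 99th percentile (4857 to 4935) went up slightly. All pretty similar overall.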
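To put rough numbers on the comparison, here is a small sketch that computes the percentage change for each statistic between the two runs. The figures are copied from the console output above; the helper name `percent_change` and the hash names are hypothetical.

```ruby
# Figures copied from the two runs above (message body length in characters).
stats_2018 = { mean: 894, median: 581.0, p95: 2414.0, p99: 4857.0 }
stats_2019 = { mean: 882, median: 603.0, p95: 2275.0, p99: 4935.0 }

# Percentage change from before to after, rounded to one decimal place.
def percent_change(before, after)
  ((after - before).to_f / before * 100).round(1)
end

changes = stats_2018.map do |stat, before|
  [stat, percent_change(before, stats_2019.fetch(stat))]
end.to_h

changes
# => {:mean=>-1.3, :median=>3.8, :p95=>-5.8, :p99=>1.6}
```

None of the four figures moves by more than about 6% in either direction.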
On WhatDoTheyKnow, problems which take up administrator time are often not with request/response content but with extraneous material users include in correspondence.
To try and encourage users to focus requests on simply describing the information sought, how about a message to those whose request is over x words urging them to keep it focused?
A stronger action would be a cap on the length of requests (and perhaps follow up messages).
Focusing requests might help with the overall reputation of WhatDoTheyKnow and other Alaveteli sites.
If in the future there was an option to moderate requests, excessive length could be a factor for putting a request into a moderation queue.