openrightsgroup / blocked-org-uk

Template front-end code, markup, style-sheets, images and other assets for the Censorship Monitoring Project (blocked.org.uk)
https://www.blocked.org.uk/
GNU General Public License v3.0

Stats about ISP replies #372

Closed JimKillock closed 5 years ago

JimKillock commented 5 years ago

Hi there,

I think the stats about ISP and MNO responses need tackling this cycle, before the report launch.

Here's my stab at what is needed. The current stats need separating, as they mix two concepts.

https://www.blocked.org.uk/reported-sites

(1) Concept one: numbers of reports sent and unblocks done

(2) Performance of each operator (annual figures?)

(a) Total requests sent

(b) Total auto responses logged by us

(c) Total human responses logged by us

(d) Time from (a) to (c) where response received

(e) Total of cumulative unresolved reports at year end*

(f) Link to a list of unresolved reports per operator

*This should not be zero but for good performance it would be a month's worth of complaints.
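As a rough illustration of measures (a) to (f), here is a minimal Python sketch. The `Report` fields and the `operator_stats` function are hypothetical names, not the project's schema, and "unresolved" is assumed here to mean "no human response logged by year end".

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Optional

SECONDS_PER_DAY = 86400


@dataclass
class Report:
    # Hypothetical fields for illustration; not the project's real schema.
    operator: str
    sent_at: datetime                          # when the request was sent
    auto_reply_at: Optional[datetime] = None   # first auto response, if any
    human_reply_at: Optional[datetime] = None  # first human response, if any


def operator_stats(reports, operator, year):
    """Compute measures (a)-(f) for one operator and one calendar year."""
    year_end = datetime(year + 1, 1, 1)
    mine = [r for r in reports if r.operator == operator]
    sent = [r for r in mine if r.sent_at.year == year]
    answered = [r for r in sent if r.human_reply_at is not None]
    # (d) Days from sending to the human reply, only where one was received.
    days = [(r.human_reply_at - r.sent_at).total_seconds() / SECONDS_PER_DAY
            for r in answered]
    # (e) Cumulative: everything sent up to year end that had no human
    # reply by then (assumed definition of "unresolved").
    unresolved = [r for r in mine
                  if r.sent_at < year_end
                  and (r.human_reply_at is None or r.human_reply_at >= year_end)]
    return {
        "requests_sent": len(sent),                                      # (a)
        "auto_responses": sum(r.auto_reply_at is not None for r in sent),  # (b)
        "human_responses": len(answered),                                # (c)
        "avg_reply_days": mean(days) if days else None,                  # (d)
        "unresolved_at_year_end": len(unresolved),                       # (e)
        "unresolved_reports": unresolved,                                # (f)
    }
```

The list returned under `unresolved_reports` is what a per-operator link in (f) could point to.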

Then we can probably ditch the measure around unblocks we were doing, as the reply time is sufficient. That is currently done as:

(a) Sites uniquely unblocked by this provider (not blocked elsewhere)

(b) Response time for those requests

JimKillock commented 5 years ago

Note I've tried to make the ticket here clearer so have edited it

edjw commented 5 years ago

@dantheta – Can this be the next ticket you work on? Alex and Jim need it for the report that they're aiming to finish drafting by Friday.

dantheta commented 5 years ago

Adding https://www.blocked.org.uk/control/ispreport/reply-stats

JimKillock commented 5 years ago

This is looking good. Can we add to the two tables:

(a) In the gross figures, add a line for "Unresolved reports", being those reports where there is neither a reply nor an unblock recorded; and add columns for 2017, 2018 and 2019 plus total;

(b) In the per ISP reports, add a line for "Unresolved reports" on the same basis.

(c) For the ISP performance table, can we do one table for 2018 and a second table for 2019?

dantheta commented 5 years ago

Updated for (a) and (b).

For (c), do you mean the table titled "Reply stats by network" should be repeated, once for 2018 and once for 2019?

JimKillock commented 5 years ago

Thanks, on (c) you have found a better solution, thank you. Yes I did mean that table.

A couple of issues:

(1) On a scan of the "Avg reply interval (WIP)" in my email, ignoring non-replies, most replies seem to take around 3-5 days.

(a) I think our methodology may be off therefore: are we including the time taken for non-responses for instance? I think we should only measure the response time where a response has been given.

It's clear that a bunch of email requests go missing and never get answered, but from a service perspective that is a different problem from taking a long time to reply.

(b) This is just a double check to say we should calculate the time elapsed from the point that we sent out the email, not the time that we logged the requests. I expect you are doing this but of course I can't tell!

(c) Similarly, any reports where the users' emails were never sent, because the email was not verified or for whatever other reason, cannot be counted in these stats (in either table).

(2) Looking at the "open" list from the second table links, these contain sites which have nevertheless been unblocked. From the perspective of the stats, we should count sites that have been unblocked as "resolved", as we cannot expect a further answer. They should of course still be counted as not having received a reply.

Other than that the tables look very clear, thank you.
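To make points (1) and (2) concrete, here is a minimal Python sketch of the counting rules as I read them. The `SentReport` fields and `reply_summary` function are hypothetical names for illustration, not the actual database columns.

```python
from datetime import datetime
from typing import NamedTuple, Optional

SECONDS_PER_DAY = 86400


class SentReport(NamedTuple):
    # Hypothetical fields for illustration; not the project's real schema.
    logged_at: datetime             # when we logged the request (not used for timing)
    sent_at: Optional[datetime]     # when the email went out; None if never sent
    replied_at: Optional[datetime]  # first human reply; None if no reply
    unblocked: bool                 # site has since been unblocked


def reply_summary(reports):
    # (1c) Reports whose email was never sent are excluded from everything.
    sent = [r for r in reports if r.sent_at is not None]
    # (1a) Only reports with a reply contribute to the reply interval.
    replied = [r for r in sent if r.replied_at is not None]
    # (1b) The clock starts when the email was sent, not when it was logged.
    days = [(r.replied_at - r.sent_at).total_seconds() / SECONDS_PER_DAY
            for r in replied]
    # (2) An unblocked site counts as resolved, but still as "no reply".
    no_reply = [r for r in sent if r.replied_at is None]
    unresolved = [r for r in no_reply if not r.unblocked]
    return {
        "sent": len(sent),
        "replied": len(replied),
        "no_reply": len(no_reply),
        "unresolved": len(unresolved),
        "avg_reply_days": sum(days) / len(days) if days else None,
    }
```

Keeping `no_reply` and `unresolved` as separate counts means missing answers and slow answers can be reported as the two different problems they are.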

edjw commented 5 years ago

@dantheta When you get a chance, can you clarify how the statistics are working for the points Jim raises here?

alexhaydock commented 5 years ago

We may need to rethink some of the data points here. It seems like the 2017 data shows that 821 reports were sent, of which 684 were answered.

Now that data may be in the database and just not visible to me (but how were we capturing the fact that ISPs were replying back then, but not capturing the actual reply text?).

But it seems odd to me that of the 684 which were answered, apparently all 684 were unblocked and there were no rejections?

JimKillock commented 5 years ago

@alexhaydock For 2017, AIUI, the data is showing that the notification has been "responded" to in that an unblock has followed.

After mid 2018 we have a better idea of responses, in that we can log replies.

However, ISPs are in a grey area when they receive a notification from us about a blocked site, but on investigation, find that the site has already been unblocked by Symantec or whoever. If they fail to respond to our users at this point, are we really entitled to "punish" them in the figures by recording a "no response"?

It seems to me that we have to accept that an unblocked site requires no further acknowledgement and is an adequate response.

JimKillock commented 5 years ago

@alexhaydock Looking at the table a second time, I see what you mean.

We have:

Category | 2017 | 2018 | 2019 | Total
Reports sent: | 821 | 1252 | 171 | 2244
Reports answered: | 684 | 867 | 74 | 1625
Reports unblocked: | 684 | 815 | 56 | 1555

In this set of figures, the 2017 figure for "Reports answered" should be a "NA" result, as we simply don't know if the reports were "answered", we only know that many were "unblocked".

dantheta commented 5 years ago

At the moment, the reply interval is based on reports where we've marked one of the replies to indicate that the report has been closed (unblocked or rejected). The time interval used is the difference between the submission date and the date of the email that closed the case. It's still a work in progress!

We'd need to work out what we want to use for reports that pre-date the reply recording. If the site has been unblocked since the report was made, the unblock date could be used, but that will push the average up.

There's also some data corruption that's taken place because the last updated date was updated when a category was assigned; that column should contain the unblock date. I can recover this. The inflated last_updated dates are also pushing the average up.

JimKillock commented 5 years ago

Thanks Dan.

I think we have to ignore calculating the time interval for sites prior to the point we recorded the ISP replies. We don't have the data and it is unfair / impossible to try to guess.

A wider point: we should precisely record the nomenclature and methodology we use somewhere. Perhaps the wiki for now? Happy to use something else (preferably public) tho.

dantheta commented 5 years ago

The time intervals have been fixed now.

JimKillock commented 5 years ago

I think the only thing to fix here now is the "reports answered" totals on the "ISP Reply Stats" table:

Reports answered: | 684 | 867 | 74 | 1625
Reports unblocked: | 684 | 815 | 56 | 1555

which needs to reflect the presence of a non-auto reply, rather than an unblocked site (we have no reply data for 2017).

dantheta commented 5 years ago

The "reports answered" figure has been updated to use report status (unblocked/rejected) and the presence of reply emails.
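A minimal Python sketch of that rule, assuming both conditions must hold (a closed status and at least one non-auto reply email). The `is_answered` function, the status strings and the `auto` flag on each email are hypothetical names for illustration, not the actual implementation.

```python
from typing import Mapping, Sequence

# Assumed status values; the real ones may be named differently.
CLOSED_STATUSES = {"unblocked", "rejected"}


def is_answered(status: str, reply_emails: Sequence[Mapping]) -> bool:
    """A report counts as answered only if it has been closed
    (unblocked or rejected) and at least one non-auto reply exists."""
    has_human_reply = any(not email.get("auto", False) for email in reply_emails)
    return status in CLOSED_STATUSES and has_human_reply
```

Under this reading, a site that was unblocked without any recorded reply (as with the pre-reply-logging reports) is counted as unblocked but not as answered.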

dantheta commented 5 years ago

New totals:

Reports sent: | 821 | 1252 | 277 | 2350
Reports answered: | 0 | 170 | 159 | 329
Reports unblocked: | 684 | 815 | 57 | 1556

alexhaydock commented 5 years ago

It might be worth noting that the Reported Sites page on the frontend seems to still use the older method of calculating ISP response times and really should be updated, ideally before the report goes public, in case we start getting more interest in the Blocked site as a result.

dantheta commented 5 years ago

The frontend site is covered by #386. Closing this one.