openrightsgroup / cmp-issues

Centralised issue-tracking for the Blocked backend

Search and lists: add exclude BT Strict #239

Open JimKillock opened 5 years ago

JimKillock commented 5 years ago

I'm slightly worried that BT Strict results are inflating our block lists.

A feature to exclude BT Strict from search would be helpful.

dantheta commented 5 years ago

If it's useful, I can remove URLs that are only blocked on BT-Strict from existing lists.

alexhaydock commented 5 years ago

I'll let @JimKillock reply on whether that's a good idea, but if you do that, are you able to leave the two lists relating to school websites intact?

https://www.blocked.org.uk/list/Schools
https://www.blocked.org.uk/list/sch.uk

As BT-Strict and TalkTalk Kidsafe are intended for young children, I don't think it necessarily invalidates our conclusion if it's mostly these "more advanced" filters that are filtering school websites.

JimKillock commented 5 years ago

Excluding existing results sounds reasonable to me @dantheta

JimKillock commented 5 years ago

Did we do this?

dantheta commented 5 years ago

Not yet, but it's a subset of the "sort by networks in search" ticket.

On 18 March 2019 19:46:12 GMT, JimKillock notifications@github.com wrote:

Did we do this?


dantheta commented 5 years ago

I've created a list of the sites blocked only on BT-Strict.

https://docs.google.com/spreadsheets/d/1LHLVQj7ozERmqbrf7ggORZ0t99_LyLnYdmy2JO1q9SA/

Let me know if you're happy for me to go ahead with the removal (apart from the lists Alex specified).

JimKillock commented 5 years ago

@edjw We will need to think about this carefully given where we are at now. We should do it, but it would change the report figures quite a bit.

JimKillock commented 5 years ago

Thanks, I think we're likely to want to make the calculations before we do anything. If we can make the calculations non-destructively via the filtering, that's the best option.

Note, the original request was for a filter rather than a sweep! Maybe this can be covered in the other ticket?

edjw commented 5 years ago

@dantheta - If possible, could you set up this filtering functionality today so we can look at the numbers by tomorrow morning (Friday)?

dantheta commented 5 years ago

Exclude BT-Strict filter is now available on the saved lists summary page. Filtering to just one ISP is available on the lists summary page, the list contents page and the search engine.

The BT-Strict exclude is cached, because the query takes quite a while to run.

dantheta commented 5 years ago

Exclude filter is also now available on the list items page. Not sure it makes quite as much sense there, as items which are not blocked on BT-Strict and items which are blocked on networks other than BT-Strict are still shown.

dantheta commented 5 years ago

This one got a little bit complicated - please let me know if there's anything conspicuously missing.

The search network exclusion is a bit trickier: if a negative criterion ('BT-Strict' in blocking networks) is applied, it will filter out any URLs which are blocked on BT-Strict, including those that are also blocked elsewhere. What we'd really want it to do is include all URLs except those which are blocked only on BT-Strict. In the meantime, the search page supports the exclusive option: show results which are blocked on the selected network.
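
To make the difference between the two behaviours concrete, here is a minimal Python sketch over made-up data. The `blocks` mapping and both function names are hypothetical illustrations, not code from the Blocked backend.

```python
# Hypothetical sample data: URL -> set of networks currently blocking it.
blocks = {
    "http://a.example": {"BT-Strict"},
    "http://b.example": {"BT-Strict", "Sky"},
    "http://c.example": {"Sky"},
}

def naive_exclude(blocks, network):
    """Negative criterion: drop every URL that is blocked on `network` at all."""
    return sorted(url for url, nets in blocks.items() if network not in nets)

def exclude_only(blocks, network):
    """Desired behaviour: drop only URLs whose sole block is on `network`."""
    return sorted(url for url, nets in blocks.items() if nets != {network})

print(naive_exclude(blocks, "BT-Strict"))
print(exclude_only(blocks, "BT-Strict"))
```

The naive version loses `http://b.example`, which is also blocked on Sky and so should still count.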

JimKillock commented 5 years ago

Yes I see what you mean. I think the answer would be to have a positive filter, for the same result, eg:

Virgin + BT Light filter + Sky + TT

MNOs

Virgin + BT Light filter + Sky + TT + MNOs
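
A positive filter of this kind could be sketched in Python as below. The network labels, the placeholder mobile names and the `positive_filter` helper are all assumptions for illustration; the real network names in the database may differ.

```python
# Hypothetical network labels; placeholders stand in for the mobile operators.
FIXED = {"Virgin", "BT-Light", "Sky", "TalkTalk"}
MOBILE = {"MNO-1", "MNO-2"}

# Hypothetical sample data: URL -> set of networks currently blocking it.
sample = {
    "http://a.example": {"BT-Strict"},
    "http://b.example": {"BT-Strict", "Sky"},
    "http://c.example": {"MNO-1"},
}

def positive_filter(blocks, networks):
    """Keep URLs blocked on at least one of the selected networks."""
    return sorted(url for url, nets in blocks.items() if nets & networks)

print(positive_filter(sample, FIXED))
print(positive_filter(sample, MOBILE))
print(positive_filter(sample, FIXED | MOBILE))
```

Because BT-Strict is never in the selected set, a URL blocked only there matches nothing and drops out, while a URL blocked on BT-Strict and another network is kept.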

JimKillock commented 5 years ago

The filters we have are great: could we have a final filter, for both fixed and mobile, but not BT Strict? This would just save time selecting the options one by one.

dantheta commented 5 years ago

I had one more idea too - "invert", to invert the selected filter. Using exclude was performance poison (due to not using an index and the creation of a massive temporary table), but having the frontend manipulate the list works really quickly.
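
As a rough illustration of the "invert" idea (written in Python here, whatever the frontend actually uses), inverting a selection just means taking the complement of the chosen networks and sending that as an ordinary positive filter. The function name and network list are hypothetical.

```python
def invert_selection(all_networks, selected):
    """Return every network not in `selected`, preserving the original order."""
    chosen = set(selected)
    return [name for name in all_networks if name not in chosen]

# Hypothetical list of available networks.
networks = ["BT-Strict", "Sky", "Virgin"]
print(invert_selection(networks, ["BT-Strict"]))
```

The query then only ever sees a list of networks to include, so it can stay on the indexed path instead of building an exclusion.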

JimKillock commented 5 years ago

I don't mind how it is done, but the logic has to reflect that we only exclude sites where BT Strict is the only block :)

edjw commented 5 years ago

@dantheta – What's the status of this issue, Dan?

dantheta commented 5 years ago

Excluding BT-Strict is possible on each of the lists pages (like https://www.blocked.org.uk/list/2019_report_all_lqbtq?status=&exclude=1&network=BT-Strict).
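
For anyone scripting against these pages, a small Python sketch for building such a URL follows. The parameter names (`status`, `exclude`, `network`) are taken from the example link above; the `list_url` helper is hypothetical, and leaving out `exclude` to turn the exclusion off is an assumption about how the site behaves.

```python
from urllib.parse import urlencode

BASE = "https://www.blocked.org.uk/list/"

def list_url(list_name, network, exclude=True, status=""):
    """Build a list-page URL with the network filter parameters."""
    params = {"status": status}
    if exclude:
        # Assumption: exclusion is off when the parameter is absent.
        params["exclude"] = 1
    params["network"] = network
    return BASE + list_name + "?" + urlencode(params)

print(list_url("2019_report_all_lqbtq", "BT-Strict"))
```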

Excluding ISPs in keyword search isn't currently exposed in the frontend, but it is pretty much the same piece of work as openrightsgroup/blocked-org-uk#383.

dantheta commented 5 years ago

The search index has been cleared of entries which are blocked on BT-Strict only, and is now updating to the new format which includes TLD.

Next-site suggestions based on category should be clear of BT-Strict-only entries as well, but we'll have to see how well that's working with a few tests.