openrightsgroup / blocked-org-uk

Template front-end code, markup, style-sheets, images and other assets for the Censorship Monitoring Project (blocked.org.uk)
https://www.blocked.org.uk/
GNU General Public License v3.0

Testing policy #316

Closed by JimKillock 5 years ago

JimKillock commented 6 years ago

I've drafted a testing policy for UK and EU probes here:

https://wiki.openrightsgroup.org/wiki/Blocked.org.uk/Testing_policy

The idea is to:

So it would mean running incomplete tests on domains, to get started, rather than a full test each time.

Comments welcome.

JimKillock commented 6 years ago

@dantheta would it be worth a chat / IM / RocketChat discussion about this, or does it all make sense? I had the feeling it might need remodelling of how the test queue system works?

dantheta commented 6 years ago

It all makes sense - it doesn't require huge changes to the data model. It's mostly componentized, with a set of jobs that run every 10 minutes based on the source categories. The current version is here: https://github.com/openrightsgroup/Blocking-Middleware/blob/master/backend/requeue.php

dantheta commented 6 years ago

The control panel is up and running, and allows batches of URLs to be fed to a user-defined set of ISPs at a specified rate. There are still some further changes to make and documentation to write, but the basics are there.
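
As a rough illustration of what feeding a batch of URLs to a chosen set of ISPs at a fixed rate could look like, here is a minimal Python sketch. It is not the control panel's code: `plan_batches`, the tick numbering, and the ISP names are all hypothetical.

```python
from itertools import islice


def plan_batches(urls, isps, batch_size):
    """Return (tick, isp, url) tuples: one batch of URLs per tick, sent to every ISP.

    Hypothetical helper: a "tick" stands for one submission interval.
    """
    remaining = iter(urls)
    plan = []
    tick = 0
    while True:
        batch = list(islice(remaining, batch_size))
        if not batch:
            break
        for url in batch:
            for isp in isps:
                plan.append((tick, isp, url))
        tick += 1
    return plan


# Example: three URLs, two ISPs, two URLs submitted per interval.
plan = plan_batches(["a.example", "b.example", "c.example"], ["isp1", "isp2"], 2)
```

Here the first two URLs are queued for both ISPs in interval 0 and the third in interval 1, so a large list is spread over time rather than submitted at once.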

JimKillock commented 6 years ago

Thanks, that's very helpful, I'll take a look.

JimKillock commented 6 years ago

Looks very helpful. Can we also edit or add to the source categories / URLs themselves?

dantheta commented 6 years ago

The source categories are generally assigned during import, but I'm planning to add that functionality to the URL admin screen. The URLs themselves aren't editable since they have historical results attached, but they can be deleted or marked invalid on the URL admin screen.

JimKillock commented 6 years ago

The main thing here is to be able to add URLs for testing, so we can expand the test lists for sensitive categories.

JimKillock commented 6 years ago

This would be a really handy thing to add soon ;)

dantheta commented 6 years ago

I've added a URL import screen to the admin panel, which allows the addition of new URLs along with tags. Where URLs already exist in the system, the new tags are merged onto the existing set.

There are still a couple of improvements to be made - namely that the list of tags is taken from static data, so new tags won't appear as checkboxes for the moment.
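
The merge behaviour described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the in-memory `store` dict and the `import_urls` name are hypothetical and stand in for whatever the admin panel does against the database.

```python
def import_urls(store, urls, tags):
    """Add each URL with the given tags; for URLs already present, merge the tags.

    `store` maps a URL to its set of tags (a hypothetical stand-in for the database).
    """
    for url in urls:
        store.setdefault(url, set()).update(tags)
    return store


# Example: one URL already exists with a tag, one is new.
store = {"http://example.com": {"charity"}}
import_urls(store, ["http://example.com", "http://example.org"], ["piracy"])
```

After the import, the existing URL keeps its old tag and gains the new one, while the new URL carries only the imported tag.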

JimKillock commented 6 years ago

That's great Dan, seems to work well. I will get thinking about new data sources!

I have some questions about the testing policies in general. At the moment most of the policies are not visible in the admin panels. It would be good to move these across so we can see what they are.

I would also like to start testing on .com — this will take years to get through — so can we make that dataset visible in the admin?

I'd also like to check our policy on the .uk data, for instance whether that gets re-tested. We should consider how often we do that.

JimKillock commented 6 years ago

Quick feedback: I tried setting up a test routine for Charities (as a test) and Pirate sites (as I added some sites). Observations:

(1) Neither says how many tests have been done, although each claims to be running tests

(2) There is no information about the lists, such as their size. It would be handy to be able to download the lists, or view them somehow, so we can check what is there already

JimKillock commented 6 years ago

Further note: the Cloudflare tests seem to be stuck; it's unclear why.

dantheta commented 5 years ago

I made some improvements to this a while back - the running tasks have a progress bar on the overview page. I've also added a two-day deadline for queued jobs on each of the ISP queues, so that any disconnected line doesn't hold up the rest of the queues. It does mean that there may be gaps in coverage if a line goes dead for an extended period of time, but only on that one line.
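
For anyone reading along, the two-day deadline on each ISP queue might look roughly like the Python sketch below. It is an assumption-laden illustration, not the middleware's code: the `expire_jobs` name and the queue layout (ISP name mapped to a list of `(queued_at, url)` pairs) are hypothetical.

```python
from datetime import datetime, timedelta

DEADLINE = timedelta(days=2)  # the two-day deadline mentioned above


def expire_jobs(queues, now):
    """Drop jobs older than DEADLINE from each ISP queue independently.

    `queues` maps an ISP name to a list of (queued_at, url) pairs (hypothetical layout).
    """
    return {
        isp: [(queued_at, url) for queued_at, url in jobs if now - queued_at <= DEADLINE]
        for isp, jobs in queues.items()
    }


# Example: isp_dead has a stale job from a disconnected line; isp_live does not.
now = datetime(2019, 1, 10)
queues = {
    "isp_live": [(datetime(2019, 1, 9), "a.example")],
    "isp_dead": [(datetime(2019, 1, 7), "a.example")],
}
live = expire_jobs(queues, now)
```

Only the stale job on the dead line is dropped, which matches the trade-off described: a gap in coverage on that one line, with the other queues unaffected.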

JimKillock commented 5 years ago

This is all good and very helpful. I've added a ticket to make some or all of the information public on the stats page. Next up would be to add the CommonCrawl and other data into the test scheduler. Also, is the .org dataset available and tested?

JimKillock commented 5 years ago

PS I tried adding a new test case for TPB clones but it threw an error.

dantheta commented 5 years ago

The error was thrown because the submission interval (in the rate tab) wasn't filled in. I've added a sensible default for that. Many apologies!
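
A minimal Python sketch of that kind of fix, falling back to a default when the interval field is left blank. The `parse_interval` name and the 600-second default are assumptions for illustration only; the actual default in the admin panel is not stated here.

```python
DEFAULT_INTERVAL_SECONDS = 600  # assumed value, not the real default


def parse_interval(raw):
    """Return the submission interval in seconds, using a default for blank input."""
    if raw is None or raw.strip() == "":
        return DEFAULT_INTERVAL_SECONDS
    value = int(raw.strip())
    if value <= 0:
        raise ValueError("submission interval must be positive")
    return value
```

With this shape, a blank field no longer raises an error, while a clearly invalid value such as zero is still rejected.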

.org was processed a while back, and is viewable on the statistics page. We can add a test case to re-process those.