openrightsgroup / blocked-org-uk

Template front-end code, markup, style-sheets, images and other assets for the Censorship Monitoring Project (blocked.org.uk)
https://www.blocked.org.uk/
GNU General Public License v3.0

Improved list of site categories #22

Closed graphiclunarkid closed 10 years ago

graphiclunarkid commented 10 years ago

We have a short list of fairly generic categories of site at the moment:

What other categories should we list? These need to be added.

RuthC commented 10 years ago

So this is from a drop-down menu for people to choose from when they submit a site? Maybe also: News, Social Media

JimKillock commented 10 years ago

Will this produce useful info? Wouldn't it be better to specify categories that might help us understand why filters are misapplying to them? Eg

Blog (lots of these get blocked), Sex education, Teenage, Forum, Erotica (non-porn), Alcohol, Tobacco, Campaign

RuthC commented 10 years ago

But doesn't that make an assumption about why the site is blocked, one that might be easy for the filtering companies to defend themselves against? I mean, if a site is 'alcohol' then they can say 'yes, it's for over 18s'. I don't think we should be using the same categories as the filtering companies, because that inherently justifies categorising sites like that. We need to frame these sites in a different way, not just use their language. Tobacco is 'business', for instance.

webal commented 10 years ago

I think for the purposes of presenting the stats it's going to be much more powerful to say that x businesses were censored as opposed to x alcohol-related sites were banned.

graphiclunarkid commented 10 years ago

I agree with @RuthC and @webal about the nature of the categories but I think we need a few more. Some practical examples from this week that I found difficult to categorise with the existing list:

Maybe to the existing list we could add:

And possibly "other" with a free-text box for people to enter their own?

webal commented 10 years ago

I guess there will always be some overlap, we could allow people to select multiple categories, but I don't know if that would make things more or less clear :/

ei8fdb commented 10 years ago

Is it possible for blocked.org.uk to suggest a category for the URL?

JimKillock commented 10 years ago

OK, assuming, as @RuthC and @webal seem to think, that this is to produce statistics about what incorrect blocks affect, then I think a number of the suggestions I made are very important. I don't think this is about framing; this is about documenting harm as accurately as possible.

For instance, I found it very useful to point out that alcohol-related sites are being blocked: nobody seriously thinks teenagers might visit a pub as the result of the presence of a website. Nor are they drinking over the Internet.

Sex education blocks are very important to know about. It is vital to know if campaign sites are blocked. Blocking teenage sites would harm those who are supposed to be helped. Forums are disgracefully blocked, for no good reason, and blocking them is a "community harm".

So most of the categories, to me, should be there and don't cause a framing problem. Tobacco is a potential exception, and I'm happy to lose it. So just to resuggest:

Blog (lots of these get blocked), Sex education, Teenage, Forum, Erotica (non-porn), Alcohol, Tobacco, Campaign

JimKillock commented 10 years ago

One other suggestion: this is something where Javier ought to be asked, in case he has a particular need for data.

webal commented 10 years ago

Should there be an option to choose 'no reason to censor this' to highlight sites that should not have been blocked? This would allow ORG to keep a list of collateral damage that could be used to show the harm to people/companies.

I think that the categories we have kind of fall under two headings: the type of content owner (blog - personal/business/charity/govt) and the content type (alcohol, porn, sex ed, etc). Perhaps having inputs for the two sets of categorisation might clarify this.

We could easily have multiple checkboxes to allow the users to 'tick all that apply' which may mean better categorisation, but might make analysis a little trickier as sites could overlap categories.
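To make the analysis concern concrete, here is a minimal sketch of tallying 'tick all that apply' submissions. The field names, URLs and owner types are made up for illustration and are not taken from the blocked.org.uk code. The point is that owner-type counts add up to the number of sites, while content-type counts can add up to more, because one site may sit in several categories.

```python
from collections import Counter

# Hypothetical submissions: one owner type per site, any number of content types.
submissions = [
    {"url": "http://example.org/a", "owner": "business", "content": {"alcohol"}},
    {"url": "http://example.org/b", "owner": "blog", "content": {"sex education", "teenage"}},
    {"url": "http://example.org/c", "owner": "charity", "content": {"sex education", "campaign"}},
]


def tally(subs):
    """Count sites per owner type and per content type."""
    # Each site has exactly one owner type, so these counts sum to len(subs).
    owner_counts = Counter(s["owner"] for s in subs)
    # A site contributes once to every content type it was ticked for,
    # so these counts can sum to more than len(subs).
    content_counts = Counter(c for s in subs for c in s["content"])
    return owner_counts, content_counts


owner_counts, content_counts = tally(submissions)
```

With data like this, "x businesses were blocked" stays a clean headline figure, and the content-type counts should be read as "sites tagged with this category" rather than as shares of a total.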

JimKillock commented 10 years ago

Yes, two questions is certainly an idea (i.e., who are you; what content do you have); but "no reason to censor" is very subjective. I don't see that alcohol should be censored, certainly sex education shouldn't be except maybe for u12s (the filters are generally u18). Porn isn't really an incorrect block and I would have thought is unlikely to be reported to us.

graphiclunarkid commented 10 years ago

There might be some benefit in asking submitters to say whether they consider results to be under- or overblocked. @JimKillock is right to point out this is subjective, however if one of our aims is to argue that distinguishing between "good" and "bad" websites is impossible, I think receiving a range of opinions about controversial URLs (or categories) would serve to illustrate that point. Sites that divide opinion might also prove worthy of closer examination so this would be one way of discovering them.
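As a rough sketch of how divisive sites could be surfaced, assuming each report carries a URL plus an "overblocked" or "underblocked" opinion (these labels, the sample data and the thresholds are all hypothetical):

```python
from collections import defaultdict

# Hypothetical reports: (url, opinion) pairs collected from submitters.
reports = [
    ("http://example.org/a", "overblocked"),
    ("http://example.org/a", "overblocked"),
    ("http://example.org/a", "underblocked"),
    ("http://example.org/b", "overblocked"),
    ("http://example.org/b", "overblocked"),
    ("http://example.org/b", "overblocked"),
    ("http://example.org/c", "underblocked"),
]


def divisive_urls(reports, min_reports=2, min_minority_share=0.25):
    """Return URLs where a sizeable minority disagrees with the majority."""
    votes = defaultdict(lambda: {"overblocked": 0, "underblocked": 0})
    for url, opinion in reports:
        votes[url][opinion] += 1  # raises KeyError on an unexpected opinion label

    result = []
    for url, counts in votes.items():
        total = counts["overblocked"] + counts["underblocked"]
        if total < min_reports:
            continue  # too few opinions to call the site divisive
        minority_share = min(counts.values()) / total
        if minority_share >= min_minority_share:
            result.append(url)
    return sorted(result)


divided = divisive_urls(reports)
```

The thresholds are arbitrary starting points; they would need tuning once there is real data on how many reports a typical URL receives.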

graphiclunarkid commented 10 years ago

@jimkillock We need to arrive at a decision swiftly so that any changes can be implemented in time for launch. I know Javier is very busy at the moment, but perhaps if you could speak with him and then write up whatever you agree, we could move forward?

Alternatively the three of us could talk by Skype and I can then write it up afterwards. I'm mostly free for the rest of the week.

JimKillock commented 10 years ago

Use two categories, as Javier suggested. Use one to categorise who they are, as per the present set up, and one more to categorise the kind of content (as above) so we get an idea why the site might be blocked.
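A minimal sketch of what checking a two-question submission could look like on the server side. The owner types below are placeholders, since the current drop-down's values are not listed in this thread; the content types are the ones suggested above. The function name and messages are made up and are not part of the existing code.

```python
# Hypothetical owner types; the real drop-down values are not listed in this thread.
OWNER_TYPES = {"business", "charity", "personal", "government"}

# Content types taken from the suggestions earlier in this issue.
CONTENT_TYPES = {
    "blog", "sex education", "teenage", "forum",
    "erotica (non-porn)", "alcohol", "tobacco", "campaign",
}


def validate_submission(url, owner_type, content_type):
    """Return a list of problems; an empty list means the submission is acceptable."""
    problems = []
    if not url.startswith(("http://", "https://")):
        problems.append("url must start with http:// or https://")
    if owner_type not in OWNER_TYPES:
        problems.append(f"unknown owner type: {owner_type!r}")
    if content_type not in CONTENT_TYPES:
        problems.append(f"unknown content type: {content_type!r}")
    return problems
```

For example, `validate_submission("http://example.org/", "business", "alcohol")` returns an empty list, while a submission with a bare hostname and unlisted categories returns one message per problem.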

I have one more request for the form which I will raise on another ticket.

graphiclunarkid commented 10 years ago

OpenDNS has a pretty good list of categories that could be used to categorise both under- and overblocking. There are a couple of omissions that I would add: personal sites (static homepages, not blogs); and education (wider than "educational institutions", to include how-to sites, community learning, code clubs, etc.)

(Aside: OpenDNS don't seem ready to license this data for reuse or to expose an API yet: their terms of service prevent scripted access. We could contact them and ask permission though.)

The Wikipedia article on UK web blocking has a list of categories curated from those used by home ISPs. It's less comprehensive than the OpenDNS list and is more useful for categorising underblocked rather than overblocked sites IMO.

graphiclunarkid commented 10 years ago

I have had a go at improving the site classification options. There are now three drop-down form elements:

You can see this in action on the http://stage.blocked.org.uk/ front page and the code is available in pull request #60. Please let me know what you think.

webal commented 10 years ago

The third drop-down does seem a bit focussed on the types of site more likely to be blocked.

graphiclunarkid commented 10 years ago

@webal I tried to target the types of content ISPs are themselves targeting with their filters, but allowing for the possibility that they've got it wrong, which we're anticipating.

The trouble with trying to be more comprehensive is that we risk ending up with a list of all human activity - a long list! If we use broader categories to avoid exhaustive detail we risk the list becoming so vague as to be meaningless. "Entertainment" is already a meaninglessly large category, for example, but I left it there to cover media sites (film, music, books, etc) that might be blocked for copyright infringement.

It's good that the field is optional IMO - if none of the categories apply people can just leave it blank.

I'm very happy to entertain suggestions for changes to the content list in particular. Also the others if anyone thinks we could improve them.