
Implement Allow Lists and Block Lists #284

Open flyingzumwalt opened 6 years ago

flyingzumwalt commented 6 years ago

Also known as whitelists, blacklists, blocklists, deny lists, etc.

For many reasons, it's important for people operating IPFS nodes to have the option of specifying which content they will, or won't, provide to the network. It's been requested and discussed many times since the inception of IPFS. This basic feature brings up a whole domain of important opportunities and risks, but the requirements for an initial implementation are relatively simple.

Basic notes:

Related Issues

Related threads on discuss.ipfs.io:

Mr0grog commented 6 years ago

Is how to share those lists reasonable to discuss here, too? (e.g. you should be able to specify an IPNS path that represents an allow/block list.) Or is that too broad in scope for now?

flyingzumwalt commented 6 years ago

I think that (how to share those lists) should be out of scope. The whole purpose of IPFS is to make it easy to pass around data; allow/block lists are just datasets. The missing piece is the ability to have your nodes automatically apply or respect those lists.

I agree that in the long run people will want/need a whole social and technical infrastructure for passing these datasets around, but I think it should be handled as a separate longer-term design exercise with many, many factors in the mix.
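
For illustration, distributing such a list is already easy with today's primitives. A rough sketch (denylist.txt and <CID> are made up; the commands themselves are real):

# A deny list is just a dataset: add it, then publish it under an IPNS name
$ ipfs add -q denylist.txt
<CID>
$ ipfs name publish /ipfs/<CID>   # others resolve the IPNS name to get updates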

mnp commented 6 years ago

Is there a good place to discuss the larger vision, ethics, and mission, and how specific features contribute to, or detract from those? Is there a repo somewhere? This issue concerns me because it opens the door to censorship, whether voluntary, geographic, business, political, or governmental.

  1. Network segmentation. Suppose a dictator asks nodes in his country to honor a blacklist of content critical of him, and they comply. Now there are two kinds of nodes in the system: those that handle that content and those that don't. At some point, do you have two disjoint networks? Is that okay? Is it conceivable that there would be scores of networks that do not interact?

  2. Archival risk. Most troubling: is there a risk that blacklisted items, if the list is honored by enough nodes, will disappear altogether? For example, it's hard enough finding deleted tweets right now: Twitter forbids mirrors, so if Trump, say, wants to remove something he said last week, it's gone and there's no copy to prove he said it.

  3. Effectiveness. Is a list scheme even feasible? If you add a padding byte to the end of an objectionable file, or flip a bit in the middle of a video, you have a new hash but content that is close enough for humans. Anyone distributing content and wishing to circumvent a list can do this all day (see the sketch after this list).

  4. Technical deniability. Before lists are implemented, node operators can honestly say there is no way to stop an objectionable content item on demand. After lists, this (maybe thin) barrier is gone, and once compliance has been compelled once, the door is open to a mountain of constant demands; e.g. YouTube and DMCA takedown notices.

  5. Legal compliance. How will node operators evaluate which lists they must implement and which they need not? There are already competing and conflicting geographic legal jurisdictions, and one might envision further divisions later, such as virtual countries. It's hard enough figuring out GDPR compliance, interstate and international sales tax compliance, copyright, patent, and trademark compliance... on and on. More generally, human constructs tend to intrude further into technical constructs over time; we've seen this many times, and it often spoils the party.

  6. Necessity. As a counter to all of these points, what if a human consensus does arise that a data item is truly harmful and must be blocked or removed? How would such a consensus be reached, and how would it be implemented without tripping over the rest of these points?
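
To make point 3 concrete, a minimal shell sketch (the CIDs below are placeholders, not real output): appending a single byte produces a completely different CID, so a hash-based list no longer matches content that is, to a human, unchanged.

# A one-byte change defeats a hash-based list
$ echo "objectionable content" > file.bin
$ ipfs add -q file.bin
<CID-A>
$ printf '\0' >> file.bin   # append one padding byte
$ ipfs add -q file.bin
<CID-B>                     # different CID, near-identical content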

Stebalien commented 6 years ago

Is there a good place to discuss the larger vision, ethics, and mission, and how specific features contribute to, or detract from those?

This is probably the best place to discuss it.


Network segmentation.

Unless otherwise specified, the nodes will still interact. It's just that some of them will not download certain content.

Archival risk.

This is a concern, but someone will probably archive it anyway (if it has some historical, cultural, or political significance), even if they don't distribute it.

Twitter forbids mirrors

Twitter is a centralized service so it's not really a good example when discussing IPFS. Any decentralized twitter replacement wouldn't be able to forbid mirrors. Really, "mirror" implies that there is a canonical "original". In a decentralized service, there are no "mirrors".

Effectiveness.

A hash list is a good place to start, but you are correct: it won't stand up to a sufficiently motivated individual. Really, with encryption and steganography, nothing will. Over time, some governments/organizations will likely mandate filters that look at the content itself.

Technical deniability.

We already get DMCA takedown notices against our gateways. One can get "deniability" with, e.g., the Freenet approach; however, that would likely get IPFS banned in many places (and deniability may or may not be an effective legal argument). An alternative approach is to use IPFS over Tor. While we don't currently support or recommend it (we haven't done an audit, and there are probably innumerable ways to deanonymize nodes), there is a Tor transport for IPFS.

Legal compliance.

Companies will likely do what they do today: hire lawyers. In this respect, IPFS is no different from other internet technologies.

End users will also do what they do today: obey the laws in their current legal jurisdiction or work around them using anonymizing/privacy/circumvention technologies.

Necessity.

The goal here isn't to achieve consensus. The goal is to provide users with a tool that allows them to deal with their current legal realities. Not blocking certain content in certain jurisdictions could get you fined, arrested, or killed.

mnp commented 6 years ago

See also sibling note https://github.com/ipfs/notes/issues/281 on censorship resistance.

pkafei commented 5 years ago

Censorship resistance is an issue that we should definitely explore if we want to increase our presence in certain parts of the world where the internet is "curated". In the long run, does IPFS want to solve the pain point of limited access to content on the web?

olizilla commented 5 years ago

Adding to an allow list would be more convenient if it were recursive by default, as you probably don't mean "I shall allow just this specific block of metadata"; you probably mean this dataset, this directory, this whole file.

Adding a CID to a deny list would probably need to be specific, and non-recursive by default; otherwise bad-bits could deny users the ability to host other legitimate content that happens to include an identical block somewhere in the graph (you don't want to block jQuery because a malicious site uses it), and discovering the full graph would require you to find all the bad-bits.

What this might look like:

# Allow bafyb123 and all blocks in the graph
$ ipfs allow bafyb123
$ ipfs allow ls
bafyb123 (recursive)

# Do not host bafyb123 specifically.
$ ipfs deny bafyb123
$ ipfs deny ls
bafyb123

There will be occasions where you want to be very sure not to host any of the content, but that should be expressed as a list of directly named blocks. We could provide a recursive flag, but you'd probably want to ensure that it only checks currently stored blocks rather than asking the network for them in order to block them.
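
A hypothetical sketch of that flag, continuing the made-up commands above (neither deny --recursive nor --local exists; --local here would mean "walk only blocks already in the local repo, never fetch"):

# Hypothetical: deny a whole graph, walking only locally stored blocks
$ ipfs deny --recursive --local bafyb123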

Perhaps you should be able to create allow list entries when you add a file, and deny list entries as you delete blocks:

$ ipfs add --host-only-this ./webroot

$ ipfs block rm --do-not-host bafyx

Stebalien commented 5 years ago

Adding to an allow list would be more convenient if it were recursive by default, as you probably don't mean "I shall allow just this specific block of metadata"; you probably mean this dataset, this directory, this whole file.

WRT "allow lists", go-ipfs now has a Gateway.NoFetch config flag that configures a gateway to fetch nothing from the network. This means only pinned (and otherwise cached) files will be served.
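
For reference, a minimal sketch of turning it on (Gateway.NoFetch is a real go-ipfs config key; as usual, the daemon must be restarted for config changes to apply):

# Serve only local (pinned/cached) content from the gateway
$ ipfs config --json Gateway.NoFetch true
# ...then restart the daemon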

olizilla commented 5 years ago

@ianjdarrow has written up a proposal for a content-decider interface here: https://github.com/ianjdarrow/content-decider

lidel commented 2 years ago

Update: Cloudflare proposed an (allow/deny)list format in https://github.com/ipfs/specs/pull/299