Censorship resistance on IPFS

dmbb commented 6 years ago

Hey everyone!

I'm Diogo, and I'm pursuing a PhD at Instituto Superior Técnico, Universidade de Lisboa. Since my MSc, I've been working on Internet censorship circumvention. In particular, I've explored ways to transmit censored data by piggybacking it on multimedia protocols that a censor may refrain from blocking for social or economic reasons.

I'm interested in IPFS because it replicates data across the network, which makes it harder for censors to block a given piece of information. Specifically, I noticed that IPFS was successfully used during the referendum in Catalonia to prevent the Spanish government from blocking voting-related information from citizens.

Although users were able to browse data in an uncensored way in that episode, as I understand it there are other challenges to adopting IPFS for censorship resistance. For instance, IPFS's bootstrapping process relies either on a set of well-known nodes, which a knowledgeable censor could block, or on a peer discovery protocol, which a censor's traffic analysis techniques could identify (and then block).

The same issue affects other overlay networks such as Tor. I'm opening this issue to hear your opinion on the major research challenges IPFS faces in providing Internet censorship resistance. Are these challenges similar to the ones Tor faces? Are there design decisions that lead to fundamentally different approaches?

Thanks for building such an amazing project. I thank you in advance for any comments you may have about directions for fighting censorship with IPFS.

Stebalien commented 6 years ago

A lot of the challenges concerning censorship will look a lot like the ones faced by Tor. However, we have some advantages and disadvantages.

Advantages:

Not a one-trick pony. That is, it's not only for privacy/anti-censorship. If IPFS can gain a critical mass, it'll be hard to block it entirely without economic repercussions (especially if people start doing software distribution over IPFS).
The ability to bootstrap off of other nodes on the same LAN using local discovery. Currently, two nodes on the same LAN can (well, should, this area needs a bit of love) find each other and connect even if one or both can't connect to the rest of the network. IPFS can even operate in entirely isolated networks this way.

Disadvantages:

No privacy/anonymity. This is a pretty big issue where censorship is concerned.
Readily enumerable. In Tor, it's possible to enumerate all the relays. In IPFS, it's possible to enumerate all known nodes (without much difficulty). Unfortunately, IPFS needs this to function properly.

However, we're always working on improving IPFS. We're working on:

A Tor transport (eventually). However, we want to be very sure that we're doing everything right before we finish/release this feature. Unfortunately, this will negate the two advantages we have over Tor.
QUIC support with TLS 1.3. Due to how TLS 1.3/QUIC work, traffic between IPFS nodes using this transport should be hard to distinguish from HTTPS traffic (although we may need to randomize the port to make this an effective anti-censorship measure). Unfortunately, the IPFS node will still respond as an IPFS node, so this only really helps against passive attackers.

Kubuxu commented 6 years ago

For instance, IPFS's bootstrapping process is either tied to a set of well-known nodes which could be blocked by a knowledgeable censor, or to the use of a peer discovery protocol which may be identified (and further blocked) by a censor's traffic analysis techniques.

The bootstrapping process can also be significantly improved by using previously known nodes. The primary risk is that at least one of the nodes used for bootstrapping needs to be well behaved (and give you access to the rest of the network through DHT-based discovery).
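A minimal sketch of that fallback order, assuming a hypothetical `dial` callback that stands in for opening a libp2p connection (it is not a real go-ipfs or js-ipfs API). Previously seen peers are tried before the default bootstrap list, and the first reachable, well-behaved peer is enough to hand off to DHT discovery.

```python
from typing import Callable, Iterable, Optional


def bootstrap(
    cached_peers: Iterable[str],
    default_peers: Iterable[str],
    dial: Callable[[str], bool],
) -> Optional[str]:
    """Return the first peer that answers, trying cached peers before defaults.

    `dial` is a hypothetical stand-in for opening a connection; it returns
    True when the peer is reachable (and, we assume, well behaved).
    """
    seen = set()
    for addr in list(cached_peers) + list(default_peers):
        if addr in seen:
            continue  # skip addresses that appear in both lists
        seen.add(addr)
        if dial(addr):
            # One good peer is enough: DHT-based discovery takes over from here.
            return addr
    return None  # every candidate was blocked or unreachable
```

If a censor blocks only the well-known default nodes, a node that remembers even one unblocked peer from a previous session can still join the network.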

IPFS also has great properties for hybrid sneakernets: use a sneakernet to get data into a network (imagine a campus network), then use IPFS internally to access, duplicate, and generally spread the data (with local bootstrap peers if local discovery doesn't work).

Disadvantages mentioned by @Stebalien still apply.

dmbb commented 6 years ago

Thank you so much for your answers. Since you mentioned it, the transport layer is another thing I'm curious about. What exactly does IPFS traffic look like on the network at the moment? Is there a default transport for IPFS connections?

QUIC support with TLS 1.3. Due to how TLS 1.3/QUIC work, traffic between IPFS nodes using this transport should be hard to distinguish from HTTPS traffic (although we may need to randomize the port to make this an effective anti-censorship measure). Also, unfortunately, the IPFS node will still respond as an IPFS node so this only really helps prevent passive attackers.

@Stebalien, even with TLS 1.3/QUIC, I assume no further effort goes into obfuscating traffic patterns. Let's say a client wants to download a file. How easy would it be for a passive adversary to fingerprint the downloaded content when data is fetched simultaneously from multiple peers? It looks like this kind of analysis would be harder to perform in IPFS than in Tor, for instance.

Also, if TLS/QUIC gets deployed, wouldn't it make more sense to use a well-known port like 443, to prevent a censor from blindly blocking TLS-like traffic on uncommon ports? What's the rationale for running this transport on random ports?

okdistribute commented 6 years ago

Where's the evidence that it was used in Catalonia? From what I understood, domains were blocked, so they had to move and rehost the website on a different domain almost every day.

dmbb commented 6 years ago

@karissa I found this discussion highlighted on Hacker News, concerning the following article. It was also discussed on Twitter.

If you happen to have more concrete information about this episode, I'd be happy to learn. I think websites were also constantly rehosted in different domains besides being online through IPFS.

ghost commented 6 years ago

@karissa They built the official voting info website in a completely static fashion, which made it easy to distribute it with p2p technologies. They then had the president tweet out the URL after the canonical website was blocked: https://twitter.com/KRLS/status/911482634789953536

[Screenshot: the tweet from @KRLS]

That tweet is what got gateway.ipfs.io blocked the next day, while, funnily enough, ipfs.io continued to work.

This obviously still went through HTTP, but it's trivial to replace https://ipfs.io/ipfs with ipfs://, and ipfs-companion helps with it.
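A minimal sketch of that rewrite, assuming path-style gateway URLs of the form `https://<host>/ipfs/<cid>/<path>` (or `/ipns/<name>`). The function name is hypothetical; a browser still needs something like ipfs-companion to resolve the resulting `ipfs://` URI.

```python
from urllib.parse import urlsplit


def gateway_to_native(url: str) -> str:
    """Rewrite a path-style gateway URL into a native ipfs:// or ipns:// URI.

    URLs that do not start with /ipfs/<root> or /ipns/<root> are returned
    unchanged. Query strings and fragments are not carried over.
    """
    parts = urlsplit(url)
    segments = parts.path.lstrip("/").split("/", 2)
    if len(segments) >= 2 and segments[0] in ("ipfs", "ipns") and segments[1]:
        namespace, root = segments[0], segments[1]
        rest = "/" + segments[2] if len(segments) == 3 else ""
        return f"{namespace}://{root}{rest}"
    return url
```

Because the gateway host is dropped entirely, blocking a particular gateway domain does not affect the rewritten address.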

(Sorry about those ugly lines in the screenshot, that's my crappy screen grab tool ;)).

okdistribute commented 6 years ago

Gotcha, I was there through the week of Oct 1 and no one seemed to be using the ipfs client. They had a huge whatsapp group and a new link would get sent out every time the old website got banned. I bet the ipfs link worked for one of those rounds though. Pretty cool.

Stebalien commented 6 years ago

Is there a default transport for IPFS connections?

Yes. We currently use a TLS-like protocol we call secio (but are working on switching to plain TLS). However, we have to negotiate the security protocol in the clear, so IPFS connections are currently readily identifiable.
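A minimal sketch of why in-the-clear negotiation is identifiable, assuming multistream-select 1.0 framing as we understand it (each message is an unsigned-varint length, the protocol ID, then a newline) and `/secio/1.0.0` as the secio protocol ID. The helper names and the naive matching rule are hypothetical, not taken from any real DPI product.

```python
def encode_uvarint(n: int) -> bytes:
    """Unsigned LEB128 varint, used here for multistream-select length prefixes."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_msg(protocol_id: str) -> bytes:
    """Frame one negotiation message: varint length, protocol ID, newline."""
    body = protocol_id.encode("utf-8") + b"\n"
    return encode_uvarint(len(body)) + body


MULTISTREAM_HEADER = encode_msg("/multistream/1.0.0")


def opening_bytes(security_protocol: str) -> bytes:
    # What a dialer sends unencrypted before the security handshake begins.
    return MULTISTREAM_HEADER + encode_msg(security_protocol)


def looks_like_libp2p(first_packet: bytes) -> bool:
    """A naive passive rule: match the fixed, plaintext multistream header."""
    return first_packet.startswith(MULTISTREAM_HEADER)
```

A fixed byte prefix like this is exactly what a passive censor can match on; a TLS ClientHello (starting `0x16 0x03 ...`) would not trigger this rule.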

even while using TLS 1.3/QUIC, I'm assuming no further effort is employed in obfuscating traffic patterns. Let's say a client wishes to download some file. How easy would it be for a passive adversary to fingerprint downloaded content when data is downloaded simultaneously from multiple peers? It looks like this kind of analysis would be harder to perform in IPFS than in Tor, for instance.

Compared with Tor, passive adversaries learn significantly less about what a user might be downloading, because IPFS uses content addressing instead of location addressing. Where a user goes to download information is decoupled from what the user is downloading, although the two are still correlated.
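A simplified sketch of content addressing, assuming a plain SHA-256 hex digest as the address (real IPFS CIDs wrap the digest in multihash and multibase encodings). The address says nothing about which peer holds the data, so any peer can serve it, and a tampered reply is detected by rehashing.

```python
import hashlib
from typing import Dict, List, Optional


def content_address(data: bytes) -> str:
    # Simplified: the address is derived only from the bytes, never from a location.
    return hashlib.sha256(data).hexdigest()


def fetch(address: str, peers: List[Dict[str, bytes]]) -> Optional[bytes]:
    """Ask peers in turn; accept the first reply whose hash matches the address."""
    for store in peers:
        data = store.get(address)
        if data is not None and content_address(data) == address:
            return data
    return None  # no peer returned verifiable content
```

The peers here are modeled as plain dictionaries for illustration; the point is that which peer answers is independent of what is being requested.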

Unfortunately, it's trivial for an active adversary to learn this kind of information: all they have to do is connect to a node and wait for it to ask for a file.
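A toy model of that attack, assuming (as a simplification of Bitswap-style behavior) that a node sends each want to every peer it is connected to. The `Peer` and `Node` classes are hypothetical and only illustrate why simply being connected is enough to observe requests.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Peer:
    name: str
    observed_wants: List[str] = field(default_factory=list)

    def receive_want(self, sender: str, cid: str) -> None:
        # Every connected peer sees the request, whether honest or not.
        self.observed_wants.append(f"{sender} wants {cid}")


@dataclass
class Node:
    name: str
    connections: List[Peer] = field(default_factory=list)

    def request(self, cid: str) -> None:
        # Simplified model: broadcast the want to all connected peers.
        for peer in self.connections:
            peer.receive_want(self.name, cid)
```

In this model the adversary needs no traffic analysis at all: it just accepts a connection and reads the wants it is sent.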

Also, if TLS/QUIC gets to be deployed, wouldn't it make more sense to use some well-known port like 443 to prevent a censor from blindly blocking TLS-like traffic on uncommon ports? What's the rationale for using this transport over random ports?

The problem with 443 is that it's a privileged port, so users would have to run the daemon as root (not recommended). The next best thing is a random port (or maybe a common HTTP alternative port like 8080, 8181, 8888, etc.).
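A minimal sketch of that port choice, assuming a UDP listener (as QUIC would use) and the default rule on many Unix-like systems that ports below 1024 need root. The function name and candidate list are illustrative; this is not how go-ipfs configures its listeners.

```python
import socket
from typing import Iterable

PRIVILEGED_LIMIT = 1024  # on many Unix-like systems, lower ports need root by default


def pick_listen_port(candidates: Iterable[int] = (8080, 8181, 8888)) -> int:
    """Try common HTTP-alternative ports first, then fall back to a random one.

    The probe socket is closed on return, so the caller should bind again
    promptly; another process could take the port in between.
    """
    for port in candidates:
        if port < PRIVILEGED_LIMIT:
            continue  # e.g. 443: skip rather than require root
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            try:
                s.bind(("0.0.0.0", port))
                return port
            except OSError:
                continue  # already in use or not permitted
    # Port 0 asks the OS for any free, unprivileged ephemeral port.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind(("0.0.0.0", 0))
        return s.getsockname()[1]
```

The trade-off is the one described above: an HTTP-alternative port looks ordinary, while a random port makes port-based blocking less effective.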

mitra42 commented 6 years ago

If the question is avoiding censorship rather than its companion (avoiding surveillance) then I'm more concerned about the single-point-of-failure issues. With the Catalonia example it was trivially easy to block https://ipfs.io which essentially meant that anyone without extreme tech skills couldn't access it.

I'm assuming most people are going to use unmodified browsers. We are building a version of our front end that runs in the browser (loadable from anywhere), BUT the connections are still a single point of failure: websocket-star has to go directly to a known gateway server, because that server currently has to be primed (e.g. via an HTTP HEAD request) to know about the file. This makes it easy to block.

It's unclear to me whether fixes for that problem are in the works (e.g. what I think is called websocket-relay?).

If I understand correctly, part of the issue is that putting a DHT in the browser requires WebRTC, which crashes browsers when they open lots of connections. It would be great if Firefox and Chrome fixed that, but it doesn't sound like that will happen. I've also been unable to get a clear answer on whether WebRTC, and the DHT built on it, could be tuned to open far fewer connections. IMHO, even if a browser opened only 10 connections, it could form a workable DHT, since there would still be plenty of well-connected (Go or Node.js) nodes.

Is anyone on this thread thinking about those single-point-of-failure issues?

RangerMauve commented 6 years ago

@mitra42 The websocket-star example is going to be fixed once https://github.com/libp2p/js-libp2p-websocket-star/pull/43 lands.

Relay mode in general will fix the issue of browsers being unable to participate in the network. But this will only work if there are many relay nodes out there and they are easy to discover. (It would be nice if relay hop were on by default.)

RangerMauve commented 6 years ago

The bootstrap nodes for the DHT are another single point of failure. I don't think any IPFS client caches healthy DHT nodes; they currently rely on the bootstrap nodes and mDNS to find peers.

Though I don't think that will be hard to fix.

mitra42 commented 6 years ago

That will be great: a browser will be able to connect to any node (or any of a certain subset of nodes) and access IPFS content uploaded to any other node. I think that's what most people expect, and I've seen a number of reported "bugs" that appear to come from not understanding that you can't access any IPFS file from anywhere. Once we have that, it becomes easy to expose a large number of connection points and avoid single points of failure. I'm also betting we can set up ways to distribute lists of bootstrap nodes in applications and so on. It doesn't sound like we can do much until that patch lands.

hadifarnoud commented 6 years ago

Is it possible to access IPFS files through another domain? Governments can block access to the ipfs.io domain and thereby block the whole thing.

mitra42 commented 6 years ago

As far as I'm aware, there are two cases: a) access to IPFS files over the IPFS protocol, e.g. from JS-IPFS running in an application in your browser; b) access to IPFS files through an HTTP/IPFS gateway.

The IPFS protocol still has single points of failure/censorship, though it would be good to get an update on the status since some of those reported above may have been fixed.

In theory, anyone could set up an HTTP/IPFS gateway and provide access to any file, and that might work in many circumstances. But: 1) the gateway itself could be blocked if it became widely known, and if it isn't widely known, it's hard for it to have much effect; 2) we definitely hit problems when adding files at one location and retrieving them through a gateway elsewhere, because that gateway couldn't find them. Some of those problems might have been fixed by now, and some might be peculiar to the Archive's particular setup (using an early version of urlstore that wasn't announcing to the DHT, and scaling issues in the DHT).
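A minimal sketch of spreading requests across several gateways, assuming the path convention `https://<gateway>/ipfs/<cid>/<path>`. Because content is addressed by CID, every gateway should return the same bytes for the same path, so a client can try them in order. `gw.example.org` and the function name are hypothetical.

```python
from typing import List, Sequence


def gateway_urls(cid: str, path: str, gateways: Sequence[str]) -> List[str]:
    """Build the same /ipfs/<cid>/<path> URL on each of several gateway hosts."""
    suffix = f"/ipfs/{cid}" + (f"/{path.lstrip('/')}" if path else "")
    return [f"https://{host.strip('/')}{suffix}" for host in gateways]
```

For example, `gateway_urls("QmX", "index.html", ["ipfs.io", "gw.example.org"])` gives one URL per host; blocking one domain leaves the others usable, though each gateway remains blockable on its own, as noted above.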

There is also interesting information in some of the links above.

Stebalien commented 6 years ago

The IPFS protocol still has single points of failure/censorship, though it would be good to get an update on the status since some of those reported above may have been fixed.

Currently, if you can't connect to any bootstrap nodes (or nodes on your local network advertised over mDNS), you won't be able to join the network. However, you can add custom bootstrap nodes. We're also working on persisting peer-store information which will allow us to try to connect to nodes we've seen previously.
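A minimal sketch of a persisted peer store, assuming a JSON file that maps peer addresses to the time each was last seen. The file format and function names are hypothetical, not the go-ipfs peerstore. On startup, the most recently seen peers are tried first, followed by the configured bootstrap list.

```python
import json
from pathlib import Path
from typing import Dict, List


def record_peer(store: Dict[str, float], addr: str, now: float) -> None:
    # Remember when we last successfully talked to this peer.
    store[addr] = now


def save_peers(path: Path, store: Dict[str, float]) -> None:
    path.write_text(json.dumps(store))


def load_bootstrap_candidates(path: Path, defaults: List[str], limit: int = 50) -> List[str]:
    """Most recently seen peers first, then the configured bootstrap nodes."""
    try:
        store: Dict[str, float] = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        store = {}  # first run or corrupted file: fall back to defaults only
    recent = sorted(store, key=store.get, reverse=True)[:limit]
    return recent + [d for d in defaults if d not in recent]
```

Ordering by recency favors peers that were reachable lately, which are more likely to still be online and unblocked.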

access to IPFS files thru an HTTP/IPFS gateway.

We also have a gateway that uses a javascript service worker here: https://js.ipfs.io/ (scroll down). Once enabled, you'll be able to visit, e.g., https://js.ipfs.io/ipfs/QmYNQJoKGNHTpPxCBPh9KkDpaExgd2duMa3aF6ytMpHdao/index.html, and load it through js-ipfs.

We're also working with some people at Mozilla on better browser integration (see libdweb and ipfs-companion).

You can currently install the ipfs-companion and enable the "js-ipfs" internal node to use IPFS without installing any local applications and without relying on any public gateways. Once we can get the libdweb APIs merged into Firefox itself, you'll even be able to visit addresses like ipfs://... or ipns://ipfs.io.

TheZ4ro commented 5 years ago

I think that although TLS 1.3 and QUIC traffic can be mistaken for a standard access protocol, it can still be easily distinguished by the latest AI-based data analysis. Whether it is DHT or TLS traffic, it is not difficult to identify (especially through traffic-feature identification). This line of work has been mentioned in several recent patents from China. It turns out that the most effective approach is not ever-stronger encryption, but obfuscating the data stream. Put simply: masked people are easy to spot, and a mask attracts attention, but after a facelift the chance of being noticed drops significantly. I have tried disguising the data stream as an ordinary, openly visible video stream, in particular by sending a standard decodable video at the start of the session. In that case, basically all monitoring can be easily evaded.
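The video-mimicry approach above is not shown here; as a simpler, related ingredient, this sketch normalizes packet sizes by padding every message into fixed-size frames so that sizes reveal nothing about content. `FRAME_SIZE` is a hypothetical value, and we assume the frames would be encrypted afterwards (the 2-byte length header is only hidden by that encryption). Timing is not addressed.

```python
import os
import struct
from typing import List

FRAME_SIZE = 1200  # hypothetical frame size; 2 bytes are used by the length header


def to_frames(payload: bytes) -> List[bytes]:
    """Split a payload into equal-size frames, padding the last one with random bytes."""
    chunk = FRAME_SIZE - 2
    frames = []
    for i in range(0, max(len(payload), 1), chunk):
        piece = payload[i:i + chunk]
        padding = os.urandom(chunk - len(piece))
        frames.append(struct.pack(">H", len(piece)) + piece + padding)
    return frames


def from_frames(frames: List[bytes]) -> bytes:
    """Strip the length headers and padding to recover the original payload."""
    out = bytearray()
    for frame in frames:
        (n,) = struct.unpack(">H", frame[:2])
        out += frame[2:2 + n]
    return bytes(out)
```

Mimicking a specific protocol such as a video stream goes further, since it also has to reproduce that protocol's size and timing distributions rather than flattening them.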

bertrandfalguiere commented 5 years ago

Some ideas on the "let bootstrapping not be a single point of failure" front: https://github.com/ipfs/go-ipfs/issues/3908#issuecomment-330046410

jbshirk commented 5 years ago

I don't understand why personal information was encrypted and stored in the static site to begin with. All that should be needed is an SPA accessing a database of the hashes of ID number + birth date + post code, and the code to reproduce the hash locally in the browser.

This is the same principle as never storing login passwords (encrypted or not) on a server, only the hash of the password: either a user can reproduce the hash, or they cannot.
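A minimal sketch of that lookup, assuming the static site publishes only a table keyed by SHA-256 hashes of the three fields, and the client recomputes the hash locally. The field formats, normalization, and function names are hypothetical.

```python
import hashlib
from typing import Dict, Optional


def voter_key(id_number: str, birth_date: str, post_code: str) -> str:
    # Normalize so small input differences ("12345678a " vs "12345678A") hash the same.
    fields = [id_number.strip().upper(), birth_date.strip(), post_code.strip()]
    return hashlib.sha256("|".join(fields).encode("utf-8")).hexdigest()


def lookup(published: Dict[str, str], id_number: str, birth_date: str, post_code: str) -> Optional[str]:
    """The site ships only `published`; the hash is computed on the client."""
    return published.get(voter_key(id_number, birth_date, post_code))
```

One caveat on this design: ID numbers, birth dates, and post codes have low entropy, so anyone holding the published table can enumerate candidate inputs and recover entries; a slow key-derivation function raises that cost but does not remove it.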

Am I missing something?


ipfs/notes #281