Censorship resistant bootstrapping (e.g. for wikipedia)

ianopolous commented 7 years ago

Version information:

N/A

Type:

Enhancement

Severity:

Medium (unless you're in Turkey, then High)

Description:

I was thinking about the attack vectors for censoring the recently IPFS-hosted Wikipedia in Turkey, and I believe a significant weak point is the bootstrap process. Currently the bootstrap list is a hardcoded list of domains/IPs (in the config file). That list is public, so it is easy for an oppressor to add to a blacklist.

One proposed mitigation is a fallback bootstrap method that uses Tor. The Tor project has thought a lot more about attacks in this area, and building on their work would be easy. The simplest version would be a Tor client that contacts one of the bootstrap nodes through Tor and then bootstraps via that node. Clearly this is only as strong as Tor's own bootstrapping mechanism, but, as mentioned above, that is a well-studied problem.
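A minimal sketch of that "simplest" variant, assuming a tor daemon is already running locally with its SOCKS5 port on 127.0.0.1:9050 (Tor Browser uses 9150 instead). It hand-rolls the RFC 1928 handshake with only the standard library and uses the domain-name address type so the hostname is resolved inside Tor. `dialViaTor` and the host `bootstrap.example.org` are hypothetical; a real integration would hand the resulting stream to libp2p. Since Tor only carries TCP, this could only reach TCP bootstrap addresses.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"time"
)

// torSOCKSAddr assumes a local tor daemon on its default SOCKS port.
const torSOCKSAddr = "127.0.0.1:9050"

// socks5ConnectRequest builds an RFC 1928 CONNECT request with the
// domain-name address type, so the hostname is resolved by Tor, not locally.
func socks5ConnectRequest(host string, port uint16) ([]byte, error) {
	if len(host) == 0 || len(host) > 255 {
		return nil, errors.New("host must be 1-255 bytes")
	}
	req := []byte{0x05, 0x01, 0x00, 0x03, byte(len(host))} // VER, CMD=CONNECT, RSV, ATYP=domain, length
	req = append(req, host...)
	return append(req, byte(port>>8), byte(port)), nil // port, big-endian
}

// dialViaTor (hypothetical) opens a TCP stream to host:port through Tor's SOCKS5 proxy.
func dialViaTor(host string, port uint16) (net.Conn, error) {
	conn, err := net.DialTimeout("tcp", torSOCKSAddr, 10*time.Second)
	if err != nil {
		return nil, err
	}
	fail := func(err error) (net.Conn, error) {
		conn.Close()
		return nil, err
	}
	// Greeting: SOCKS version 5, one method offered, 0x00 = no authentication.
	if _, err := conn.Write([]byte{0x05, 0x01, 0x00}); err != nil {
		return fail(err)
	}
	choice := make([]byte, 2)
	if _, err := io.ReadFull(conn, choice); err != nil {
		return fail(err)
	}
	if choice[0] != 0x05 || choice[1] != 0x00 {
		return fail(errors.New("proxy refused unauthenticated SOCKS5"))
	}
	req, err := socks5ConnectRequest(host, port)
	if err != nil {
		return fail(err)
	}
	if _, err := conn.Write(req); err != nil {
		return fail(err)
	}
	// Reply: VER, REP (0x00 = success), RSV, ATYP, then a bound address and port.
	hdr := make([]byte, 4)
	if _, err := io.ReadFull(conn, hdr); err != nil {
		return fail(err)
	}
	if hdr[1] != 0x00 {
		return fail(fmt.Errorf("SOCKS5 CONNECT failed with code %d", hdr[1]))
	}
	var skip int64
	switch hdr[3] {
	case 0x01:
		skip = 4 + 2 // IPv4 address + port
	case 0x04:
		skip = 16 + 2 // IPv6 address + port
	case 0x03:
		n := make([]byte, 1)
		if _, err := io.ReadFull(conn, n); err != nil {
			return fail(err)
		}
		skip = int64(n[0]) + 2 // domain name + port
	default:
		return fail(fmt.Errorf("unknown address type %d", hdr[3]))
	}
	if _, err := io.CopyN(io.Discard, conn, skip); err != nil {
		return fail(err)
	}
	return conn, nil
}

func main() {
	// Hypothetical bootstrap host; a real one would come from the bootstrap list.
	conn, err := dialViaTor("bootstrap.example.org", 4001)
	if err != nil {
		fmt.Println("Tor dial failed:", err)
		return
	}
	defer conn.Close()
	fmt.Println("TCP stream through Tor is open; libp2p would handshake over it next")
}
```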

This would mean that an ipfs binary distributed in Turkey on USB sticks would still work even if ipfs.io and all the public ipfs bootstrap nodes were blacklisted.

Kubuxu commented 7 years ago

One idea we had was to use already-known nodes for launches after the first. We could also ship IPFS with hundreds of nodes, weighted and sorted according to some criteria, and use both mechanisms at the same time.
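A sketch of using both mechanisms together, under stated assumptions: `Candidate`, its `Weight` score and `bootstrapOrder` are hypothetical, peers remembered from earlier runs are simply tried first, and the shipped list follows in descending weight order with duplicates dropped.

```go
package main

import (
	"fmt"
	"sort"
)

// Candidate is a hypothetical shipped bootstrap entry; how Weight is scored
// ("some criteria", e.g. observed uptime) is an assumption.
type Candidate struct {
	Addr   string
	Weight float64
}

// bootstrapOrder returns cached peers from earlier runs first, then the
// shipped candidates sorted by descending weight, skipping duplicates.
func bootstrapOrder(cached []string, shipped []Candidate) []string {
	seen := make(map[string]bool)
	out := make([]string, 0, len(cached)+len(shipped))
	for _, a := range cached {
		if !seen[a] {
			seen[a] = true
			out = append(out, a)
		}
	}
	sorted := append([]Candidate(nil), shipped...) // copy; don't reorder the caller's slice
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Weight > sorted[j].Weight })
	for _, c := range sorted {
		if !seen[c.Addr] {
			seen[c.Addr] = true
			out = append(out, c.Addr)
		}
	}
	return out
}

func main() {
	// Made-up addresses and peer IDs.
	cached := []string{"/ip4/198.51.100.7/tcp/4001/p2p/QmCached"}
	shipped := []Candidate{
		{"/ip4/203.0.113.1/tcp/4001/p2p/QmLow", 0.2},
		{"/ip4/203.0.113.2/tcp/4001/p2p/QmHigh", 0.9},
	}
	for _, a := range bootstrapOrder(cached, shipped) {
		fmt.Println(a)
	}
}
```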

ianopolous commented 7 years ago

@Kubuxu I thought that once a node had bootstrapped once it didn't need the bootstrap nodes again? Does it use them every time it restarts?

ianopolous commented 7 years ago

I think if they are going to blacklist your bootstrap node list, then it doesn't matter how long the list is.

Kubuxu commented 7 years ago

Unfortunately, currently yes.

ianopolous commented 7 years ago

Ok then that's an independent problem to address, but I would say that is a lower priority than a proper Tor based fix (which would solve both problems for many threat models).

Kubuxu commented 7 years ago

How does Tor bootstrap? Wouldn't it face similar problems?

We could certainly run a Tor hidden service serving a constantly updated list of nodes that one could bootstrap from.

ianopolous commented 7 years ago

Tor does face similar problems, but they have spent a long time trying to solve them, for example using unpublished bridges rather than the public directory servers.

ianopolous commented 7 years ago

Note I wasn't suggesting running a Tor hidden service in the simplest case, just using Tor to access a public ipfs bootstrap node.

djdv commented 7 years ago

@Kubuxu

Also we could ship IPFS with hundreds of nodes weight sorted according to some criteria and use both of those mechanisms at the same time.

Would something like a dynamic list be useful, where it's populated with nodes sorted by daemon uptime (as well as adding some randomness to the selection process)? For instance a list of nodes that's generated/published to ipns every few hours. It might help against static blacklists that are manually updated, but I don't know how common that kind of setup is, so it may be pointless.

In addition, there's still the matter of connecting to fetch that list for the first time, whether through IPNS or something else. The only peers you could really rely on reaching would be ones that something like mDNS would pick up: physically close, ad-hoc networks used for bootstrapping, after which you could do peer exchange or maybe message relaying. I've got no idea on that aspect.
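For the local-discovery idea, a sketch that sends one mDNS PTR query using only the standard library. It assumes the libp2p mDNS service name `_p2p._udp.local` and a responder that answers a query sent from a non-5353 source port with a unicast reply (RFC 6762 "legacy unicast"); parsing the answer into peer addresses is left out.

```go
package main

import (
	"fmt"
	"net"
	"strings"
	"time"
)

// serviceName is assumed to be the name libp2p peers advertise over mDNS.
const serviceName = "_p2p._udp.local"

// mdnsQuery builds a DNS query in RFC 1035 wire format asking for PTR records of name.
func mdnsQuery(name string) []byte {
	// Header: ID=0 (as mDNS recommends), flags=0, QDCOUNT=1, other counts 0.
	q := []byte{0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0}
	for _, label := range strings.Split(strings.TrimSuffix(name, "."), ".") {
		q = append(q, byte(len(label)))
		q = append(q, label...)
	}
	q = append(q, 0)           // root label ends the name
	q = append(q, 0, 12, 0, 1) // QTYPE=PTR (12), QCLASS=IN (1)
	return q
}

func main() {
	conn, err := net.ListenUDP("udp4", &net.UDPAddr{IP: net.IPv4zero, Port: 0})
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer conn.Close()
	group := &net.UDPAddr{IP: net.IPv4(224, 0, 0, 251), Port: 5353}
	if _, err := conn.WriteToUDP(mdnsQuery(serviceName), group); err != nil {
		fmt.Println("send failed:", err)
		return
	}
	conn.SetReadDeadline(time.Now().Add(2 * time.Second))
	buf := make([]byte, 9000)
	n, from, err := conn.ReadFromUDP(buf)
	if err != nil {
		fmt.Println("no mDNS answer:", err)
		return
	}
	fmt.Printf("%d-byte answer from %v (parsing omitted)\n", n, from)
}
```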

matthewrobertbell commented 7 years ago

A possible option would be to take inspiration from how botnets work: Use a dynamic set of domains / subdomains (changing over time) to publish a list of bootstrap nodes which change over time, either via HTTP or DNS TXT records.

ghost commented 7 years ago

Another idea is domain fronting it e.g. with google cloud: https://github.com/libp2p/libp2p/issues/18
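A sketch of domain fronting with Go's `net/http`: the URL host (used for the TCP connection and the TLS SNI) names an innocuous front domain, while `req.Host` overrides only the HTTP Host header, which travels encrypted inside TLS and tells the CDN which backend to route to. Both domains and the `/bootstrap.txt` path are hypothetical, and it only works on a CDN that still permits fronting; several large providers have disabled it.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// frontedRequest builds a GET whose connection and SNI go to frontDomain
// while the encrypted Host header names hiddenHost.
func frontedRequest(frontDomain, hiddenHost, path string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, "https://"+frontDomain+path, nil)
	if err != nil {
		return nil, err
	}
	req.Host = hiddenHost // only the Host header changes; SNI still uses the URL host
	return req, nil
}

func main() {
	req, err := frontedRequest("front.example.com", "ipfs-bootstrap.example.net", "/bootstrap.txt")
	if err != nil {
		fmt.Println(err)
		return
	}
	client := &http.Client{Timeout: 15 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("fronted fetch failed:", err)
		return
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(io.LimitReader(resp.Body, 1<<20)) // cap at 1 MiB
	fmt.Printf("status %s, %d bytes of bootstrap list\n", resp.Status, len(body))
}
```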

elitak commented 7 years ago

I suggest adding cmdlets for as many circumvention methods as can be imagined, e.g.:

ipfs bootstrap scrape raw file:///mnt/usb/bootstrap-list.txt # provide support for as many URI schemes as possible, including ssh://, magnet:
ipfs bootstrap scrape tor # grabs a list from wellknown1.onion, wellknown2.onion and adds it to bootstrap list, via the http proxy hooked up to tor on localhost port NNNN (changeable in config)
ipfs bootstrap scrape domain-front reputable-domain.com # sends some standardized request over https, requires complicity by the reputable-domain
ipfs bootstrap scrape dns fast-or-double-flux-bootstrapper.com
ipfs bootstrap scrape irc irc.efnet.net ipfs-bootstrap # talks to a bot on that server+channel in a predefined way
ipfs bootstrap scrape twitter @username # reads tweets in rev-chrono order on this account and interprets them as URIs to dial until N successes
ipfs bootstrap scrape bittorrent-dht # use bittorrent DHT to find potential endpoints; i.e., ipfs running on same addresses. That subset is selected by fetching a specific torrent that signifies the host also acts as an ipfs bootstrap node
ipfs bootstrap scrape netscan # last hope; just dials IPv4/6 addresses randomly on port 4001 until it hits something

Some of these are trivial to implement as bash scripts. In Go, it would take (me, at least) a bit more effort, but each could be written conforming to a simple plugin API, then static-linked in along with whatever meta-bootstrap data are needed.

kpcyrd commented 7 years ago

@elitak that might integrate nicely. Given the command ipfs bootstrap scrape foo arg1 --arg=2, ipfs could try to execute ipfs-bootstrap-scrape-foo-fetch arg1 --arg=2 and read addresses from its output. This way you can write bootstrapping plugins in any language, similar to how git works. Preferably the command name would be shorter.
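A sketch of that git-style dispatch, using the helper naming suggested above; `helperName`, `runHelper` and the one-address-per-line output convention are assumptions.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// helperName maps a scrape method to its external program name.
func helperName(method string) string {
	return "ipfs-bootstrap-scrape-" + method + "-fetch"
}

// runHelper finds the helper on $PATH, runs it with the remaining arguments,
// and treats each non-empty stdout line as a bootstrap address.
func runHelper(method string, args []string) ([]string, error) {
	path, err := exec.LookPath(helperName(method))
	if err != nil {
		return nil, err
	}
	out, err := exec.Command(path, args...).Output()
	if err != nil {
		return nil, err
	}
	return splitLines(out), nil
}

// splitLines returns the trimmed, non-empty lines of b.
func splitLines(b []byte) []string {
	var addrs []string
	sc := bufio.NewScanner(bytes.NewReader(b))
	for sc.Scan() {
		if line := strings.TrimSpace(sc.Text()); line != "" {
			addrs = append(addrs, line)
		}
	}
	return addrs
}

func main() {
	// e.g. `go run . foo arg1 --arg=2` runs ipfs-bootstrap-scrape-foo-fetch arg1 --arg=2
	if len(os.Args) < 2 {
		fmt.Println("usage: scrape <method> [args...]")
		return
	}
	addrs, err := runHelper(os.Args[1], os.Args[2:])
	if err != nil {
		fmt.Println("helper failed:", err)
		return
	}
	for _, a := range addrs {
		fmt.Println(a)
	}
}
```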

As an alternative, I think you can always add nodes using the ipfs API from a project separate from ipfs.
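For the separate-project route, a tool could call the kubo RPC endpoint that mirrors `ipfs bootstrap add <peer>`. This sketch assumes the API listens on its default address 127.0.0.1:5001; the peer address is made up. The RPC API expects POST requests.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// apiBase assumes the default kubo RPC API address.
const apiBase = "http://127.0.0.1:5001/api/v0"

// bootstrapAddURL builds the RPC call equivalent to `ipfs bootstrap add <peer>`.
func bootstrapAddURL(peer string) string {
	return apiBase + "/bootstrap/add?" + url.Values{"arg": {peer}}.Encode()
}

func main() {
	peer := "/ip4/203.0.113.5/tcp/4001/p2p/QmExamplePeerID" // hypothetical peer
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Post(bootstrapAddURL(peer), "", nil) // the RPC API requires POST
	if err != nil {
		fmt.Println("API not reachable:", err)
		return
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(body))
}
```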

whyrusleeping commented 7 years ago

ipfs does have a method for allowing git-style pluggable programs as subcommands (ipfs update works this way). It's currently whitelisted, so ipfs has to at least know it should search for a given external subcommand before it will work.

elitak commented 6 years ago

Much as I'm tempted to hack away at adding all these as bash scripts, I think it'd be foolhardy not to implement them instead in Go, so as to carry forward the static-linked, cross-platform portability that's already afforded, especially so for utilities whose audience may not be very tech-savvy to begin with.

I'll have a stab at adding some basic ones, but no promises on how soon.

elitak commented 6 years ago

To expand on using BitTorrent's DHT: the hypothetical trick I came up with would be to compute the infohash (the SHA-1 hash of the .torrent's bencoded info dictionary) for a file generated locally, containing something like "0.1.0;ip4;4001" (no newline; it could also be JSON), to look up all daemons running the v0.1.0 protocol on IPv4 addresses on port 4001. Using (preferably) a built-in minimal BitTorrent client, or an external one, the ipfs daemon would obtain a list of addresses of peers distributing that hash on the BitTorrent DHT network, and assume that each such peer was advertising an ipfs daemon matching the criteria, hosted at the same IPv4/IPv6 address. Anyone could include their daemon in the bootstrap list simply by seeding the appropriate file in a DHT-enabled BitTorrent client running on the same machine.
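A sketch of computing such an infohash for a single-file, single-piece torrent. The file contents follow the example above; the file name `ipfs-bootstrap` and the 16 KiB piece length are assumptions every participant would have to agree on, because both are part of the hashed info dictionary. The DHT lookup itself (a BEP 5 `get_peers` query for this hash) is not shown.

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"strconv"
)

// Hypothetical conventions; changing either one changes the infohash.
const (
	torrentName = "ipfs-bootstrap"
	pieceLength = 16384
)

// bencodeString encodes s as a bencoded byte string: "<len>:<bytes>".
func bencodeString(s string) string { return strconv.Itoa(len(s)) + ":" + s }

// infoDict bencodes the info dictionary for content, assuming it fits in one
// piece (len(content) <= pieceLength). Bencoded dict keys must be sorted.
func infoDict(content string) []byte {
	piece := sha1.Sum([]byte(content))
	return []byte("d" +
		bencodeString("length") + "i" + strconv.Itoa(len(content)) + "e" +
		bencodeString("name") + bencodeString(torrentName) +
		bencodeString("piece length") + "i" + strconv.Itoa(pieceLength) + "e" +
		bencodeString("pieces") + bencodeString(string(piece[:])) +
		"e")
}

// infoHash is the BitTorrent v1 infohash: SHA-1 of the bencoded info dictionary.
func infoHash(content string) [20]byte { return sha1.Sum(infoDict(content)) }

func main() {
	h := infoHash("0.1.0;ip4;4001")
	fmt.Printf("look up peers for infohash %x on the BitTorrent DHT\n", h)
}
```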

The greatest weaknesses I see with this method are that ports need to be guessed (4001 probably being the only decent candidate) and the high chance that the BitTorrent DHT bootstrap nodes are blocked along with the ipfs ones. The latter could be offset by also scraping the hash from common public trackers (HTTP, HTTPS and UDP), obtaining similar results without any DHT connectivity. DHT-type networks from other p2p apps could probably be used in the same fashion.
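Whichever source is used, the peers come back in the compact format: 6 bytes per IPv4 peer, 4 for the address and 2 for a big-endian port (BEP 23 for trackers; DHT `get_peers` replies carry a list of the same 6-byte entries). This sketch decodes them and applies the port-4001 guess; the guessed addresses still lack the `/p2p/<peer ID>` suffix that kubo bootstrap entries carry, so they are only dial candidates.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// parseCompactPeers decodes compact IPv4 peer info: 4-byte IP + 2-byte port each.
func parseCompactPeers(b []byte) ([]net.TCPAddr, error) {
	if len(b)%6 != 0 {
		return nil, fmt.Errorf("compact peer list length %d is not a multiple of 6", len(b))
	}
	peers := make([]net.TCPAddr, 0, len(b)/6)
	for i := 0; i < len(b); i += 6 {
		ip := net.IPv4(b[i], b[i+1], b[i+2], b[i+3])
		port := int(binary.BigEndian.Uint16(b[i+4 : i+6]))
		peers = append(peers, net.TCPAddr{IP: ip, Port: port})
	}
	return peers, nil
}

// guessIPFSAddr applies the assumption above: an ipfs daemon listens on
// port 4001 at the same IP, whatever port the BitTorrent client advertised.
func guessIPFSAddr(p net.TCPAddr) string {
	return fmt.Sprintf("/ip4/%s/tcp/4001", p.IP)
}

func main() {
	// Example "peers" value with two made-up peers (ports 6881 and 4001).
	raw := []byte{203, 0, 113, 9, 0x1a, 0xe1, 198, 51, 100, 4, 0x0f, 0xa1}
	peers, err := parseCompactPeers(raw)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, p := range peers {
		fmt.Printf("%s -> try %s\n", p.String(), guessIPFSAddr(p))
	}
}
```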

raulk commented 5 years ago

Ideas being discussed in https://github.com/libp2p/go-libp2p-kad-dht/issues/254.

Jorropo commented 1 year ago

This was done in https://github.com/ipfs/kubo/pull/8856

ianopolous commented 1 year ago

I don't think #8856 solves this. That solves the secondary problem of subsequent restarts. The original problem of initial bootstrap is still unsolved.

Jorropo commented 1 year ago

I don't think there is a good solution to that which isn't overcomplicated. If we, say, captured 100 nodes during every release and stored them in the binary, someone could download each release and ban those 100 nodes every time. It's more work for them, but it does not really solve the problem. I still want to do something like this, but as a protection in case our bootstrappers are down.

Otherwise, "forum-based bootstrapping", where you ask someone to give you 100 random nodes and add them to your bootstrap list, is the only way to solve the initial bootstrap without over-engineering a solution that just lands us in the treadmill problem.

ianopolous commented 1 year ago

Agreed, it is a hard problem.

There are some options that work for varying threat models:

Using another network like Tor or I2P to contact existing bootstrappers
Use domain fronting where it still works
Investigate what Tor's meek-azure mode is doing

ipfs / kubo