distributed bootstrapping with IPNS

RubenKelevra commented 4 years ago

Current bootstrap approach

We have a config file that contains either addresses or DNS names with public keys.

We use DNS to resolve addresses, then we connect to the fixed list of predefined nodes with known public keys and ask them for (somewhat) random nodes of the network, to spin up the DHT of the node.

The nodes are run by a trusted entity and we do this on each startup.

The problem

There are several problems with this approach:

It consumes many resources on the trusted entity side because all nodes do this on every reboot/restart.
The bootstrap node definition isn't dynamic, the default config will always contain the same entries and cannot be updated over the network (without updating the software).
The bootstrapping isn't distributed, there is only one trusted entity.
It's a centralized structure and prone to:
- blocking
- outages
- DDoS attacks

I think we should do something about that (as discussed in other places, like https://github.com/libp2p/go-libp2p-kad-dht/issues/574).

Concept

I propose replacing this very basic approach with a trust network. It lets users configure how much they trust projects, orgs, and individuals, so the node can decide from that data how reliable received information is.

Trust-DB

The trust-db is stored in the node-database, not in the config file and the bootstrap field in the config file will be ignored.

We define a JSON format to list nodes, with either domain names or addresses, public keys etc.
Those JSON files can be published with an IPNS record.
Other nodes can add these IPNS records to their storage and set a level how much they trust this entity.
The IPNS records will be refreshed, and the new JSON files fetched and processed, when the TTL expires.
By default we import a list of standard nodes, like we add peers to the bootstrap list today, together with an IPNS public key to update it as soon as the node is connected to the network.

This allows us to:

define trusted entities, like companies, projects, or individuals
trusted entities can update their list of nodes, as soon as they install new ones or remove old ones

Additional trust levels

After implementing a basic trust level for bootstrapping, we could extend this to support a multitude of functions, using the trust level as ACLs.

Trust entries

Each trust entry consists of an IPNS record that points to a trust file, a trust level, a list of trusted functions, a text label, and a lifetime. Some status fields follow at the end.

An example of a trust entry:

ID	description
label	name of this trust-db entry
pubkey	IPNS record
trustlevel	the trust level of this entry
trusted-functions	array of allowed extended functionalities
trust-lifetime	the time span for which the trust level stays valid (="infinity")
fetched	timestamp when this entry was fetched
TTL	time after which this entry is considered stale
cache-lifetime	time after which the cached entry isn't valid anymore
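
A minimal Python sketch of how such an entry could be represented. The field names follow the table; the types, the use of Unix timestamps and seconds, and the two helper methods are assumptions for illustration and not part of the proposal.

```python
from dataclasses import dataclass
import math


@dataclass
class TrustEntry:
    label: str               # name of this trust-db entry
    pubkey: str              # IPNS key under which the trust file is published
    trustlevel: str          # "peer", "marginal", "trusted", "advanced" or "ultimate"
    trusted_functions: list  # allowed extended functionalities for this entry
    fetched: float           # Unix timestamp of the last fetch (assumed unit)
    ttl: float               # seconds after which the entry is considered stale
    cache_lifetime: float    # seconds after which the cached copy is no longer valid
    trust_lifetime: float = math.inf  # the table gives "infinity" as the default

    def is_stale(self, now: float) -> bool:
        """True once the TTL has elapsed, i.e. the IPNS record should be refreshed."""
        return now - self.fetched >= self.ttl

    def cache_valid(self, now: float) -> bool:
        """True while the cached data may still be used, even if it is stale."""
        return now - self.fetched < self.cache_lifetime
```

Keeping the TTL and the cache lifetime as separate checks mirrors the table: a stale entry triggers a refresh, while the cached data stays usable until the cache lifetime runs out.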

trust functions

is a list of allowed functions for this specific trust entry, which overrides the standard function matrix either positively or negatively.

Trust levels

ID	example
peer	nil trust
marginal	no thorough validation, but a bit more trust than nil
trusted	an organization/individual the user trusts
advanced	e.g. a close friend the user trusts
ultimate	user's own trust file

`peer`

can be used to save a known peer, with a remote or DNS entry, to the node information.

Trust level matrix

Trust level function matrix (as an example):

Function \ Trustlevel	`peer`	`marginal`	`trusted`	`advanced`	`ultimate`
use for bootstrap	🗶	🗶	✔	✔	✔
connect on startup	🗶	🗶	🗶	✔	✔
hold connection and reconnect to	🗶	🗶	🗶	✔	✔
used for autonat detection	🗶	✔	✔	✔	✔
allow graphsync	🗶	✔	✔	✔	✔
allow to query all ipns	🗶	🗶	🗶	🗶	✔
trusted peer exchange w/ ratings	🗶	🗶	✔	✔	✔
offer relay	🗶	🗶	🗶	✔	✔
use as relay	🗶	✔	✔	✔	✔
allow redistribute ipns	🗶	🗶	🗶	✔	✔
remote resolve IPNS	🗶	🗶	🗶	✔	✔
remote put DHT	🗶	🗶	🗶	✔	✔
remote fetch DHT	🗶	🗶	🗶	✔	✔
remote fetch CID	🗶	🗶	🗶	✔	✔
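
As a rough sketch of how a node could evaluate this matrix: every row in the example grants a function from some level upwards, so each function can be stored as a minimum level. The per-entry override dict stands in for the trusted-functions field; its shape (function name mapped to True/False) and the deny-by-default rule for unknown functions are assumptions, and only a few rows are transcribed.

```python
# Trust levels in ascending order, as listed in the proposal.
LEVELS = ["peer", "marginal", "trusted", "advanced", "ultimate"]

# Minimum level per function, transcribed from some rows of the example matrix.
MIN_LEVEL = {
    "use for bootstrap": "trusted",
    "connect on startup": "advanced",
    "used for autonat detection": "marginal",
    "allow to query all ipns": "ultimate",
    "offer relay": "advanced",
    "use as relay": "marginal",
}


def is_allowed(function, trustlevel, overrides=None):
    """Check a function against the matrix, letting per-entry overrides win."""
    overrides = overrides or {}
    if function in overrides:
        # Hypothetical override format: True grants, False denies.
        return overrides[function]
    required = MIN_LEVEL.get(function)
    if required is None:
        return False  # assumption: functions missing from the matrix are denied
    return LEVELS.index(trustlevel) >= LEVELS.index(required)
```

For example, `is_allowed("use for bootstrap", "marginal")` is False under the matrix, while passing `{"use for bootstrap": True}` as overrides grants it for that one entry.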

remote fetch

would allow thin clients, like mobile phones, to connect to other nodes without bootstrapping, and to query the DHT and fetch CIDs using those nodes as a proxy. This would massively reduce the time-to-first-byte and also the energy consumption, since it requires only a single connection that doesn't transfer any data when not actively used.

The thin clients would use and announce a relay connection to receive incoming connections if they are behind a firewall/NAT.

remote put DHT

would allow a thin-client to announce content it holds to the DHT, without bootstrapping the DHT.

Trust files

The trust files are JSON files with the following fields:

ID	subfield of	type	description	mandatory
description	root	text	field for descriptions	✔
contact	root	array	URIs for contacting	✔
entities	root	array	list of persons/projects/etc.	✔ (="default")
nodes	entities	array	entries for nodes	✔
node	nodes	array	identifiers (label) for node	✔ (="node")
pubkey	node		pubkey of the node	✔
contact	node	array	URIs for contacting	🗶
remotes	node	array	ip/port/protocol etc string	🗶
remotes-strict	node	boolean	should other remotes be omitted for this pubkey?	🗶 (=true if not present)
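
To make the field table more concrete, here is one possible reading of it as a JSON document, plus a small Python parser. The nesting (entities contain nodes, each node object carries its labels under "node"), the example values, and the function name `parse_trust_file` are assumptions; only the field names, the mandatory flags, and the defaults come from the table.

```python
import json

# Illustrative trust file; keys and addresses are placeholders, not real nodes.
EXAMPLE = """
{
  "description": "Example project bootstrap nodes",
  "contact": ["mailto:admin@example.org"],
  "entities": [
    {
      "nodes": [
        {
          "node": ["node"],
          "pubkey": "EXAMPLE-PUBKEY-1",
          "remotes": ["/dns4/node1.example.org/tcp/4001"]
        },
        {
          "node": ["backup"],
          "pubkey": "EXAMPLE-PUBKEY-2",
          "remotes-strict": false
        }
      ]
    }
  ]
}
"""


def _require(obj, key):
    """Return obj[key] or raise ValueError for a missing mandatory field."""
    if key not in obj:
        raise ValueError(f"missing mandatory field: {key}")
    return obj[key]


def parse_trust_file(text):
    """Flatten a trust file into a list of node dicts with defaults applied."""
    doc = json.loads(text)
    _require(doc, "description")
    _require(doc, "contact")
    nodes = []
    for entity in _require(doc, "entities"):
        for node in _require(entity, "nodes"):
            nodes.append({
                "labels": node.get("node", ["node"]),  # default label is "node"
                "pubkey": _require(node, "pubkey"),
                "contact": node.get("contact", []),    # optional
                "remotes": node.get("remotes", []),    # optional
                # remotes-strict counts as true when it is not present
                "remotes_strict": node.get("remotes-strict", True),
            })
    return nodes
```

Calling `parse_trust_file(EXAMPLE)` yields two node dicts; the first has `remotes_strict` set to True because the field is absent.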

IPNS record limitations

For the IPNS records, some limitations are necessary to avoid malicious entries:

minimal TTL 24h
maximal cache time 2y
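
A sketch of how these two limits could be enforced when a record is imported. Whether out-of-range values should be clamped (as here) or rejected is not specified above, so clamping is an assumption, as is approximating two years as 730 days.

```python
MIN_TTL_SECONDS = 24 * 60 * 60              # minimal TTL: 24h
MAX_CACHE_SECONDS = 2 * 365 * 24 * 60 * 60  # maximal cache time: 2y (730 days)


def apply_limits(ttl_seconds, cache_seconds):
    """Clamp a record's TTL and cache time to the allowed range."""
    ttl = max(ttl_seconds, MIN_TTL_SECONDS)
    cache = min(cache_seconds, MAX_CACHE_SECONDS)
    return ttl, cache
```

A record announcing a one-minute TTL would therefore still only be refreshed once a day, which limits how much refresh traffic a malicious entry can cause.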

bertrandfalguiere commented 4 years ago

I believe the known peers will soon be remembered across restarts. Additionally, in your system, you need to know some peers to fetch the IPNS records of the list of peers to bootstrap. So you need a bootstrap mechanism to bootstrap. Or will these lists be fetched out-of-band? Am I missing something?

RubenKelevra commented 4 years ago

I believe the known peers will soon be remembered across restarts.

I know, but this only works reliably for somewhat short periods of downtime. You still need to bootstrap after a longer period of downtime, since your random "known" peers are probably not reachable anymore.

The trustlevel: peer is basically just a way to permanently receive updates for IPs/domain names of peers you're likely going to use.

Say you often receive data from a cluster, then the cluster maintainer could provide such a file and you would add it with trust level: peer. This way you don't have to use the DHT to resolve the peers.

Additionally, in your system, you need to know some peers to fetch the IPNS records of the list of peers to bootstrap. So you need a bootstrap mechanism to bootstrap. Or will these lists be fetched out-of-band?

Nope you don't - the IPNS records would be used to fetch updates and the initial data.

The data behind them would remain in the database until the IPNS record expires without having been refreshed. So as long as you have some non-expired IPNS records in storage, you have public keys/DNS names/IPs in the trust-db for bootstrapping.

Like the current config file, the binary would provide an IPNS record and some peers for the initial bootstrap after running --init:
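
Roughly, picking bootstrap peers from the trust-db could look like the Python sketch below. The entry layout (plain dicts with a "nodes" list holding the cached trust-file data) is made up for illustration; the set of levels allowed to bootstrap is taken from the "use for bootstrap" row of the example matrix.

```python
# Levels that may be used for bootstrapping in the example matrix.
BOOTSTRAP_LEVELS = {"trusted", "advanced", "ultimate"}


def bootstrap_candidates(entries, now):
    """Collect cached nodes from entries that are trusted enough and not expired."""
    candidates = []
    for entry in entries:
        if entry["trustlevel"] not in BOOTSTRAP_LEVELS:
            continue  # e.g. "peer" and "marginal" entries are skipped
        if now - entry["fetched"] >= entry["cache_lifetime"]:
            continue  # cached data expired without a successful refresh
        candidates.extend(entry["nodes"])
    return candidates
```

No network access is needed for this step, which is the point: the cached data is enough to find the first peers, and the IPNS refresh happens afterwards.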

By default we import a list of standard nodes, like we add peers to the bootstrap list today, together with an IPNS public key to update it as soon as the node is connected to the network.

RubenKelevra commented 4 years ago

@achingbrain I've asked about proxying since this requires some ACLs in the node. Here's my feature request for this :)

Stebalien commented 4 years ago

I like the idea of sharing trust databases and potentially forming a web of trust. This kind of thing could be very useful for mitigating sybil attacks if we form a trust graph (have trust databases link to other trust databases).

However, I'm not sure how the solution described here really addresses the DoS vector. Software would generally ship with a set of pre-defined trusted bootstrap sources, and an attacker could simply DoS all peers listed in these records (or try to hide the records themselves).

The bootstrap node definition isn't dynamic, the default config will always contain the same entries and cannot be updated over the network (without updating the software).

Not quite. In go-ipfs, at least, users can specify their bootstrap peers with the ipfs bootstrap command.

The bootstrapping isn't distributed, there is only one trusted entity.

That's going to be the case here unless users add additional sources.

RubenKelevra commented 4 years ago

I like the idea of sharing trust databases and potentially forming a web of trust. This kind of thing could be very useful for mitigating sybil attacks if we form a trust graph (have trust databases link to other trust databases).

Cool!

However, I'm not sure how the solution described here really addresses the DoS vector. Software would generally ship with a set of pre-defined trusted bootstrap sources, and an attacker could simply DoS all peers listed in these records (or try to hide the records themselves).

Well, sure, there's a manual user interaction required to add those IPNS keys to their clients, but afterwards the IPNS records would be refreshed now and then, adding new servers and removing old ones.

To improve the situation, we could add a fixed DNSLink like _ipfs-nodes. where companies, users, and projects could publish their IPNS record.

IPFS could then fetch the IPNS key via a command like ipfs trust add --level=trusted pacman.store, which looks the key up in DNS. The user can then verify the IPNS hash, and afterward the hash will be stored in the trust-database.

So the DNS is just queried once to fetch the IPNS-key.
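
A small sketch of the parsing side of that lookup in Python. It assumes the proposed record would reuse the existing DNSLink TXT value format (`dnslink=/ipns/<key>`); the function name is hypothetical, and the DNS query itself is left out, so the input is simply the list of TXT strings returned for the name.

```python
PREFIX = "dnslink=/ipns/"


def ipns_key_from_txt(txt_values):
    """Return the first IPNS key found in a list of TXT record values, else None."""
    for value in txt_values:
        value = value.strip().strip('"')  # some resolvers return quoted strings
        if value.startswith(PREFIX):
            key = value[len(PREFIX):].split("/", 1)[0]  # drop any trailing path
            if key:
                return key
    return None
```

The returned key is what the user would be asked to verify before it is written to the trust-database.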

We could also add a simple list to the client of participating domains. So on init the user could select the domains where the keys should be fetched from:

ipfs init --show-bootstraps or something like this, would just print the list and the actual keys would be fetched via DNS.

The bootstrap node definition isn't dynamic, the default config will always contain the same entries and cannot be updated over the network (without updating the software).

Not quite. In go-ipfs, at least, users can specify their bootstrap peers with the ipfs bootstrap command.

Yeah, but each version comes with a fixed set of bootstraps. So trusted parties cannot update the set, for example to add new servers or remove old ones.

The bootstrapping isn't distributed, there is only one trusted entity.

That's going to be the case here unless users add additional sources.

Yes. But we could create a process where the users can select different levels of trust and add new trusted entities easily, and without having to restart the node every time. It would also allow a project to start with one node, and expand in the future, without having to ask all users constantly to add the new servers.

Yes, we could do this with DNS, but I think a built-in solution that uses IPNS records is much more resilient than simple DNS.

RubenKelevra commented 4 years ago

@Stebalien

A long while back I also wrote about potential features that could be implemented with a web of trust in the web GUI/desktop app, like sharing small amounts of storage with friends.

This could replace the usual "dropbox/google drive/one drive" solution people tend to use currently, while the amount of storage is usually too small for using something like Filecoin.

You also don't really want to use pinning services for something like this, since you don't need high performance, just some additional peers where you can save your photos and documents.

More on that: https://github.com/ipfs/notes/issues/397
