libp2p / notes

libp2p Collaborative Notebook for Research

distributed bootstrapping with IPNS #24

Open RubenKelevra opened 4 years ago

RubenKelevra commented 4 years ago

Current bootstrap approach

We have a config file that contains either addresses or DNS names with public keys.

We use DNS to resolve addresses, then we connect to the fixed list of predefined nodes with known public keys and ask them for (somewhat) random nodes of the network, to spin up the DHT of the node.

The nodes are run by a trusted entity and we do this on each startup.

The problem

There are several problems with this approach:

I think we should do something about that (as discussed in other places, like https://github.com/libp2p/go-libp2p-kad-dht/issues/574).

Concept

I propose replacing this very basic approach with a trust network. It lets users configure how much they trust projects, orgs and individuals, so the node can decide, based on that data, how reliable received information is.

Trust-DB

The trust-db is stored in the node database, not in the config file, and the bootstrap field in the config file will be ignored.

This allows us to:

Additional trust levels

After implementing a basic trust level for bootstrapping, we could extend this to support a multitude of functions, using the trust level as ACLs.

Trust entries

Each trust entry consists of an IPNS record that points to a trust file, a trust level, a list of trusted functions, a text label and a lifetime. There are also some status fields at the end.

An example of a trust entry:

| ID | description |
|---|---|
| label | name of this trust-db entry |
| pubkey | IPNS record |
| trustlevel | the trust level of this entry |
| trusted-functions | array of allowed extended functionalities |
| trust-lifetime | the time the trust-level validation is valid (="infinity") |
| fetched | timestamp when this entry was fetched |
| TTL | time after which this entry is considered stale |
| cache-lifetime | time after which the cached entry isn't valid anymore |
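
As a rough illustration, a trust entry with the fields above could be modelled as follows. This is only a sketch: the Python names, the use of seconds and Unix timestamps, the default values, and the reading of TTL as "stale, should be refreshed" versus cache-lifetime as "no longer usable" are all assumptions, not part of the proposal.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TrustEntry:
    label: str                       # name of this trust-db entry
    pubkey: str                      # IPNS name that points to the trust file
    trustlevel: str                  # e.g. "peer", "marginal", "trusted"
    trusted_functions: List[str] = field(default_factory=list)
    trust_lifetime: Optional[float] = None  # None stands in for "infinity"
    fetched: float = 0.0             # Unix timestamp of the last fetch
    ttl: float = 3600.0              # assumed default: stale after one hour
    cache_lifetime: float = 86400.0  # assumed default: invalid after one day

    def is_stale(self, now: float) -> bool:
        """True once the entry is older than its TTL and should be refreshed."""
        return now - self.fetched > self.ttl

    def is_cache_valid(self, now: float) -> bool:
        """True while the cached copy may still be used."""
        return now - self.fetched <= self.cache_lifetime
```

Under this reading, a stale entry can still be used for bootstrapping until its cache lifetime runs out, which is what lets a node come back after a longer downtime.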

trust functions

is a list of functions for this specific trust entry that overrides the standard function matrix, either granting a function the trust level would not allow or denying one it would.

Trust levels

| ID | example |
|---|---|
| peer | nil trust |
| marginal | no thorough validation, but a bit more trust than nil |
| trusted | an organization/individual the user trusts |
| advanced | e.g. a close friend the user trusts |
| ultimate | user's own trust file |

peer

can be used to save a known peer, together with a remote or DNS entry, to the node information.

Trust level matrix

Trust level function matrix (as an example):

| Function \ Trustlevel | peer | marginal | trusted | advanced | ultimate |
|---|---|---|---|---|---|
| use for bootstrap | 🗶 | 🗶 | ✔ | ✔ | ✔ |
| connect on startup | 🗶 | 🗶 | 🗶 | ✔ | ✔ |
| hold connection and reconnect to | 🗶 | 🗶 | 🗶 | ✔ | ✔ |
| used for autonat detection | 🗶 | ✔ | ✔ | ✔ | ✔ |
| allow graphsync | 🗶 | ✔ | ✔ | ✔ | ✔ |
| allow to query all ipns | 🗶 | 🗶 | 🗶 | 🗶 | ✔ |
| trusted peer exchange w/ ratings | 🗶 | 🗶 | ✔ | ✔ | ✔ |
| offer relay | 🗶 | 🗶 | 🗶 | ✔ | ✔ |
| use as relay | 🗶 | ✔ | ✔ | ✔ | ✔ |
| allow redistribute ipns | 🗶 | 🗶 | 🗶 | ✔ | ✔ |
| remote resolve IPNS | 🗶 | 🗶 | 🗶 | ✔ | ✔ |
| remote put DHT | 🗶 | 🗶 | 🗶 | ✔ | ✔ |
| remote fetch DHT | 🗶 | 🗶 | 🗶 | ✔ | ✔ |
| remote fetch CID | 🗶 | 🗶 | 🗶 | ✔ | ✔ |
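
To make the interplay between the matrix and the per-entry trust functions concrete, here is a small sketch. Every row of the example matrix is monotone (once a level allows a function, all higher levels do too), so the sketch stores only the lowest allowed level per function. The `+name` / `-name` override syntax and the function names are hypothetical; the proposal does not define a format for overrides.

```python
from typing import Dict, Iterable, Set

# Trust levels in ascending order, as in the table above.
LEVELS = ["peer", "marginal", "trusted", "advanced", "ultimate"]

# Lowest trust level at which each function is allowed (from the example matrix).
MIN_LEVEL: Dict[str, str] = {
    "use for bootstrap": "trusted",
    "connect on startup": "advanced",
    "hold connection and reconnect to": "advanced",
    "used for autonat detection": "marginal",
    "allow graphsync": "marginal",
    "allow to query all ipns": "ultimate",
    "trusted peer exchange w/ ratings": "trusted",
    "offer relay": "advanced",
    "use as relay": "marginal",
    "allow redistribute ipns": "advanced",
    "remote resolve IPNS": "advanced",
    "remote put DHT": "advanced",
    "remote fetch DHT": "advanced",
    "remote fetch CID": "advanced",
}


def allowed_functions(level: str, overrides: Iterable[str] = ()) -> Set[str]:
    """Return the functions allowed for a trust level after applying overrides.

    Overrides use a hypothetical syntax: "+name" grants, "-name" denies.
    """
    rank = LEVELS.index(level)
    allowed = {f for f, m in MIN_LEVEL.items() if rank >= LEVELS.index(m)}
    for item in overrides:
        if item.startswith("-"):
            allowed.discard(item[1:])
        elif item.startswith("+"):
            allowed.add(item[1:])
    return allowed
```

For example, `allowed_functions("trusted", ["+offer relay"])` would grant relay service to one specific trusted entry without raising its overall trust level.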

remote fetch

would allow thin clients, like mobile phones, to connect to other nodes without bootstrapping, and to query the DHT and fetch CIDs using those nodes as a proxy. This would reduce the time-to-first-byte massively while also reducing energy consumption, since it requires only a single connection that doesn't transfer any data when not actively used.

The thin clients would use and announce a relay connection to receive incoming connections if they are behind a firewall/NAT.

remote put DHT

would allow a thin-client to announce content it holds to the DHT, without bootstrapping the DHT.

Trust files

The trust-files are json files with the following fields:

| ID | subfield of | type | description | mandatory |
|---|---|---|---|---|
| description | root | text | field for descriptions | ✔ |
| contact | root | array | URIs for contacting | ✔ |
| entities | root | array | list of persons/projects/etc. | ✔ (="default") |
| nodes | entities | array | entries for nodes | ✔ |
| node | nodes | array | identifiers (label) for node | ✔ (="node") |
| pubkey | node | | pubkey of the node | ✔ |
| contact | node | array | URIs for contacting | 🗶 |
| remotes | node | array | ip/port/protocol etc string | 🗶 |
| remotes-strict | node | boolean | should other remotes be omitted for this pubkey? | 🗶 (=true if not present) |
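
The field list leaves the exact nesting open, so the following is just one possible reading, sketched to show how a node might load such a file. It assumes that `entities` is an array of objects, that each one carries a `nodes` array, and that each node entry holds its label under `node` next to `pubkey`, `contact`, `remotes` and `remotes-strict`. The example values (keys, address, contact URI) are placeholders.

```python
import json
from typing import Any, Dict, List

# Hypothetical trust file; all values are placeholders.
EXAMPLE = """
{
  "description": "Example trust file",
  "contact": ["mailto:admin@example.org"],
  "entities": [
    {
      "nodes": [
        {
          "node": "node-1",
          "pubkey": "EXAMPLE-PUBKEY-1",
          "remotes": ["/dns4/node1.example.org/tcp/4001"]
        },
        {
          "node": "node-2",
          "pubkey": "EXAMPLE-PUBKEY-2",
          "remotes-strict": false
        }
      ]
    }
  ]
}
"""


def load_nodes(text: str) -> List[Dict[str, Any]]:
    """Parse a trust file and return its node entries with defaults applied."""
    doc = json.loads(text)
    for key in ("description", "contact", "entities"):
        if key not in doc:
            raise ValueError(f"missing mandatory field: {key}")
    nodes = []
    for entity in doc["entities"]:
        if "nodes" not in entity:
            raise ValueError("entity without mandatory 'nodes' field")
        for entry in entity["nodes"]:
            if "pubkey" not in entry:
                raise ValueError("node entry without mandatory 'pubkey' field")
            nodes.append({
                "node": entry.get("node", "node"),     # label defaults to "node"
                "pubkey": entry["pubkey"],
                "contact": entry.get("contact", []),   # optional
                "remotes": entry.get("remotes", []),   # optional
                # optional, treated as true if not present
                "remotes-strict": entry.get("remotes-strict", True),
            })
    return nodes
```

Calling `load_nodes(EXAMPLE)` yields two node entries; the first has `remotes-strict` filled in as `True` because the field is absent.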

IPNS record limitations

For the IPNS records, some limitations are necessary to avoid malicious entries

bertrandfalguiere commented 4 years ago

I believe the known peers will soon be remembered across restarts. Additionally, in your system, you need to know some peers to fetch the IPNS records of the list of peers to bootstrap. So you need a bootstrap mechanism to bootstrap. Or will these lists be fetched out-of-band? Am I missing something?

RubenKelevra commented 4 years ago

> I believe the known peers will soon be remembered across restarts.

I know, but this only works reliably for fairly short periods of downtime. You still need to bootstrap after a longer period of downtime, since your random "known" peers are probably not reachable anymore.

The trustlevel: peer is basically just a way to permanently receive updates for IPs/domain names of peers you're likely going to use.

Say you often receive data from a cluster. The cluster maintainer could then provide such a file, and you would add it with trust level: peer. This way you don't have to use the DHT to resolve the peers.

> Additionally, in your system, you need to know some peers to fetch the IPNS records of the list of peers to bootstrap. So you need a bootstrap mechanism to bootstrap. Or will these lists be fetched out-of-band?

Nope, you don't. The IPNS records would be used to fetch updates and the initial data.

The data behind them would remain permanently in the database until the IPNS record expires without having been refreshed. So as long as you have some non-expired IPNS records in storage, you have public keys/DNS names/IPs in the trust-db for bootstrapping.

Like the current config file, the binary would provide an IPNS record and some peers for the initial bootstrap after running --init:

By default we import a list of standard nodes, like we currently add peers to the bootstrap list, together with an IPNS public key that is used to update the list as soon as the node is connected to the network.
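
The rule described above, that any non-expired IPNS record in storage still contributes its peers for bootstrapping, can be sketched roughly like this. The record layout and all names are assumptions for illustration, not an existing go-ipfs data structure.

```python
from typing import List, NamedTuple


class CachedRecord(NamedTuple):
    ipns_name: str     # IPNS key the record was published under
    expires_at: float  # Unix timestamp at which the record expires
    peers: List[str]   # public keys / DNS names / IPs from the trust file


def bootstrap_candidates(db: List[CachedRecord], now: float) -> List[str]:
    """Collect peers from all records that have not expired yet, without duplicates."""
    seen = set()
    candidates = []
    for record in db:
        if record.expires_at <= now:
            continue  # expired and never refreshed: its data is no longer used
        for peer in record.peers:
            if peer not in seen:
                seen.add(peer)
                candidates.append(peer)
    return candidates
```

If this returns an empty list, every record has expired and the node would have to fall back to whatever ships with the binary.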

RubenKelevra commented 4 years ago

@achingbrain I've asked about proxying since this requires some ACLs in the node. Here's my feature request for this :)

Stebalien commented 4 years ago

I like the idea of sharing trust databases and potentially forming a web of trust. This kind of thing could be very useful for mitigating sybil attacks if we form a trust graph (have trust databases link to other trust databases).

However, I'm not sure how the solution described here really addresses the DoS vector. Software would generally ship with a set of pre-defined trusted bootstrap sources, and an attacker could simply DoS all peers listed in these records (or try to hide the records themselves).

> The bootstrap node definition isn't dynamic, the default config will always contain the same entries and cannot be updated over the network (without updating the software).

Not quite. In go-ipfs, at least, users can specify their bootstrap peers with the ipfs bootstrap command.

> The bootstrapping isn't distributed, there is only one trusted entity.

That's going to be the case here unless users add additional sources.

RubenKelevra commented 4 years ago

> I like the idea of sharing trust databases and potentially forming a web of trust. This kind of thing could be very useful for mitigating sybil attacks if we form a trust graph (have trust databases link to other trust databases).

Cool!

> However, I'm not sure how the solution described here really addresses the DoS vector. Software would generally ship with a set of pre-defined trusted bootstrap sources, and an attacker could simply DoS all peers listed in these records (or try to hide the records themselves).

Well, sure, there's a manual user interaction required to add those IPNS keys to their clients, but afterwards the IPNS records would be refreshed now and then, adding new servers and removing old ones.

To improve the situation, we could add a fixed DNSLink like _ipfs-nodes. where companies, users, and projects could publish their IPNS record.

IPFS could then fetch the IPNS key from DNS via a command like ipfs trust add --level=trusted pacman.store. The user can then verify the IPNS hash, and afterward the hash will be stored in the trust database.

So the DNS is just queried once to fetch the IPNS-key.
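
A minimal sketch of the one-time DNS step, assuming the proposed `_ipfs-nodes` record reused the existing DNSLink TXT value format (`dnslink=/ipns/<key>`). That reuse is an assumption; the proposal does not specify the record contents. The function only parses TXT values that some resolver has already returned, and the function name is made up for illustration.

```python
from typing import Iterable, Optional

DNSLINK_IPNS_PREFIX = "dnslink=/ipns/"


def ipns_key_from_txt(txt_values: Iterable[str]) -> Optional[str]:
    """Return the IPNS key from the first DNSLink-style TXT value, if any."""
    for value in txt_values:
        value = value.strip().strip('"')  # resolvers often return quoted strings
        if value.startswith(DNSLINK_IPNS_PREFIX):
            # Drop any trailing path; only the key itself is of interest here.
            key = value[len(DNSLINK_IPNS_PREFIX):].split("/", 1)[0]
            if key:
                return key
    return None
```

The returned key would then be shown to the user for verification before it is written to the trust database, so DNS is only trusted for that single lookup.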

We could also add a simple list to the client of participating domains. So on init the user could select the domains where the keys should be fetched from:

ipfs init --show-bootstraps, or something like this, would just print the list, and the actual keys would be fetched via DNS.

> > The bootstrap node definition isn't dynamic, the default config will always contain the same entries and cannot be updated over the network (without updating the software).

> Not quite. In go-ipfs, at least, users can specify their bootstrap peers with the ipfs bootstrap command.

Yeah, but each version comes with a fixed set of bootstraps. So the set cannot be updated: trusted parties cannot add new servers or remove old ones.

> > The bootstrapping isn't distributed, there is only one trusted entity.

> That's going to be the case here unless users add additional sources.

Yes. But we could create a process where the users can select different levels of trust and add new trusted entities easily, and without having to restart the node every time. It would also allow a project to start with one node, and expand in the future, without having to ask all users constantly to add the new servers.

Yes, we could do this with DNS, but I think a built-in solution that uses IPNS records is much more resilient than simple DNS.

RubenKelevra commented 4 years ago

@Stebalien

A long while back I also wrote about potential features that could be implemented with a web of trust in the web GUI/desktop app, like sharing small amounts of storage with friends.

This could replace the usual "dropbox/google drive/one drive" solution people tend to use currently, while the amount of storage is usually too small for using something like Filecoin.

You also don't really want to use pinning services for something like this, since you don't need high performance, just some additional peers where you can save your photos and documents.

More on that: https://github.com/ipfs/notes/issues/397