libp2p / go-libp2p

libp2p implementation in Go
MIT License

Question: Routing enabled based on protocol? #501

Open ickby opened 5 years ago

ickby commented 5 years ago

For my application I currently disabled routing. From vaguely following the development of libp2p, I see that routing will be enabled by default in the future, and that important things like UDP NAT hole-punching will use routing. So it is best to enable routing. However, my application sends huge amounts of data between peers, which my bootstrap server nodes should not handle. So the question is: can we disable routing for certain protocols? E.g., could all data-heavy protocols work only with direct p2p connections?

ickby commented 5 years ago

If it is not possible to restrict routing for certain protocols, can we somehow query whether a connection is established via routing or directly to the peer? Then I would be able to manually restrict the data-heavy operations.

Stebalien commented 5 years ago

> For my application I currently disabled routing.

I assume you're talking about relay (p2p-circuit). When we say "routing", we're usually talking about the DHT (finding peers, content, etc).

The plan is to use relay as a last resort. Basically:

  1. If we detect that we're behind a NAT, we'll advertise a relay address.
  2. We'll only dial a relay address if the target node advertises one.
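
The two rules above amount to a dial policy in which relay addresses are only a fallback. A minimal sketch in Go, working on plain multiaddr strings rather than go-libp2p types: `isRelayAddr` and `orderForDial` are hypothetical helper names, and the sketch assumes that relay (circuit) addresses can be recognized by a `/p2p-circuit` component.

```go
package main

import (
	"sort"
	"strings"
)

// isRelayAddr reports whether a multiaddr string goes through a circuit relay.
// Assumption: relay addresses contain a "/p2p-circuit" component.
func isRelayAddr(addr string) bool {
	return strings.Contains(addr, "/p2p-circuit")
}

// orderForDial returns a copy of the addresses a peer advertises, with direct
// addresses first and relay addresses last. The original order is preserved
// within each group. Because it only reorders advertised addresses, a relay
// address is tried only if the target advertised one.
func orderForDial(advertised []string) []string {
	out := make([]string, len(advertised))
	copy(out, advertised)
	sort.SliceStable(out, func(i, j int) bool {
		return !isRelayAddr(out[i]) && isRelayAddr(out[j])
	})
	return out
}
```

A dialer would then try the returned addresses in order and reach the relay only after every direct address has failed.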

We've also discussed introducing a "dial-back" protocol where a node NOT behind a NAT could connect to a NATed node and ask said node to create a direct connection back to them.

As you noted, hole-punching will likely use this as well. We'll need to connect over a relay and then re-connect using hole-punching (coordinating over relay).


In terms of restricting protocols operating over relays:

  1. In general, I recommend you chunk your data up into smaller pieces so you can start sending data over the relay connection and then migrate over to the direct connection if/when we can establish it. Ideally, we'd have a way to migrate streams between connections but we haven't put much work into this yet.
  2. You can tell what protocol is being used for a connection. Given that, you can simply decline to negotiate certain protocols over certain connections (although doing this isn't very ergonomic at the moment).
  3. Relays can't tell what protocols are being spoken over a connection they're relaying so you can't enforce this at the relay.
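
Point 2 can be sketched as a check made at each endpoint before a stream is opened or accepted: inspect the connection's remote address and refuse data-heavy protocols when it goes through a relay. The sketch below works on plain strings; in go-libp2p the address would presumably come from the connection's remote multiaddr. The protocol ID and the helper names are made up for illustration, and the `/p2p-circuit` check is an assumption about how relayed addresses look.

```go
package main

import "strings"

// heavyProtocols lists protocol IDs that must never run over a relay.
// The ID below is a hypothetical example, not a real libp2p protocol.
var heavyProtocols = map[string]bool{
	"/myapp/bulk-transfer/1.0.0": true,
}

// isRelayed reports whether the remote address of a connection goes through
// a circuit relay. Assumption: such addresses contain "/p2p-circuit".
func isRelayed(remoteAddr string) bool {
	return strings.Contains(remoteAddr, "/p2p-circuit")
}

// allowProtocol decides whether protocolID may be negotiated on a connection
// whose remote address is remoteAddr. Heavy protocols are refused on relayed
// connections; everything else is allowed.
func allowProtocol(protocolID, remoteAddr string) bool {
	if heavyProtocols[protocolID] && isRelayed(remoteAddr) {
		return false
	}
	return true
}
```

Since the relay cannot see which protocols run over a relayed connection (point 3), this check has to live in the application on both peers.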

However, you bring up a good point: we need a way to specify transport requirements when creating a new stream. This can be tricky, though, because it may not be possible to establish a direct connection and one'll usually want to use the relayed connection anyways if there are no other options.
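
One way to picture such a requirement is a per-stream policy that says whether a relayed connection is acceptable, acceptable only as a fallback, or never acceptable. The following is a purely hypothetical sketch, not an existing go-libp2p API: `TransportPolicy`, `pickConn`, and `ErrNoSuitableConn` are invented names, connections are represented by their remote multiaddr strings, and relayed connections are assumed to contain `/p2p-circuit`.

```go
package main

import (
	"errors"
	"strings"
)

// TransportPolicy is a hypothetical per-stream transport requirement.
type TransportPolicy int

const (
	AnyConnection TransportPolicy = iota // use the first open connection
	PreferDirect                         // direct if available, relay as a fallback
	RequireDirect                        // fail rather than use a relay
)

// ErrNoSuitableConn is returned when no open connection meets the policy.
var ErrNoSuitableConn = errors.New("no connection satisfies the transport policy")

// pickConn chooses one of the open connections (given by remote address)
// according to policy.
func pickConn(policy TransportPolicy, conns []string) (string, error) {
	if len(conns) == 0 {
		return "", ErrNoSuitableConn
	}
	if policy == AnyConnection {
		return conns[0], nil
	}
	for _, c := range conns {
		if !strings.Contains(c, "/p2p-circuit") {
			return c, nil // first direct connection wins
		}
	}
	if policy == PreferDirect {
		return conns[0], nil // only relayed connections exist
	}
	return "", ErrNoSuitableConn
}
```

With `PreferDirect`, a stream still works when only a relay is reachable, while `RequireDirect` lets a data-heavy protocol fail instead of loading the relay.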

ickby commented 5 years ago

Thanks for the detailed reply. You are correct, I meant relay and mixed up words.

I will look into your suggestions, especially option 2 gives me a good starting point. I still want to comment on your conclusion:

"it may not be possible to establish a direct connection and one'll usually want to use the relayed connection anyways if there are no other options."

IMHO the relay nodes will always be the bootstrap ones, as they are not behind a NAT by definition. In small networks like mine the chance of other non-NAT nodes is small. So with data-heavy protocols the burden of relaying will fall on the nodes I set up for bootstrap, and there is a high risk of unbounded costs.

So you are right from the user's point of view that the connection must be established anyway, but from the provider's point of view this does not hold. To minimize large money risks I cannot relay heavy data operations over my bootstrap nodes.

Stebalien commented 5 years ago

> In small networks like mine the chance of other non-NAT nodes is small.

Not necessarily. We currently use UPnP to ask NATs to forward a port for us. This actually works surprisingly well.

Once we get UDP hole punching (which, unfortunately, may take a while) it should be even easier to traverse NATs (although nodes that can't port-forward won't make good relays as connecting to them directly will require quite a bit of work).

> To minimize large money risks I cannot relay heavy data operations over my bootstrap nodes.

We do need to introduce some kind of rate-limiting, sustained throughput limits, etc. You're right, transferring large files through relays isn't very sustainable (in most cases). However, IMO, this needs to be enforced at the relay through QoS.
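
One common way to enforce a sustained throughput limit is a token bucket kept per relayed peer. The sketch below is a generic token bucket, not something the relay is described as providing in this thread. The type and function names are hypothetical, and the current time is passed in explicitly so the behavior is easy to follow.

```go
package main

import "time"

// tokenBucket limits sustained throughput to rate bytes per second while
// allowing bursts of up to burst bytes.
type tokenBucket struct {
	rate   float64   // refill rate in bytes per second
	burst  float64   // maximum number of stored tokens (bytes)
	tokens float64   // currently available tokens (bytes)
	last   time.Time // time of the last refill
}

// newTokenBucket returns a bucket that starts full at time now.
func newTokenBucket(rate, burst float64, now time.Time) *tokenBucket {
	return &tokenBucket{rate: rate, burst: burst, tokens: burst, last: now}
}

// allow reports whether n bytes may be relayed at time now. If so, the
// bytes are deducted from the bucket; otherwise the bucket is left as is
// (apart from the refill) and the caller should delay or drop the data.
func (b *tokenBucket) allow(n int, now time.Time) bool {
	elapsed := now.Sub(b.last).Seconds()
	if elapsed > 0 {
		b.tokens += elapsed * b.rate
		if b.tokens > b.burst {
			b.tokens = b.burst
		}
		b.last = now
	}
	if float64(n) > b.tokens {
		return false
	}
	b.tokens -= float64(n)
	return true
}
```

A relay could keep one such bucket per peer and call `allow(len(chunk), time.Now())` before forwarding each chunk, which caps what any single peer can cost the relay operator.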


What's your primary motivation in running a separate network? Ideally, this cost will be amortized across the entire network.

ickby commented 5 years ago

Thanks for the answer. I will need some time to process this, as I'm still rather new to libp2p.

In my use case I have multiple small groups sharing data within the group. I know them beforehand and hence don't need to find anything in the network (P2P is only a subpart of my application, used for the data transfer). Therefore I don't use the DHT and need the bootstrap node only for detecting the "outside address" of the user nodes. The easiest way to do this was to set up my own. But I may reconsider that; I need to think about it.

Stebalien commented 5 years ago

In that case, you could probably disable relay entirely.

(note: we should provide better ways to work around this; that's just an immediate solution)

ickby commented 5 years ago

Yes, as I did up to now, but I would later like to use the hole punching improvement. Hence the discussion.