yggdrasil-network / yggdrasil-go

An experiment in scalable routing as an encrypted IPv6 overlay network
https://yggdrasil-network.github.io
Other
3.51k stars 242 forks source link

Investigate backpressure routing #111

Open Arceliar opened 6 years ago

Arceliar commented 6 years ago

This and #41 may be mutually exclusive. This is definitely mutually exclusive with source routing, so we'd be breaking from the original intent of testing greedy routing as possible pathfinding strategy for cjdns.

As of v0.2, we currently used a sort of local-only backpressure and LIFO queues to route around congestion in some cases, but we still require the greedy routing distance criteria to be met (to prevent routing loops). LIFO backpressure routing has some very interesting properties. If we added queue size announcements to the protocol, we could take advantage of the distance metric as a baseline pressure gradient to get around many of the normal issues (long delay, queues for every node) that prevent backpressure from being used in general purpose packet switch networks.

This would require reworking the lookup logic quite a bit, but there's several ways to do that which I think would work, and I'm not sure which one I'd want to go with. I'm also not sure when it would be appropriate to send queue size updates, or what format they should take (queue per destination, or do we cluster regions of the tree together under one queue?).

Arceliar commented 6 years ago

If it turns out that backpressure routing isn't practical, then we should consider (optional?) source routing. That would help insulate existing sessions from coord changes.

Arceliar commented 6 years ago

Status update.

As of #146, there are now separate queues associated with each session (to the extent that we can figure this information out--there's a map of queues, with the map keys being derived from coords and handle for encapsulated IPv6 traffic, or coords and pubkeys for protocol traffic). These queues are currently FIFO and head drop packets that have been in the queue longer than 25 ms. That fixed 25 ms threshold is arbitrary, and should probably be replaced with something that can be adjusted through some feedback loop, similar to CoDel (or maybe even exactly CoDel). This needs future study.

If peers would communicate current queue sizes with eachother, then this information would be enough to add backpressure routing. I'm not sure what the best way to do this is, since notifying all neighbors of all queue state changes seems excessive. As it stands, only local information is used, so we pretend that the queue size is metric space distance to prevent routing loops (but forward to any available valid next-hop, instead of queueing a packet up to send to the best next hop regardless of congestion).

neilalexander commented 6 years ago

Status update:

Queues are now identified by IPv6 flow label (since PR #171), which uniquely identifies TCP (and usually related UDP) traffic streams. This is done by appending a 0 to the coords, which guarantees that the packet is sent to the local router, followed by the flow label which is used for queue classification.

Arceliar commented 6 years ago

In the long run, we should probably replace the coord hack with a dedicated field in the traffic header. Possibly filled by some fast non-cryptogaphic hash of source/dest IPs and flow label, maybe with some session info (session handle? public session keys?). It would be nice to not depend on the self-peer-is-0 hack to make it work when it's attached to coords.

Arceliar commented 5 years ago

The current queue stuff seems to be doing an acceptable job so far, and further work require adding to and/or changing some things in the protocol (in particular, communicating some info about queue sizes to neighbors, which could use a lot of bandwidth), so moving to v0.4 or later.