Open kruisdraad opened 6 years ago
An addition to the community part is to create a registry where nodes are registered, something like the RIPE whois system for IPs, but for ILP nodes.
Looking at the code, https://github.com/interledgerjs/ilp-connector/blob/master/src/routing/prefix-map.ts seems to do all the routing, which looks quite simplified. The problem is, it's not that simple.
Globally, it's not a peering relation but a master/child one. It should not matter which of the two peers initiates the session: both plugins should be able to use the same configuration, and the session is established either way.
In general we need to split a few processes. For example, each session could be a child process that adds data to the local routing table, so that if a session resets (config change) the entire routing table does not have to be restarted (as already discussed on the roadmap).
In addition we need a method of confirming the identity of peers to prevent duplicate names (or hijacks?) in the network. The offline registry is an administrative solution to assign unique addresses and enforce naming requirements. But if we look at the RPKI example, routes are signed and can be validated. Perhaps it's possible to use the wallet key material to sign the identity (direct peers) and use a smaller part for upstream validation. It still relies on nodes trusting their peers, but if a specific peer does not validate its own peers and they are rogue, it's easier to see which one should be kicked off. For example, what happens if a T1 adds a rogue T2 that announces the g.nodes names of the entire network?
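To illustrate the split, here is a minimal sketch in which every route is tagged with the session that announced it, so a session reset only withdraws that session's entries. The names (`SessionRoutingTable`, `addRoute`, `dropSession`, `sessionsFor`) are made up for this example and are not taken from ilp-connector.

```typescript
// Hypothetical sketch: each route remembers which peer session announced it.
interface SessionRoute {
  prefix: string;    // ILP address prefix, e.g. "g.alpha"
  sessionId: string; // peer session that announced the prefix
}

class SessionRoutingTable {
  private routes: SessionRoute[] = [];

  // Record a prefix learned from one session (an earlier duplicate is replaced).
  addRoute(sessionId: string, prefix: string): void {
    this.routes = this.routes.filter(
      (r) => !(r.sessionId === sessionId && r.prefix === prefix)
    );
    this.routes.push({ prefix, sessionId });
  }

  // Withdraw only the routes owned by the session that reset.
  // Returns how many routes were removed.
  dropSession(sessionId: string): number {
    const before = this.routes.length;
    this.routes = this.routes.filter((r) => r.sessionId !== sessionId);
    return before - this.routes.length;
  }

  // Sessions through which the given prefix is currently reachable.
  sessionsFor(prefix: string): string[] {
    return this.routes
      .filter((r) => r.prefix === prefix)
      .map((r) => r.sessionId);
  }
}
```

With this layout, routes learned from other sessions stay in place while one peer reconnects.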
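As a rough sketch of the administrative half of this idea, the check below compares announced prefixes against a registry of owners and rejects anything that does not match. Signature verification is deliberately left out, and the types and function names (`AddressRegistry`, `RouteAnnouncement`, `checkAnnouncements`) are assumptions for illustration only.

```typescript
// Hypothetical registry: ILP address prefix -> identifier of the registered owner.
type AddressRegistry = { [prefix: string]: string };

interface RouteAnnouncement {
  prefix: string; // announced ILP address prefix
  origin: string; // node claiming to own the prefix
}

// Split announcements into accepted and rejected lists.
// Only the registry lookup is shown; no signatures are checked here.
function checkAnnouncements(
  registry: AddressRegistry,
  announcements: RouteAnnouncement[]
): { accepted: RouteAnnouncement[]; rejected: RouteAnnouncement[] } {
  const accepted: RouteAnnouncement[] = [];
  const rejected: RouteAnnouncement[] = [];
  for (const a of announcements) {
    const owner = Object.prototype.hasOwnProperty.call(registry, a.prefix)
      ? registry[a.prefix]
      : undefined;
    if (owner === a.origin) {
      accepted.push(a);
    } else {
      // Unregistered prefix, or a prefix registered to someone else.
      rejected.push(a);
    }
  }
  return { accepted, rejected };
}
```

A rogue node announcing names registered to others would end up entirely in `rejected`, which also makes it visible which peer forwarded the bad announcements.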
I suggest the peering plugin gets a few extra options to influence routing:
Idle: This is the first stage. ILP detects a start event, tries to initiate a TCP connection to the peer, and also listens for a new connect from a peer router. The ConnectRetryTimer is set to 60 seconds and must decrement to zero before the connection is initiated again. Further failures to leave the Idle state result in the ConnectRetryTimer doubling in length from the previous time.
Connect: In this state, ILP initiates the TCP connection. If the 3-way TCP handshake completes, the established BGP session process resets the ConnectRetryTimer, sends the Open message to the neighbor, and then changes to the OpenSent state. If the ConnectRetryTimer depletes before this stage is complete, a new TCP connection is attempted, the ConnectRetryTimer is reset, and the state is moved to Active. If any other input is received, the state is changed to Idle.
Active: In this state, BGP starts a new 3-way TCP handshake. If a connection is established, an Open message is sent, the Hold Timer is set to 4 minutes, and the state moves to OpenSent. If this attempt for TCP connection fails, the state moves back to the Connect state and resets the ConnectRetryTimer.
OpenSent: In this state, an Open message has been sent from the originating router and is awaiting an Open message from the other router. After the originating router receives the OPEN message from the other router, both OPEN messages are checked for errors. The following items are being compared:
If the Open messages do not have any errors, the Hold Time is negotiated (using the lower value), and a KEEPALIVE message is sent (assuming the value is not set to zero). The connection state is then moved to OpenConfirm. If an error is found in the OPEN message, a Notification message is sent, and the state is moved back to Idle. If TCP receives a disconnect message, BGP closes the connection, resets the ConnectRetryTimer, and sets the state to Active. Any other input in this process results in the state moving to Idle.
OpenConfirm: In this state, ILP waits for a Keepalive or Notification message. Upon receipt of a neighbor's Keepalive, the state is moved to Established. If the hold timer expires, a stop event occurs, or a Notification message is received, the state is moved to Idle.
Established: In this state, the ILP session is established. ILP neighbors exchange routes via Update messages. As Update and Keepalive messages are received, the Hold Timer is reset. If the Hold Timer expires, an error is detected and BGP moves the neighbor back to the Idle state.
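The states above can be sketched as a small transition function. This is only an illustration of the description, not code from any connector: the event names are invented, and where the text does not say what happens to an input (for example in Active), the sketch assumes a fall back to Idle.

```typescript
type SessionState =
  | "Idle" | "Connect" | "Active" | "OpenSent" | "OpenConfirm" | "Established";

// Hypothetical event names chosen for this sketch.
type SessionEvent =
  | "start" | "tcpEstablished" | "tcpFailed" | "connectRetryExpired"
  | "openOk" | "openError" | "tcpDisconnect"
  | "keepalive" | "update" | "holdTimerExpired" | "notification" | "stop";

const INITIAL_CONNECT_RETRY_SECONDS = 60;

// Each further failure to leave Idle doubles the previous retry interval.
function nextConnectRetry(previousSeconds: number): number {
  return previousSeconds * 2;
}

// The Hold Time is negotiated by taking the lower of the two values.
function negotiateHoldTime(localSeconds: number, remoteSeconds: number): number {
  return Math.min(localSeconds, remoteSeconds);
}

function nextSessionState(state: SessionState, event: SessionEvent): SessionState {
  switch (state) {
    case "Idle":
      return event === "start" ? "Connect" : "Idle";
    case "Connect":
      if (event === "tcpEstablished") return "OpenSent";
      if (event === "connectRetryExpired") return "Active";
      return "Idle"; // any other input
    case "Active":
      if (event === "tcpEstablished") return "OpenSent";
      if (event === "tcpFailed") return "Connect";
      return "Idle"; // assumption: not stated in the description
    case "OpenSent":
      if (event === "openOk") return "OpenConfirm";
      if (event === "tcpDisconnect") return "Active";
      return "Idle"; // Open error or any other input
    case "OpenConfirm":
      if (event === "keepalive") return "Established";
      return "Idle"; // hold timer expiry, stop, Notification (others assumed)
    case "Established":
      if (event === "update" || event === "keepalive") return "Established";
      return "Idle"; // hold timer expiry (others assumed)
  }
}
```

Keeping the transitions in one pure function makes it easy to test the session logic without opening real connections.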
Keepalive (integer, seconds, 0-3600): Interval at which pings are sent through to the peer; if they fail multiple times, the connection is dropped.
holdtimer (integer, seconds, 0-3600): When a peer connects and is fully established, routes are not accepted until this timer has expired. This prevents a 'looping remote peer' from breaking the local routing table if it was able to send a prefix list and 'crashed' right after that succeeded. It also helps prevent DoS attacks.
hoplimit (integer 0-100): This setting states how many hops down the route are allowed in the local routing table. For example, with a T2/T3 relation you only want 1 hop, so you don't receive the T3 via another route that is less good than the direct peering. On the other side, a T3 would set a limit of 10 (or 0 for unlimited) so it is able to reach the entire network (transit). This way we can prevent the T3 from accidentally taking up T2/T1 roles because its route might be selected first.
Weight (integer 0-2000): Adds some weight to the local prefix. The default is 100 (and the same for every hop away). You can either lower it, to say that this peering and its routes have a higher preference (which you want for IXs), or increase it (backup routes). This will help stabilize the network too, as routes are less likely to change even on reloads because the relations are better defined. It also allows a higher total weight for routes far away (e.g. EU node vs US IX), which are not the optimal route because they increase latency.
PrefixFilter (ANY or array of g.names): This allows a peer to restrict trust to only the parties it signed on for. For example, let's say we don't trust X with our money. We sign on with Y, but Y has a peering with X and X starts routing through our node. We have no way to prevent that flow. There has to be some way to have default protection against bad nodes.
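A minimal sketch of how these options could be represented and applied is shown below. The interface and function names are hypothetical, the cost formula (peer weight plus 100 per hop, lower is preferred) is one reading of the Weight description, and matching a PrefixFilter entry against sub-addresses is an assumption.

```typescript
// Hypothetical option set for one peering; ranges follow the descriptions above.
interface PeeringOptions {
  keepalive: number;               // seconds, 0-3600
  holdtimer: number;               // seconds, 0-3600
  hoplimit: number;                // 0-100, 0 means unlimited
  weight: number;                  // 0-2000, default 100
  prefixFilter: "ANY" | string[];  // allowed g.names
}

interface CandidateRoute {
  prefix: string; // announced ILP address prefix
  hops: number;   // hops between this node and the prefix
}

const DEFAULT_WEIGHT = 100;

function isIntegerInRange(value: number, min: number, max: number): boolean {
  return Math.floor(value) === value && value >= min && value <= max;
}

// Returns a list of problems; an empty list means the options are valid.
function validatePeeringOptions(o: PeeringOptions): string[] {
  const problems: string[] = [];
  if (!isIntegerInRange(o.keepalive, 0, 3600)) problems.push("keepalive out of range");
  if (!isIntegerInRange(o.holdtimer, 0, 3600)) problems.push("holdtimer out of range");
  if (!isIntegerInRange(o.hoplimit, 0, 100)) problems.push("hoplimit out of range");
  if (!isIntegerInRange(o.weight, 0, 2000)) problems.push("weight out of range");
  return problems;
}

// Apply hoplimit and PrefixFilter to a route received from this peer.
function acceptRoute(o: PeeringOptions, route: CandidateRoute): boolean {
  if (o.hoplimit !== 0 && route.hops > o.hoplimit) return false;
  const filter = o.prefixFilter;
  if (filter === "ANY") return true;
  for (const name of filter) {
    // Assumption: an entry also covers addresses below it ("g.y" covers "g.y.child").
    if (route.prefix === name || route.prefix.indexOf(name + ".") === 0) return true;
  }
  return false;
}

// Assumed cost model: peer weight plus the default weight for every hop.
function routeCost(o: PeeringOptions, route: CandidateRoute): number {
  return o.weight + DEFAULT_WEIGHT * route.hops;
}
```

A backup peering would simply carry a higher `weight`, so its routes lose against an otherwise equal route through a preferred peer.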
cc @justmoon @sappenin @dora-gt
= Community =
We need to communicate with the community through a good process, for example about changes in the network. This is needed now: announcing changes and requirements, and asking for feedback. My suggestion would be to create a non-profit organization where anyone can be a member and members can have resources like G_addr (like RIPE), which are all vetted (the process needs to be defined for intake and auditing).
Some might not want to register, so the IXs will provide a limited number of 'anonymous' connection points, with a 'disclaimer' stating that those nodes do not qualify under the community guidelines, security, trust level, etc.
We can redesign ILP-IX to take on this role, but creating some official entity with a member-voted board and foundation statutes will be a good start. Perhaps take on some (non-technical) community managers who will focus on communication and on making sure developers and users are on the same road.
= Technical =
First we need the moneyd endpoint to be redundant: if the ILP node the endpoint is using goes down, everything fails. A talk between Dino/Crosswire suggested 2 approaches. One is having some kind of peering method with multiple peers (roaming paychan?). The alternative is to make the configuration more dynamic, as in removing the ILP connector from the config. At startup it just creates a new connection to some node, perhaps based on a filter for a specific locality (e.g. I want an EU node). At startup it will create a new paychan and clean up old paychans automatically, instead of manually. This even creates a zero-configuration client. (medium change)
ILP connectors (T2-3) need to have redundancy for clients. A client with a paychan should be able to change to a 'friended' node (e.g. switch from mlab1 to mlab2-3) that can interact with the paychan (trust line?) (large changes; the previous change is perhaps better?)
Endpoints should have an auto-derived node name added onto the routes. This might seem weird, but it allows the entire network to be mapped a lot more easily. The node name should be linked to its connected ILP node (then you can aggregate the routes if you want to exclude them). Having it connected to the same 'routing network' allows some more diagnostic tools, e.g. ping end to end from endpoints. It also adds marketing points by showing network growth.
ILP nodes should exchange information, such as version, AND have the ability to filter on specific information; again, version is a good example. This will allow the ILP node to refuse clients with a too-old moneyd version, especially in this phase where you want people to have an updated client. A second example would be the XRP address, allowing filtering of fraudulent/abusive addresses.
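The dynamic-configuration idea could look roughly like the sketch below: at startup the client picks a reachable node, preferring a given locality and falling back to any reachable node. How candidates are discovered and how paychans are opened or cleaned up is not covered, and every name here (`UpstreamCandidate`, `pickUpstream`, the `region` field) is an assumption.

```typescript
// Hypothetical description of a node the client could connect to.
interface UpstreamCandidate {
  address: string;    // ILP address of the node
  region: string;     // locality label, e.g. "EU"
  reachable: boolean; // result of some earlier health check
}

// Pick a reachable node, preferring the requested region when one is available.
function pickUpstream(
  candidates: UpstreamCandidate[],
  preferredRegion?: string
): UpstreamCandidate | undefined {
  const alive = candidates.filter((c) => c.reachable);
  const local =
    preferredRegion === undefined
      ? alive
      : alive.filter((c) => c.region === preferredRegion);
  // Fall back to any reachable node if the preferred region has none.
  const pool = local.length > 0 ? local : alive;
  return pool.length > 0 ? pool[0] : undefined;
}
```

Running the same selection again when the current node goes down would give the endpoint a simple form of failover.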
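As a sketch of such filtering, the check below refuses clients below a minimum version or with a blocked address. The exchanged fields and the policy shape (`ClientInfo`, `AdmissionPolicy`) are hypothetical, and versions are assumed to be dotted numbers such as `4.0.1`.

```typescript
// Hypothetical information a client reports when it connects.
interface ClientInfo {
  version: string;    // dotted numeric version, e.g. "4.0.1"
  xrpAddress: string; // address used for the paychan
}

// Hypothetical policy configured on the ILP node.
interface AdmissionPolicy {
  minVersion: string;
  blockedAddresses: string[];
}

// Compare dotted numeric versions: -1 if a < b, 0 if equal, 1 if a > b.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map((p) => parseInt(p, 10) || 0);
  const pb = b.split(".").map((p) => parseInt(p, 10) || 0);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const x = pa[i] || 0; // missing parts count as 0
    const y = pb[i] || 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}

// Decide whether a client may connect, with a reason when it is refused.
function admitClient(
  client: ClientInfo,
  policy: AdmissionPolicy
): { admitted: boolean; reason?: string } {
  if (compareVersions(client.version, policy.minVersion) < 0) {
    return { admitted: false, reason: "client version too old" };
  }
  if (policy.blockedAddresses.indexOf(client.xrpAddress) !== -1) {
    return { admitted: false, reason: "address is blocked" };
  }
  return { admitted: true };
}
```

Returning a reason lets the node tell an outdated client to upgrade instead of silently dropping it.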