interledger4j / ilpv4-connector

An ILPv4 Connector implemented in Java
https://java-connector.ilpv4.dev
Apache License 2.0

Static or Default Routes with long timeouts can cause server to hang. #663

Closed sappenin closed 4 years ago

sappenin commented 4 years ago

When the server starts up, it scans through any static routes in the database and adds them to the routing table. If the timeout on these routes is sufficiently long, the server does not finish starting up and does not begin listening for incoming connections until all static routes have been processed.

In one instance, there were hundreds of static routes configured, each hanging on a 60s timeout. The Kubernetes cluster running this instance was set to kill the pod after 15m, so the server got stuck in a loop where it could never start up because it could not finish processing CCP route-updates before being killed.

Some possible mitigations:

  1. Execute the CCP route-update requests in parallel so that they don't block each other.
  2. Don't execute any CCP route-update requests until after the server has started up.
  3. Execute all CCP operations in a different thread so that whatever happens there doesn't block the main thread.
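
A minimal sketch of how mitigations 1 and 3 might fit together, using only the JDK (Java 9+ for `orTimeout`). `StaticRouteLoader`, `loadAsync`, and the `routeUpdater` callback are hypothetical names for illustration, not the connector's actual classes; the callback stands in for whatever code performs the per-route CCP update.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical helper (not part of the connector's API). */
final class StaticRouteLoader {

  private final ExecutorService routeExecutor;
  private final Duration perRouteTimeout;
  private final AtomicInteger failed = new AtomicInteger();

  StaticRouteLoader(int threads, Duration perRouteTimeout) {
    // Daemon threads: a stuck route update never prevents JVM shutdown.
    this.routeExecutor = Executors.newFixedThreadPool(threads, runnable -> {
      Thread thread = new Thread(runnable, "static-route-loader");
      thread.setDaemon(true);
      return thread;
    });
    this.perRouteTimeout = perRouteTimeout;
  }

  /**
   * Schedules one task per route on the dedicated pool and returns immediately,
   * so the calling (startup) thread is never blocked. The returned future
   * completes once every route has either been applied or has timed out.
   */
  CompletableFuture<Void> loadAsync(List<String> routePrefixes, Consumer<String> routeUpdater) {
    List<CompletableFuture<Void>> tasks = new ArrayList<>();
    for (String prefix : routePrefixes) {
      tasks.add(CompletableFuture
          .runAsync(() -> routeUpdater.accept(prefix), routeExecutor)
          // Bound the wait per route; the timeout does not interrupt the task itself.
          .orTimeout(perRouteTimeout.toMillis(), TimeUnit.MILLISECONDS)
          // Count failures and timeouts instead of propagating them.
          .exceptionally(error -> {
            failed.incrementAndGet();
            return null;
          }));
    }
    return CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0]));
  }

  /** Number of routes whose update failed or timed out so far. */
  int failedCount() {
    return failed.get();
  }
}
```

The server would call `loadAsync(...)` during startup without waiting on the result and open its listeners right away. One caveat: `orTimeout` only completes the future early. A task that is genuinely hung still occupies its pool thread, so the pool size and the underlying request timeout still matter.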

sappenin commented 4 years ago

Also, see JS doc here -- CHILD peers by default neither send nor receive routes, which makes sense. For example, a parent really shouldn't receive routes from a child. Additionally, a parent doesn't need to send routes to a child because the child doesn't really care: the child can use the parent as a default route.

We should also consider getting rid of the AccountRelationship, since it may cause confusion. This value is only used for routing, so we could consider just using the send/receive routes settings instead (note: look through the JS code first before removing).
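
To make the defaulting rule concrete, here is a rough sketch of how explicit send/receive settings could take precedence over a relationship-derived default. `RouteSharingDefaults` and its methods are hypothetical, and defaulting non-CHILD relationships to `true` is an assumption; only the CHILD default (neither send nor receive) comes from the discussion above.

```java
import java.util.Optional;

/** Hypothetical sketch; not the connector's actual settings classes. */
final class RouteSharingDefaults {

  enum AccountRelationship { PARENT, PEER, CHILD }

  /**
   * Whether to send routes to the account. An explicit setting always wins;
   * otherwise CHILD accounts default to false.
   * Assumption: every other relationship defaults to true.
   */
  static boolean shouldSendRoutes(AccountRelationship relationship,
                                  Optional<Boolean> explicitSendRoutes) {
    return explicitSendRoutes.orElse(relationship != AccountRelationship.CHILD);
  }

  /** Same rule for routes received from the account. */
  static boolean shouldReceiveRoutes(AccountRelationship relationship,
                                     Optional<Boolean> explicitReceiveRoutes) {
    return explicitReceiveRoutes.orElse(relationship != AccountRelationship.CHILD);
  }
}
```

If the explicit flags were always populated, the relationship would no longer be consulted for routing at all, which is the simplification being considered.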

sappenin commented 4 years ago

See https://github.com/interledger4j/ilpv4-connector/issues/655 -- that's half of this bug.