painless-security / trust-router

Moonshot Trust Router

Trust router getting stuck when a peer disconnects #67

Closed by jennifer-richards 6 years ago

jennifer-richards commented 6 years ago

Several times, I have seen the trust router hang, doing nothing, while trying to read from or write to a peer that has disconnected. Sometimes this is accompanied by an error about a tr_mq event being triggered without a message to retrieve.

Need to diagnose this.

jennifer-richards commented 6 years ago

This was caused by an incorrect thread exit procedure that led to segfaults and/or deadlocks (or some fun hybrid of the two: deadlocked trying to lock a mutex in a bit of invalid memory that happened to belong to this process).

This crash was happening every time for me. #68 seems to solve it.
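
For readers unfamiliar with this class of bug, here is a minimal sketch of an orderly thread exit. It is not the change made in #68. It is written in Python for brevity, and every name in it (PeerWorker, messages, stop, handled) is hypothetical. The idea it illustrates: ask the worker to stop, wait for it to finish, and only then let go of the lock and queue it was using, so the thread can never touch a mutex that has already been torn down.

```python
import queue
import threading


class PeerWorker:
    """Hypothetical per-peer worker that owns a lock and a message queue."""

    def __init__(self):
        self.lock = threading.Lock()
        self.messages = queue.Queue()
        self.stop = threading.Event()
        self.handled = []
        self.thread = threading.Thread(target=self._run)

    def _run(self):
        # Poll with a short timeout so the stop flag is checked regularly
        # instead of blocking forever on an empty queue.
        while not self.stop.is_set():
            try:
                msg = self.messages.get(timeout=0.05)
            except queue.Empty:
                continue
            with self.lock:
                self.handled.append(msg)

    def start(self):
        self.thread.start()

    def shutdown(self, timeout=2.0):
        # 1. Ask the thread to exit. 2. Wait for it. Only after join()
        # succeeds is it safe to discard the lock and queue.
        self.stop.set()
        self.thread.join(timeout)
        return not self.thread.is_alive()
```

A caller would create the worker, call start(), and call shutdown() before dropping its last reference to the object. If shutdown() returns False, the thread is still running and its resources must not be released yet.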

jennifer-richards commented 6 years ago

Test

Steps

  1. Set up two trust routers as peers
  2. Start them up and let them establish two-way communication
  3. Terminate (Ctrl-C) one of them, then wait through at least one TRP update cycle
  4. Restart the terminated trust router

Expected results

In step 4, two-way communication should be re-established between the trust routers. If you issue a show peers monitoring request before step 4, it should show the peer as not connected. The trust router only notices that the connected_to connection has gone down when it next attempts to send an update, so it's ok if it temporarily reports that only the connected_from connection is lost.

If the trust router becomes non-responsive, this test has failed. It's a good idea to try this test a few times.
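
The delayed detection described above is ordinary socket behaviour: a sender that is not reading from a connection usually learns the peer is gone only when a write fails. A minimal Python sketch of that pattern follows. OutgoingPeer and send_update are hypothetical names, not trust router code, and the local socket pair stands in for a real peer connection.

```python
import socket


class OutgoingPeer:
    """Hypothetical wrapper around an outgoing (connected_to style) socket."""

    def __init__(self, sock):
        self.sock = sock
        self.connected = True

    def send_update(self, payload: bytes) -> bool:
        # The connection is marked down only when a send actually fails;
        # until then the peer is still believed to be connected.
        try:
            self.sock.sendall(payload)
        except OSError:
            self.connected = False
            self.sock.close()
        return self.connected


if __name__ == "__main__":
    ours, theirs = socket.socketpair()
    peer = OutgoingPeer(ours)
    peer.send_update(b"update 1")
    theirs.close()                      # the remote side goes away
    print(peer.connected)               # still believed connected
    print(peer.send_update(b"update 2"))
```

This is why step 3 asks you to wait through at least one update cycle: until an update is attempted, nothing prompts the sender to re-check the outgoing connection.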

meadmaker commented 6 years ago

Tested and verified!