painless-security / trust-router

Moonshot Trust Router

Fix the trpc thread exit procedure #68

jennifer-richards closed 6 years ago

jennifer-richards commented 6 years ago

The messaging between the main thread and the trpc (outgoing connection) threads allowed a trpc thread's data to be cleaned up before its message queue was empty, causing incorrect mutex behavior and segmentation faults.

This is (I hope!) solved by adding a shutdown phase in which the main thread acknowledges that the trpc thread is done and signals that the trpc thread can safely exit.
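For readers following along, here is a minimal sketch of that kind of two-phase shutdown using pthreads. It is not the code from this pull request: `conn_ctx`, `conn_state`, `worker_main`, and `run_handshake` are hypothetical names, and an integer counter stands in for the real message queue. The point is only that the worker waits for an explicit acknowledgement before exiting, so nothing is torn down while messages are still pending.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the trust router's own. */
typedef enum {
  CONN_RUNNING,  /* worker is servicing its connection */
  CONN_DONE,     /* phase 1: worker has announced it is finished */
  CONN_EXIT_OK   /* phase 2: main thread acknowledged, worker may exit */
} conn_state;

typedef struct {
  pthread_mutex_t lock;
  pthread_cond_t  changed;
  conn_state      state;
  int             pending_msgs; /* stand-in for the message queue */
} conn_ctx;

/* Worker: queue a final message, announce completion, then wait for
 * the main thread's acknowledgement before returning. */
void *worker_main(void *arg)
{
  conn_ctx *ctx = arg;

  assert(ctx != NULL);
  pthread_mutex_lock(&ctx->lock);
  ctx->pending_msgs++;
  ctx->state = CONN_DONE;
  pthread_cond_broadcast(&ctx->changed);
  while (ctx->state != CONN_EXIT_OK)
    pthread_cond_wait(&ctx->changed, &ctx->lock);
  pthread_mutex_unlock(&ctx->lock);
  return NULL;
}

/* Main-thread side: wait for the worker to finish, drain its messages,
 * acknowledge, join, and only then destroy the shared state.
 * Returns the number of messages drained, or -1 if the thread could
 * not be created. */
int run_handshake(void)
{
  conn_ctx ctx;
  pthread_t tid;
  int drained = 0;

  pthread_mutex_init(&ctx.lock, NULL);
  pthread_cond_init(&ctx.changed, NULL);
  ctx.state = CONN_RUNNING;
  ctx.pending_msgs = 0;

  if (pthread_create(&tid, NULL, worker_main, &ctx) != 0) {
    pthread_cond_destroy(&ctx.changed);
    pthread_mutex_destroy(&ctx.lock);
    return -1;
  }

  pthread_mutex_lock(&ctx.lock);
  while (ctx.state != CONN_DONE)
    pthread_cond_wait(&ctx.changed, &ctx.lock);
  while (ctx.pending_msgs > 0) {  /* empty the queue first */
    ctx.pending_msgs--;
    drained++;
  }
  ctx.state = CONN_EXIT_OK;       /* now the worker may exit */
  pthread_cond_broadcast(&ctx.changed);
  pthread_mutex_unlock(&ctx.lock);

  pthread_join(tid, NULL);
  pthread_cond_destroy(&ctx.changed);
  pthread_mutex_destroy(&ctx.lock);
  return drained;
}
```

In this sketch the mutex and condition variable are destroyed only after `pthread_join` returns, which is the ordering the extra phase is meant to guarantee.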

So far, I have not seen the system fail to handle a peer disconnecting. Prior to these changes, it failed every time with my current setup.

-- I forgot to add this to the git commit message, but I also found that I was mixing up the purpose of the gssname and peer fields in the TRP_CONNECTION structure. These are both TR_NAMEs. The former is the GSS service name of the local trust router (i.e., how we identified ourselves on this particular connection). The latter is the GSS name of the remote peer. During cleanup of incoming threads, the gssname rather than the peer name was used to decide which peer to mark as disconnected. This caused the search to fail, and peers were never marked as disconnected.
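To illustrate the mix-up, here is a small self-contained sketch. The types below (`sketch_name`, `sketch_conn`, `sketch_peer`) and the functions `find_peer` and `mark_disconnected` are hypothetical stand-ins, not the actual TR_NAME, TRP_CONNECTION, or peer table definitions; only the field names `gssname` and `peer` come from the description above. The lookup has to be keyed on the remote peer's name, because the local service name never appears in the peer table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for TR_NAME and TRP_CONNECTION. */
typedef struct { const char *buf; } sketch_name;

typedef struct {
  sketch_name gssname; /* how the local trust router identified itself */
  sketch_name peer;    /* GSS name of the remote peer */
} sketch_conn;

/* Hypothetical peer table entry. */
typedef struct {
  sketch_name name;
  int connected;
} sketch_peer;

/* Linear search of the peer table by name; returns NULL if absent. */
sketch_peer *find_peer(sketch_peer *table, size_t n, const sketch_name *name)
{
  size_t i;
  for (i = 0; i < n; i++) {
    if (strcmp(table[i].name.buf, name->buf) == 0)
      return &table[i];
  }
  return NULL;
}

/* Mark the peer behind a connection as disconnected.
 * Returns 0 on success, -1 if the peer is not in the table. */
int mark_disconnected(sketch_peer *table, size_t n, const sketch_conn *conn)
{
  sketch_peer *p;

  assert(conn != NULL);
  /* Key on conn->peer. Using conn->gssname here would search for the
   * local service name, which is not in the table, so the search
   * would always fail. */
  p = find_peer(table, n, &conn->peer);
  if (p == NULL)
    return -1;
  p->connected = 0;
  return 0;
}
```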

This should not be merged until after jennifer/monitoring. Assigning to myself to resolve any conflicts once that happens.

jennifer-richards commented 6 years ago

This should not be merged. It was superseded by later work. It will be part of the history via other pull requests. Closing.