Add the groundwork for failover RPC support

c0gent commented 6 years ago

Changes

Remove the rpc_host and rpc_port fields from the configuration file format and add primary_rpc_host, primary_rpc_port, failover_rpc_host, and failover_rpc_port.
Add the RpcUrl and RpcUrlKind types.
- RpcUrl contains a host name and a port.
- RpcUrlKind has variants for primary and failover urls and contains a RpcUrl.
Remove rpc_host and rpc_port and add primary and failover RPC url fields to config::Config et. al.
Update app::Connections::new_http to automatically use the failover url in the event of an error connecting to the primary.
Add the home_url and foreign_url to Connections containing the url/kind (a RpcUrlKind) currently in use.
Update logging.
Update tests.

Closes #107 (maybe)

I don't have a very sophisticated test setup so this will definitely need more comprehensive testing in a more realistic environment.

@akolotov: If I understand your original issue correctly (please correct me if not), the main purpose of this is for the failover not simply to be available upon startup, but to be used in the event of a disconnection. I'm not familiar enough with the whole project yet to know exactly how and where that happens. This patch only makes use of the failover url on startup (creation of an App/Connection) but does nothing special in the event of a disconnect at a later point. Therefore it may not completely address your issue and should be considered only a partial implementation.

I could use your help showing me where and how disconnection is currently is handled so that I can add some more logic to that process. I'm available on voice or Zoom if that's easier. Thanks!

akolotov commented 6 years ago

Thanks! I will look at this PR tomorrow (I am in MSK timezone) and provide my feedback.

yrashk commented 6 years ago

After some further thinking about this problem, here's my take on how it should be implemented to keep it simple and yet flexible so that even future scenarios can be accommodated.

Instead of building very specific features (such as failover RPC), we can further utilize the approach that has been used in the bridge in the past couple of months: delegating failure handling to a supervisor process. Currently, an exit code for failed RPC connection doesn't make distinction between home and foreign. However, if we track the side on which the failure has occurred and use different exit codes for different sides, the supervisor can simply restart the bridge with a different config, according to its own logic. Could be more than two failovers, for example, or a check on what's available using curl and then using the appropriate config.

This will work particularly well combined with persistent transaction queues.

c0gent commented 6 years ago

Instead of building very specific features (such as failover RPC), we can further utilize the approach that has been used in the bridge in the past couple of months: delegating failure handling to a supervisor process.

I agree, this is a superior approach in this situation. And yes, this will need some re-architecting. I'll have a closer look at this later in the week and put together some ideas.

c0gent commented 6 years ago

It looks like I'm going to have to put this PR on the back-burner for now. I should be able to get back to it sometime in the next few weeks. Let me know if you come up with any other plans or suggestions, or if there are some simple things I can do to this PR to make it mergeable for now and deal with the larger issues in a separate PR later.

yrashk commented 6 years ago

I think this PR is premature insofar as indicated by the comment in which I mentioned that it is unlikely to actually ever switch over (leaving aside the suggestion for simplifying this whole solution), so let's return to this later or if @akolotov needs it sooner, I can take care of it in the coming days.

omni / poa-bridge

Add the groundwork for failover RPC support #110

Changes