influxdata / influxdb-relay

Service to replicate InfluxDB data for high availability

Why not just a replication stream and master failover? #3

Closed · 7imbrook closed this issue 8 years ago

7imbrook commented 8 years ago

Having a relay just adds complexity and multiple failure points where ordering can get messed up. I don't see the advantage or the safety benefit of using a relay.

        ┌─────────────────┐               
        │writes & queries │               
        └─────────────────┘               
                 │                        
                 ▼                        
         ┌───────────────┐                
         │               │                
┌────────│ Load Balancer │────────────┐   
│        │               │    ┌──────┐│   
│        └───────────────┘    │/query││   
│           │          |      └──────┘│   
│           │          |              │   
│┌──────┐   │┌──────┐  |              │   
││/query│   ││/write│  | Failover     │
│└──────┘   │└──────┘  |_ _ _ _ _     │
│           │                    │    │
│           ▼   ┌───────────┐    ▼    ▼  
│  ┌──────────┐ │replication│┌──────────┐
│  │          │ └───────────┘│          │
└─▶│ InfluxDB │─────────────▶│ InfluxDB │ 
   │          │              │          │ 
   └──────────┘              └──────────┘ 

Thoughts?

joelegasse commented 8 years ago

Here are my personal thoughts/opinions, since you asked. :-)

This would be another viable option, and could be accomplished using Kapacitor and a custom output plugin, or possibly by updating the current InfluxDB output in Kapacitor to allow writing back to a second server.

This relay is just meant to be one example of setting up high availability with InfluxDB. There are certainly others; some are listed here. The example you provided isn't that much different from the Nginx example listed, either. The primary difference is that Nginx would take on the responsibility of duplicating the data, rather than adding that extra responsibility to the server that might fail soon.

Doing the replication in InfluxDB would also add nontrivial complexity that is avoided by having a complete shared-nothing database node. For example, if the secondary server is slow to respond, should the primary server return success, or wait to ensure the data was replicated? Then we have to be concerned about changing the configuration of the primary node (potentially with a restart, defeating the purpose of HA) should we have to rotate through secondary nodes (possibly during a secondary node failure/recovery). What about replicating to multiple nodes?

Doing the replication in a relay process removes the concept of "primary" and "secondary" nodes, and you no longer have to be concerned about the direction of replication. It also means relay nodes can be added separately from database nodes. That means that extra database nodes can be added by the extra-paranoid sysadmins we all know and appreciate without affecting the currently running database nodes.
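To sketch the fan-out idea in Go (a simplified illustration only, not the actual influxdb-relay code; the backend addresses and listen port are made up): every incoming /write is duplicated to each backend, so no node is special and backends are listed independently of the databases themselves.

```go
package main

import (
	"bytes"
	"io"
	"log"
	"net/http"
)

// Hypothetical backend addresses; the real relay reads these from its config.
var backends = []string{
	"http://influxdb-a:8086",
	"http://influxdb-b:8086",
}

// relayWrite duplicates the incoming line-protocol payload to every backend.
func relayWrite(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	for _, b := range backends {
		// Forward the same body and query string (db, precision, ...) to each node.
		resp, err := http.Post(b+"/write?"+r.URL.RawQuery, "text/plain", bytes.NewReader(body))
		if err != nil {
			log.Printf("write to %s failed: %v", b, err)
			continue // a real relay buffers and retries instead of dropping
		}
		resp.Body.Close()
	}
	w.WriteHeader(http.StatusNoContent)
}

func main() {
	http.HandleFunc("/write", relayWrite)
	log.Fatal(http.ListenAndServe(":9096", nil))
}
```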

Personally, I think trying to recover a streaming replication setup would be more difficult than recovering independent nodes that happen to be getting the same writes via a relay service. Unless the fail-over is read-only, in which case only your queries are highly available, and not your writes.

As I said at the beginning though, there are many ways to provide high-availability, and this relay is just meant to be one example of achieving that goal.

7imbrook commented 8 years ago

You have good points, and I'm just gonna focus on one thing that makes me pause when looking at this. How do you guarantee order? Are we relying on client timestamps? I feel that if writes come into the database at different times on each host, you get inconsistent data between them. Is this not a guarantee of Influx? If it's not, then I understand why the decisions were made.

joelegasse commented 8 years ago

Excellent point. I would recommend that the client timestamp the data. That way, if the points fail to write and are retried, the original time will be used. It also means that points can be logged locally and batched to the server, preserving the original time of the event. Or, in this case, the relay service could potentially timestamp points that don't include one, adding some level of consistency, but client timestamps would be much better.

Providing a timestamp with writes makes them idempotent. If a point gets written twice and has a timestamp, it will be deduplicated. If instead it is written twice while relying on the server for a timestamp, you will end up with two distinct data points.
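To make that concrete, here's a small Go sketch (the address, database name, and values are just examples): it writes the same line-protocol point, with an explicit trailing timestamp, twice. Because the series and timestamp are identical, InfluxDB keeps a single point.

```go
package main

import (
	"log"
	"net/http"
	"strings"
)

func main() {
	// Line-protocol point with an explicit nanosecond timestamp at the end.
	point := "cpu,host=server01 value=0.64 1457000000000000000"

	for i := 0; i < 2; i++ { // write the identical point twice
		resp, err := http.Post(
			"http://localhost:8086/write?db=mydb", // assumed address and db
			"text/plain",
			strings.NewReader(point),
		)
		if err != nil {
			log.Fatal(err)
		}
		resp.Body.Close()
	}
	// SELECT * FROM cpu now returns a single row. Without the trailing
	// timestamp, the server stamps each write on arrival, leaving two rows.
}
```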

jonseymour commented 8 years ago

The replication approach does seem like a more robust solution to me.

Consider a case where one of the InfluxDB nodes fails and recovers, and then the other one fails and recovers: the two databases will have diverged, and it won't be possible to get them back into sync, since each database will have shards with contents the other doesn't have. A replication-based approach would seem not to have this problem.

In my current single-node Influx setup, I do get write errors on occasion because of transient CPU shortages. With this relay setup, it would seem that I would eventually get queries that return different results based on which server the queries happened to hit.

filipkokorevs commented 8 years ago

jonseymour: good point!

It doesn't even have to be node failures or CPU shortages. A simple kernel update and the subsequent restart of a single cluster node will leave the InfluxDB cluster out of sync.

filipkokorevs commented 8 years ago

Yeah, actually I should have read the docs before.

Let's say one of the InfluxDB servers goes down for an hour on 2016-03-10. Once the next day rolls over and we're now writing data to 2016-03-11, we can then restore things using these steps:

1. Create a backup of the 2016-03-10 shard from the server that was up the entire day.
2. Tell the load balancer to stop sending query traffic to the server that was down.
3. Restore the backup of the shard from the good server to the old server.
4. Tell the load balancer to resume sending queries to the previously downed server.

During this entire process the Relays should be sending writes to both servers for the current shard (2016-03-11).

This is from https://github.com/influxdata/influxdb-relay#recovery.

This is of course far from a production solution, but production-grade replication will come with the paid enterprise version:

https://influxdata.com/blog/update-on-influxdb-clustering-high-availability-and-monetization/

joelegasse commented 8 years ago

With the updated buffering changes, small outages for a backend server will no longer cause inconsistent data, as long as the relay stays up until the buffer is flushed. I'm going to close this issue, as it has diverged from the initial topic. Please open a new issue with any additional concerns. :-)
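For reference, a rough Go sketch of what such buffering involves (a simplified illustration only, not the relay's actual implementation; the backend address and buffer size are arbitrary): queue every write per backend and have a goroutine deliver them in order, retrying until the backend comes back.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"time"
)

// retryBuffer queues writes for one backend and delivers them in order.
type retryBuffer struct {
	backend string
	queue   chan []byte
}

func newRetryBuffer(backend string, size int) *retryBuffer {
	rb := &retryBuffer{backend: backend, queue: make(chan []byte, size)}
	go rb.flush()
	return rb
}

// post enqueues a payload; if the bounded buffer is full, the write is dropped
// (a real implementation would pick a policy and surface an error).
func (rb *retryBuffer) post(payload []byte) {
	select {
	case rb.queue <- payload:
	default:
		log.Printf("buffer full, dropping write for %s", rb.backend)
	}
}

// flush retries each payload until it succeeds, so a short outage only delays
// writes rather than losing them, and per-backend order is preserved.
func (rb *retryBuffer) flush() {
	for payload := range rb.queue {
		for send(rb.backend, payload) != nil {
			time.Sleep(time.Second) // back off until the backend recovers
		}
	}
}

func send(backend string, payload []byte) error {
	resp, err := http.Post(backend+"/write", "text/plain", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("backend returned %s", resp.Status)
	}
	return nil
}

func main() {
	rb := newRetryBuffer("http://influxdb-a:8086", 1024) // hypothetical backend
	rb.post([]byte("cpu,host=server01 value=0.64 1457000000000000000"))
	time.Sleep(2 * time.Second) // give the flusher a moment in this demo
}
```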