Closed unicell closed 4 years ago
mh is more useful to avoid using ipvs connection sync daemon. Have you tried to use mh to do a seamless failover? Last time I check it, you need net.ipv4.vs.sloppy_tcp to keep existing connection going. It's a global config so I'm not sure what's the impact of that for other non mh services.
Yes, you're right. Let me explain our use case a little bit. I'm currently running an in-house control plane with some pieces from seesaw
. And it is based on an in-house consistent hashing kernel module ipvs. And yes, sloppy_tcp is required in that setup, so seamless failover can happen without conntrack sync.
The plan my side is to migrate over to Maglev hashing our side. Through that process, I'm trying to sync some patches back to seesaw.
So in short, I'm hoping to shape it in the way that:
seesaw
can still be leveraged in our use cases (Not as a single full piece. For example, there's no conntrack sync in our setup)This PR is one of them. It does not break existing use cases while making it possible for someone (like me) to fully switch to Maglev without having conntrack sync. Does that make sense to you? Let me know. Thanks.
Your change is completely fine. I'm just curious what's the impact of turning on sloppy_tcp in a use case where both mh and other scheduling method is used (schedule method is a per service config while sloppy_tcp is global). I think the impact is that for other scheduling method, a failover session will not be reset immediately by seesaw machine but instead by real backends which seems OK to me.
Could you add some comment int the proto so that people know what it is. I think we need these in the comment:
Your change is completely fine. I'm just curious what's the impact of turning on sloppy_tcp in a use case where both mh and other scheduling method is used (schedule method is a per service config while sloppy_tcp is global). I think the impact is that for other scheduling method, a failover session will not be reset immediately by seesaw machine but instead by real backends which seems OK to me.
I think it's not just about turning on sloppy_tcp
but also turning off contrack sync entirely, which probably doesn't make sense if on other scheduler settings.
Could you add some comment int the proto so that people know what it is. I think we need these in the comment:
- this is only for 4.18+
Done
- you need net.ipv4.vs.sloppy_tcp to have seamless failover (why can't I find a ipv6 counterpart)
Done. I think the same sysctl setting works for ipv6 as well per ipvs implementation in net/netfilter/ipvs/ip_vs_proto_tcp.c
- impact of sloppy_tcp to other non-mh services.
As above, I think conntrack sync is the key here not sloppy_tcp. It probably doesn't make sense to have Maglev + conntrack sync + sloppy_tcp. And probably the same for using Non-Maglev + turning off conntrack sync.
So updated comments in proto to reflect the thoughts above and assume actual deployer understand the implication.
Let me know if that makes any sense.
Maglev hashing became available since v4.18 kernel. This change adds the support to make it a choice with Seesaw.