Open dcowden opened 6 years ago
@dcowden
Based on the provided traces (https://github.com/weaveworks/weave/files/1975806/foo.pcap.gz), the following is happening:
-A KUBE-SERVICES -d 100.64.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
<..>
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-JILKODJ63HVFF6B2
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -j KUBE-SEP-LFXGESA25DLV4HVG
<..>
-A KUBE-SEP-JILKODJ63HVFF6B2 -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 100.117.128.12:53
-A KUBE-SEP-LFXGESA25DLV4HVG -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 100.99.128.9:53
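(Context on the rules above: kube-proxy balances a Service across N endpoints with a chain of statistic-mode rules; rule i, counting from 0, matches with probability 1/(N-i), which gives a uniform 1/N pick overall and is why the last rule carries no probability match. A small sketch of that arithmetic -- the helper name is mine, for illustration only:)

```shell
# Probability kube-proxy assigns to rule $2 (0-based) when a Service has $1
# endpoints: 1/(n-i). Computed here for illustration, not read from iptables.
endpoint_probability() {
  awk -v n="$1" -v i="$2" 'BEGIN { printf "%.5f", 1 / (n - i) }'
}
endpoint_probability 2 0   # first rule of two -> 0.50000, as in the dump above
echo
endpoint_probability 2 1   # second rule -> 1.00000, i.e. always matches
echo
```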
insert_failed counter is incremented (check with conntrack -S) and the request is dropped => you get the timeout. As I mentioned above, the --random-fully flag does not help here, as it applies only to SNAT, which is not the culprit in your case.
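(A quick way to confirm this diagnosis is to watch insert_failed while reproducing. A minimal parsing sketch -- the helper name is mine; it reads stdin so it can be exercised without root, and on a node you would pipe `conntrack -S` into it:)

```shell
# Sum the per-CPU insert_failed counters from `conntrack -S`-style output.
# Usage on a node: conntrack -S | sum_insert_failed
sum_insert_failed() {
  awk '{ for (i = 1; i <= NF; i++)
           if ($i ~ /^insert_failed=/) { split($i, kv, "="); sum += kv[2] } }
       END { print sum + 0 }'
}
```

If the total climbs in step with timed-out DNS queries, you are hitting this race.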
@Quentin-M
As you use the ipvs backend, I'm curious to see your iptables-save output.
@dcowden
I suspect this triggers the DNAT issue.
Could you verify this by checking the insert_failed counter value with conntrack -S?
@brb Will do, but I will not be able to do it until later this week. That said, I can't imagine that your analysis is wrong -- what do you think is the fix? And/or can you suggest workarounds?
I'm honestly shocked that most of the internet isn't saying "well, Kubernetes is great, but you're going to have packet-loss issues".
We do not use UDP for much other than DNS, so one idea I've been thinking about is to somehow run kube-dns as a daemonset with hostNetwork=true, thus removing some of the NAT. But I think that'd be hard to do with kops, because kops bundles the kube-dns manifests, and we'd have to override them.
And even so, that'd be a workaround (albeit it is very reasonable to assume that DNS is the only UDP protocol that would expose this race condition so frequently).
Another workaround, based on your analysis, would be to avoid using a VIP, and instead configure pods to use the individual pod IPs via round-robin A records. I'm not sure if that configuration is possible.
@brb huge kudos for figuring that out!
@dcowden when I used to do electronic trading for a living I would find that people live with the most egregious network problems and never think "this is really broken". A tiny minority of people care enough to look at what is really going on.
Also, the issue is sensitive to exactly which technologies you use - for instance, at Weaveworks we write most services in Go, so they don't use the glibc resolver.
run kube-dns as a daemonset
so it would be on every host, and could be addressed using the host's own IP? I've seen discussions along those lines; unfortunately changing resolv.conf to point at that IP requires a Kubernetes change.
instead configure pods to use the individual pods with a round robin cluster ip A records.
Not really following this suggestion. AFAIK resolv.conf has to have the IP addresses of servers, not DNS names. If we could get Kubernetes to keep the IP addresses of kube-dns pods static across restarts, that would be plausible, but not currently a feature.
so it would be on every host, and could be addressed using the host's own IP? I've seen discussions along those lines; unfortunately changing resolv.conf to point at that IP requires a Kubernetes change.
We are already using a hack that updates resolv.conf on pod start in our container entrypoint, to add option single-request-reopen. We would need to use that in combination with the downward API to inject the host IP. It stinks, but it would work, maybe?
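(A sketch of what such an entrypoint hack could look like -- the function name and the idea of a HOST_IP variable injected via the downward API are my assumptions, not the poster's actual script. The transform is pure text, so it can be tested without touching a real /etc/resolv.conf:)

```shell
# Rewrite resolv.conf content: point nameserver at the host IP ($1) and add
# the single-request-reopen option. In an entrypoint you might run e.g.:
#   patch_resolv "$HOST_IP" < /etc/resolv.conf > /tmp/r && cat /tmp/r > /etc/resolv.conf
patch_resolv() {
  sed "s/^nameserver .*/nameserver $1/"
  echo "options single-request-reopen"
}
```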
Not really following this suggestion. AFAIK resolv.conf has to have the IP addresses of servers, not DNS names. If we could get Kubernetes to keep the IP addresses of kube-dns pods static across restarts, that would be plausible, but not currently a feature.
Yeah, you're right; there would be no way to assign static IPs to the pods to make this work.
@brb Yes, it appears to be the case. Below is the output on the same host on which the tests above ran.
/home/weave # conntrack -S
cpu=0 searched=630288 found=15093196 new=1346365 invalid=34 ignore=647629 delete=1408965 delete_list=1408867 insert=1344752 insert_failed=92 drop=0 early_drop=0 error=0 search_restart=0
cpu=1 searched=846871 found=28126666 new=1919780 invalid=74 ignore=650870 delete=1855000 delete_list=1854877 insert=1921172 insert_failed=107 drop=0 early_drop=0 error=0 search_restart=0
cpu=2 searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
cpu=3 searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
cpu=4 searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
cpu=5 searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
cpu=6 searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
cpu=7 searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
cpu=8 searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
cpu=9 searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
cpu=10 searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
cpu=11 searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
cpu=12 searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
cpu=13 searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
cpu=14 searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
@dcowden What is your CentOS and kernel version?
For what it's worth, I use the latest Container Linux, and yes, insert_failed increases systematically by 1 every time I send a DNS request.
@brb
[root@ip-172-25-83-254 ~]# more /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
[root@ip-172-25-83-254 ~]# uname -a
Linux ip-172-25-83-254.colinx.com 3.10.0-693.21.1.el7.x86_64 #1 SMP Wed Mar 7 19:03:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
I would just like to add here that the single-request(-reopen) workaround does not work with Alpine-based containers, as musl does not support the option (see below). Unfortunately, Alpine Linux is the base image for 90% of our infrastructure.
if (!strncmp(line, "options", 7) && isspace(line[7])) {
    p = strstr(line, "ndots:");
    if (p && isdigit(p[6])) {
        p += 6;
        unsigned long x = strtoul(p, &z, 10);
        if (z != p) conf->ndots = x > 15 ? 15 : x;
    }
    p = strstr(line, "attempts:");
    if (p && isdigit(p[9])) {
        p += 9;
        unsigned long x = strtoul(p, &z, 10);
        if (z != p) conf->attempts = x > 10 ? 10 : x;
    }
    p = strstr(line, "timeout:");
    if (p && (isdigit(p[8]) || p[8]=='.')) {
        p += 8;
        unsigned long x = strtoul(p, &z, 10);
        if (z != p) conf->timeout = x > 60 ? 60 : x;
    }
    continue;
}

struct resolvconf {
    struct address ns[MAXNS];
    unsigned nns, attempts, ndots;
    unsigned timeout;
};
I have reached out on freenode's #musl channel, but unfortunately it does not seem like there is much desire to add support for the option:
[16:19] <dalias> why not fix the bug causing it?
[16:20] <dalias> sprry
[16:20] <dalias> the option is not something that can be added, its contrary to the lookup architecture
[17:39] <dalias> quentinm, thanks for the report. i just don't know any good way to work around it on our side without nasty hacks
[17:40] <dalias> the architecture is not designed to support sequential queries
@dcowden @bboreham @brb @xiaoxubeii
For what it's worth: I simply switched a two-node cluster that was broken (5s latency for every single curl, except when single-request was used) from the latest Weave to Calico 2.6, and the issue went away immediately. None of my pods experiences the issue where AAAA packets get dropped anymore.
I will be happy to grant access to a cluster where the issue is present if that means we will get some help 💯
@Quentin-M Thanks for the report. We'll try this next. For now we're working around it -- annoying to say the least! Our problem is that Calico doesn't support encryption on the cluster overlay. Weave does this better than any of the others, so I hope we can keep using Weave!
@dcowden @bboreham @brb @xiaoxubeii
Another very interesting note: when FASTDP is disabled (but encryption is still on), the issue also disappears. I tested this on 4 clusters, with regular and jumbo MTUs.
How exactly did you disable fastdp?
Another very interesting note, when FASTDP is disabled
My guess is that due to the slower nature of the sleeve mode, races are less likely to happen, but not completely avoided.
@Quentin-M
For what it's worth: I simply switched a two-nodes cluster that was broken (5s latency for every single curl, except when single-request was used), from the latest weave to calico 2.6, and the issue went away immediately.
That's interesting. Do you use the IP-in-IP tunneling with Calico?
@brb @bboreham
How exactly did you disable fastdp?
Once, I simply dropped the following in Weave's manifest, used reset, and let Kubernetes do a rolling restart of Weave. Later, I did the same thing but also killed all the pods. And another time, I edited the manifest, then killed all the nodes, letting new identical ones come back with fresh configuration/networking, re-scheduling all the pods. Every time, I verified using weave --local status connections.
- name: WEAVE_NO_FASTDP
  value: "true"
That's interesting. Do you use the IP-in-IP tunneling with Calico?
Yes, IP-in-IP set to always. Happy to drop the manifest if necessary.
My guess is that due to slower nature of the sleeve mode races are less likely to happen, but not completely unavoidable.
That was one of my ideas too, yeah. Calico is supposedly "pretty fast" as well, even in IPIP (I believe it is done in the kernel too), but the timing might be just different enough to avoid it. Or the problem is different.
Thank you.
When a single pod is used to wget/curl a target, a tc policy that delays every other DNS datagram by, say, 10ms seems to alleviate the issue entirely: netem gap 2 delay 10ms reorder 100%. However, this may not help much when multiple pods are making requests, as the policy applies to the whole node and therefore may not induce delay between the two parallel A/AAAA datagrams coming out of a single pod, but between two A requests of different pods. This actually may not be true and it may work properly depending on how SNAT/DNAT/conntrack operates, but I am not expert enough.
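(For reference, the gap/delay policy described here could be applied with tc roughly as follows -- a sketch of mine, with the interface name as an assumption. Since the command needs root, the helper prints it for review rather than executing it:)

```shell
# Build the netem command for the policy above: delay every 2nd packet on
# interface $1 by 10ms, letting the others through immediately (reordered).
netem_cmd() {
  echo "tc qdisc add dev $1 root netem gap 2 delay 10ms reorder 100%"
}
netem_cmd weave
```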
Another interesting rule is to add a random delay to every single DNS datagram going out, but this does not work 100% of the time, even with a single pod making requests, as the two A/AAAA datagrams may be sent with delays that are close enough to each other that the race still happens. There might be a smart thing to do here to make it work reliably. Maybe rate control.
The traffic shaping may be applied to DNS requests only, using filters, but due to the low-level nature of the issue, the drops may also happen to any traffic on the network. We are, for example, about to migrate major graphite/statsd clusters that send a high volume of UDP datagrams, and I am worried the issue will also occur there, but become much more problematic, especially as the datagrams will have to be shaped on the ingress side.
Here is the workaround we are about to start using: https://github.com/Quentin-M/weave-tc/blob/master/weave-tc.sh, which seems to reduce the likelihood of the race significantly. Using it is as simple as adding the following container to the weave DaemonSet:
- name: weave-tc
  image: 'qmachu/weave-tc:0.0.1'
  securityContext:
    privileged: true
  volumeMounts:
  - name: xtables-lock
    mountPath: /run/xtables.lock
  - name: lib-tc
    mountPath: /lib/tc
Is there really nothing that the weave team can do
What we're doing is gathering data to understand the issue(s) and analyzing it. Sorry if this comes across as "nothing".
@Quentin-M Holy cow, man, we will try that solution out and see if it works for us. What side effects should we watch out for?
It's been a long time since I have read a shell script that was so far over my head... that's some highly impressive work!
@thomaschaaf here: @Quentin-M I am getting No distribution data for pareto (/lib/tc//pareto.dist: No such file or directory) -- does the host need to have something installed as well? What should lib-tc point to on the host? Maybe you can provide your deployment yaml for me to compare :)
@thomaschaaf Absolutely! I mount /run/xtables.lock and /lib/tc. Pareto should already be on the host; it is part of iproute2, which is essentially the same everywhere.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: weave-net
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: system:weave-net
  namespace: kube-system
rules:
- apiGroups:
  - ''
  resources:
  - pods
  - namespaces
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: system:weave-net
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:weave-net
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: weave-net
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: system:weave-net
  namespace: kube-system
rules:
- apiGroups:
  - ''
  resourceNames:
  - weave-net
  resources:
  - configmaps
  verbs:
  - get
  - update
- apiGroups:
  - ''
  resources:
  - configmaps
  verbs:
  - create
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: system:weave-net
  namespace: kube-system
roleRef:
  kind: Role
  name: system:weave-net
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: weave-net
  namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: weave-net
  namespace: kube-system
  labels:
    k8s-app: weave-net
spec:
  selector:
    matchLabels:
      k8s-app: weave-net
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        k8s-app: weave-net
    spec:
      containers:
      - name: weave
        command:
        - /home/weave/launch.sh
        env:
        - name: WEAVE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: weave-password
              key: password
        - name: WEAVE_MTU
          value: '8912'
        - name: IPALLOC_RANGE
          value: '172.16.0.0/16'
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: 'weaveworks/weave-kube:2.3.0'
        livenessProbe:
          httpGet:
            host: 127.0.0.1
            path: /status
            port: 6784
          initialDelaySeconds: 30
        securityContext:
          privileged: true
        volumeMounts:
        - name: weavedb
          mountPath: /weavedb
        - name: cni-bin
          mountPath: /host/opt
        - name: cni-bin2
          mountPath: /host/home
        - name: cni-conf
          mountPath: /host/etc
        - name: dbus
          mountPath: /host/var/lib/dbus
        - name: lib-modules
          mountPath: /lib/modules
        - name: xtables-lock
          mountPath: /run/xtables.lock
      - name: weave-npc
        args: ['--metrics-addr=0.0.0.0:6781']
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: 'weaveworks/weave-npc:2.3.0'
        securityContext:
          privileged: true
        volumeMounts:
        - name: xtables-lock
          mountPath: /run/xtables.lock
      - name: weave-tc
        image: 'qmachu/weave-tc:0.0.1'
        securityContext:
          privileged: true
        volumeMounts:
        - name: xtables-lock
          mountPath: /run/xtables.lock
        - name: lib-tc
          mountPath: /lib/tc
      hostNetwork: true
      hostPID: true
      restartPolicy: Always
      securityContext:
        seLinuxOptions: {}
      serviceAccountName: weave-net
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      - key: CriticalAddonsOnly
        operator: Exists
      volumes:
      - name: weavedb
        hostPath:
          path: /var/lib/weave
      - name: cni-bin
        hostPath:
          path: /opt
      - name: cni-bin2
        hostPath:
          path: /home
      - name: cni-conf
        hostPath:
          path: /etc
      - name: dbus
        hostPath:
          path: /var/lib/dbus
      - name: lib-modules
        hostPath:
          path: /lib/modules
      - name: xtables-lock
        hostPath:
          path: /run/xtables.lock
      - name: lib-tc
        hostPath:
          path: /lib/tc
---
apiVersion: v1
kind: Secret
metadata:
  name: weave-password
  namespace: kube-system
type: Opaque
data:
  password: {{ .weave.password }}
@Quentin-M For some reason /lib/tc does not exist on my nodes (Debian Jessie), installed with kops using k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-02-08.
@thomaschaaf According to https://packages.debian.org/jessie/amd64/iproute2/filelist, you would be using /usr/lib/tc/ instead (and pareto.dist is indeed in there).
@bboreham Do you have any more insights on this issue? It seems like every day I come across another thread talking about DNS timeouts here or there. It feels like a "dirty little secret" at this point :)
No, no particular insight. I'm trying to cross-fertilise the conversations in the hope someone shows up and says "this is all very clear to me".
@bboreham I see, yes, that's the open-source slogan, right? "Given enough eyes, every problem is trivial." Thanks for your continued work. Let me know if there's something I can test that would be helpful.
I'll try @Quentin-M 's fix and report back.
so it would be on every host, and could be addressed using the host's own IP? I've seen discussions along those lines; unfortunately changing resolv.conf to point at that IP requires a Kubernetes change.
You can do this already with --resolv-conf passed to kubelet. Run a dnsmasq daemonset that proxies all DNS queries to kube-dns using host networking and listening on all interfaces. This reduces the DNS problems substantially.
As I understand it, --resolv-conf is a single setting for all pods, thus removing the ability to find services in the same namespace as the current pod.
That is what I meant by "requires a Kubernetes change" - to change the DNS server address without giving up any other features. If you don't need those features it's an option.
As I understand it, --resolv-conf is a single setting for all pods, thus removing the ability to find services in the same namespace as the current pod.
If you just need to change the DNS server IP you can use --cluster-dns.
As I understand it, --resolv-conf is a single setting for all pods, thus removing the ability to find services in the same namespace as the current pod.
The generated search domains and options are preserved; --resolv-conf only parses the nameservers afaik. That's how we set it up.
What DNS IP do you use that always resolves to the local host?
What DNS IP do you use that always resolves to the local host?
You can probably use the local docker bridge IP (172.17.0.1).
Address of the docker interface. This is probably setup dependent. I think you could use any interface on the host that is routable from pods (so not the loopback).
@jsravn I would like to learn more about your setup. Do you by chance use kops?
I would like to see your dnsmasq daemonset manifest if you are willing to post it. My understanding is that kops already runs dnsmasq as a container in its default kube-dns pod, so we would have to figure out how to disable that in a way that doesn't get undone when we use kops to update the cluster.
@dcowden You wouldn't touch the kube-dns pod; it still runs dnsmasq. The local dnsmasq caches all local queries on the node -- the benefits being that the cache is localised, you can bypass kube-dns completely if you want for external queries (we do this), and it's more resilient to outages. I can't give you the exact daemonset at the moment, but it shouldn't be so hard: you need to set up hostNetwork and configure dnsmasq to listen on the local docker bridge. The trickier part is configuring kubelet with --resolv-conf, since that won't be easy in hosted solutions like GKE. In this case, it would be nice if k8s had a runtime API for configuring the DNS setup (which it doesn't, afaik). You could probably do it with a custom iptables rule to intercept DNS requests and transparently route them to your local dnsmasq via DNAT -- this would be done as part of the daemonset. That is feeling pretty hacky though.
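(A sketch of the node-local dnsmasq invocation this describes -- the flag values and IPs are my assumptions, not jsravn's actual setup; the helper prints the command for review since running it only makes sense on a node:)

```shell
# Node-local DNS proxy: listen on the docker bridge ($1) and forward
# cluster-suffix queries to the kube-dns service IP ($2); everything else
# falls through to the upstream resolvers dnsmasq finds on the host.
dnsmasq_cmd() {
  echo "dnsmasq --no-daemon --listen-address=$1 --server=/cluster.local/$2 --cache-size=1000"
}
dnsmasq_cmd 172.17.0.1 100.64.0.10
```

Pods then get the bridge IP as their nameserver via kubelet's --resolv-conf (or --cluster-dns, per the discussion above).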
(Apologies if I've taken this issue off topic - feel free to contact me on kubernetes slack if you want to discuss further ideas)
@jsravn Thanks for this tip. I hadn't thought of this approach, but it has a number of benefits -- for example, it makes it much more straightforward to work in a split-DNS corporate environment.
So, as far as I can tell from this thread, there isn't really a solution yet, aside from some of these workarounds, is there?
Not only is there not a solution, we don't know which of the various theories about the problem is most important in practice.
@bboreham Understandable. We've been migrating to Kubernetes and haven't really had any CNI work for us; every single one appears to have either high latency or kube-dns issues. It's just a bit frustrating, since clearly other people are able to make Kubernetes work. Hopefully we're able to diagnose which theory is most "important" and/or what has been causing these issues.
@jaredallard I agree with your assessment. For us, using the standard network doesn't work because we require encryption between nodes -- which is hard to set up on bare metal, vs Weave, which "just works".
While technically a workaround, I believe the dnsmasq solution provided by @jsravn is actually the right answer. In our case, we have split DNS and all kinds of weird stuff. At some point, it's best to simply let the bare-metal layer handle it. I think there's fairly decent evidence that people's SNAT/DNAT problems are pretty much all DNS, so I think running a dnsmasq process on each node makes sense, and should probably be the "right way", as long as you're still using CNI.
Of course, as you pointed out, I agree that if you can avoid CNI, that's probably the "right choice" -- it removes a whole layer of stuff to deal with.
@jaredallard My weave-tc workaround is simple enough to use and fixes the problem for us entirely.
@Quentin-M Does it solve just latency, or issues with kube-dns as well? We've pretty much gotten rid of all latency issues on Calico w/ IP-in-IP, but kube-dns doesn't work when it gets a lot of hits.
This particularly solves the kernel race condition inside conntrack that drops parallel A/AAAA packets, leading to a static 5s latency on each DNS query, regardless of coredns/kubedns/powerdns...
Just posted a little write-up about our journey troubleshooting this issue here: https://blog.quentin-machu.fr/2018/06/24/5-15s-dns-lookups-on-kubernetes/, including our workaround.
@Quentin-M Can it run on a non-Weave network? Our environment is OVS + OpenShift.
Can do; the network interface in the script must be set appropriately. It can't work on a network interface where traffic is already encrypted; it has to be set above that layer (e.g. eth0 is not OK for Weave, but the weave0 interface is OK).
@Quentin-M Hi, I have the same problem as @thomaschaaf: No distribution data for pareto (/lib/tc//pareto.dist: No such file or directory). However, I'm using CentOS 7 and there's no iproute2 package. What should I do in this case?
Edit: found out it was in /usr/lib64/tc instead of /usr/lib/tc.
Hi, on CentOS, pareto.dist is in /usr/lib64/tc and is provided by the iproute package; the mount needs to be adapted accordingly. Ref: https://centos.pkgs.org/7/centos-x86_64/iproute-4.11.0-14.el7.x86_64.rpm.html
What happened?
We are experiencing random 5 second DNS timeouts in our kubernetes cluster.
How to reproduce it?
It is reproducible by requesting just about any in-cluster service and observing that periodically (in our case, 1 out of 50 or 100 times) we get a 5-second delay. It always happens in DNS lookup.
Anything else we need to know?
We believe this is a result of a kernel level SNAT race condition that is described quite well here:
https://tech.xing.com/a-reason-for-unexplained-connection-timeouts-on-kubernetes-docker-abd041cf7e02
The problem happens with non-Weave CNI implementations too, and is (ironically) not really even a Weave issue. However, it becomes a Weave issue, because the solution is to set a flag on the masquerading rules that are created, which are in no one's control except Weave's.
What we need is the ability to apply the NF_NAT_RANGE_PROTO_RANDOM_FULLY flag on the masquerading rules that Weave sets up. In the post above, Flannel was in use, and the fix was applied there instead.
We searched for this issue and didn't see that anyone had asked for this. We're also unaware of any settings that allow setting this flag today -- if that's possible, please let us know.
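(For illustration, the requested rule would look roughly like this -- a sketch of mine modeled on the Flannel fix in the linked post, not Weave's actual masquerade rule; the helper prints the command for review since applying it needs root:)

```shell
# MASQUERADE with fully randomized source-port allocation, i.e. the
# NF_NAT_RANGE_PROTO_RANDOM_FULLY flag exposed by iptables as --random-fully
# (iptables >= 1.6.2). $1 = the overlay's IP allocation CIDR.
masq_rule() {
  echo "iptables -t nat -A POSTROUTING -s $1 ! -d $1 -j MASQUERADE --random-fully"
}
masq_rule 172.16.0.0/16
```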