grafana / tanka

Flexible, reusable and concise configuration for Kubernetes
https://tanka.dev
Apache License 2.0

Cannot define a TCP port and a UDP port when both use the same port number #99

Closed. beorn7 closed this issue 4 years ago.

beorn7 commented 4 years ago

See https://github.com/grafana/jsonnet-libs/pull/178 for an example. In short: Alertmanager gossip, similar to DNS, uses both UDP and TCP (presumably the latter as a fallback for the former). In K8s (and with ksonnet), defining two ports with the same port number but different protocols and names is no problem. The example linked above results in the following port definition with ks:

     "ports": [
       {
         "name": "alertmanager-http-metrics",
         "port": 80,
         "targetPort": 80
       }
       {
         "name": "alertmanager-gossip-udp",
         "port": 9094,
         "protocol": "UDP",
         "targetPort": 9094
       }
       {
         "name": "alertmanager-gossip-tcp",
         "port": 9094,
         "targetPort": 9094
       }
     ],

Doing the same thing with tk munges the two alertmanager-gossip-… ports together. The result differs depending on the order in which the two ports are defined. I assume this has to do with the way Tanka flattens the result.

issue-label-bot[bot] commented 4 years ago

Issue-Label Bot is automatically applying the label kind/bug to this issue, with a confidence of 0.82. Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback!


beorn7 commented 4 years ago

Actually, tk show is showing the right thing, as above; only tk diff is giving me the weird result. So perhaps the problem is only in the diffing, and Tanka would send the right thing to K8s.

sh0rez commented 4 years ago

We have two diff strategies at the moment: native (kubectl diff) and subset for older clusters lower than v1.13. I suspect this issue only occurs in subset mode; could you check the cluster version?
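
For reference, the strategy can also be forced by hand. A rough sketch (the environment path is only an example, and the exact flag spelling may differ between Tanka versions, so check tk diff --help):

# Diff via kubectl diff (needs a cluster of v1.13 or newer):
tk diff --diff-strategy native environments/default

# Fall back to Tanka's own subset-based diff:
tk diff --diff-strategy subset environments/default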

beorn7 commented 4 years ago

The K8s API server is v1.13.7. That's not “lower than v1.13”. (Or did you mean “v1.13 or lower”?)

sh0rez commented 4 years ago

Odd. The version is fine. Please try forcing the diff strategy to native via the flag; if that doesn’t help, try piping tk show into kubectl diff ... If THAT doesn’t help either, it’s a kubectl issue.
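
The piping check would look roughly like this (the environment path is only an example; kubectl diff exits 0 when nothing changed and 1 when it found differences):

# Render the manifests with Tanka and let kubectl compute the diff against
# the live cluster, bypassing Tanka's diff logic entirely.
tk show environments/default | kubectl diff -f -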

beorn7 commented 4 years ago

Did all of that. Result: This is a kubectl issue!

Excerpts from tk show:

spec:
  ports:
  - name: alertmanager-http-metrics
    port: 80
    targetPort: 80
  - name: alertmanager-gossip-udp
    port: 9094
    protocol: UDP
    targetPort: 9094
  - name: alertmanager-gossip-tcp
    port: 9094
    targetPort: 9094
  selector:
    name: alertmanager
# ...
       ports:
        - containerPort: 80
          name: http-metrics
        - containerPort: 9094
          name: gossip-udp
          protocol: UDP
        - containerPort: 9094
          name: gossip-tcp

Ran kubectl diff on that output:

--- /tmp/LIVE-630995268/apps.v1beta1.StatefulSet.default.alertmanager   2019-10-27 22:10:02.768325693 +0100
+++ /tmp/MERGED-224906195/apps.v1beta1.StatefulSet.default.alertmanager 2019-10-27 22:10:02.880327719 +0100
@@ -43,6 +52,9 @@
         - containerPort: 80
           name: http-metrics
           protocol: TCP
+        - containerPort: 9094
+          name: gossip-udp
+          protocol: UDP
         resources:
           requests:
             cpu: 10m
# ...
--- /tmp/LIVE-630995268/v1.Service.default.alertmanager 2019-10-27 22:09:57.544231823 +0100
+++ /tmp/MERGED-224906195/v1.Service.default.alertmanager       2019-10-27 22:09:57.656233821 +0100
@@ -19,6 +19,10 @@
     port: 80
     protocol: TCP
     targetPort: 80
+  - name: alertmanager-gossip-udp
+    port: 9094
+    protocol: UDP
+    targetPort: 9094
   selector:
     name: alertmanager
   sessionAffinity: None
exit status 1

We found a kubectl diff bug! :o)
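
For reporting upstream, a kubectl-only reproduction along these lines should do; the file path, namespace and selector are just examples taken from the manifests above, and the exact behaviour depends on the kubectl version:

# A Service declaring TCP and UDP on the same port number, as in the
# tk show output above.
cat > /tmp/alertmanager-svc.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
  namespace: default
spec:
  selector:
    name: alertmanager
  ports:
  - name: alertmanager-gossip-udp
    port: 9094
    protocol: UDP
    targetPort: 9094
  - name: alertmanager-gossip-tcp
    port: 9094
    targetPort: 9094
EOF

kubectl apply -f /tmp/alertmanager-svc.yaml

# Diffing the identical manifest should print nothing and exit 0; a spurious
# diff (exit status 1) points at kubectl's merge handling of duplicate port
# numbers rather than at Tanka.
kubectl diff -f /tmp/alertmanager-svc.yaml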

beorn7 commented 4 years ago

Possibly related: https://github.com/kubernetes/kubernetes/issues/39188

sh0rez commented 4 years ago

/cc @malcolmholmes Haven’t we already run into this a while ago? I feel like we stumbled across that exact kubectl issue before.

sh0rez commented 4 years ago

@beorn7 Looks like we need to wait for the folks over at kubectl, or file a PR with them.

In the meantime, we might want to add this to the “known errors” section of the docs to help other users. Also, the subset diff might work better in this case; it should not have that issue.

beorn7 commented 4 years ago

It's kind of worrying that https://github.com/kubernetes/kubernetes/issues/39188 has been open for almost 2 years…

beorn7 commented 4 years ago

Let's just document it and then close this?

sh0rez commented 4 years ago

Yup! You wanna do that?

beorn7 commented 4 years ago

On my list…