hashicorp / consul-helm

Helm chart to install Consul and other associated components.
Mozilla Public License 2.0

UDP port 8301 does not work with `client.exposeGossipPorts` set to true #389

Closed. dschaaff closed this issue 4 years ago.

dschaaff commented 4 years ago

My Consul servers run on EC2 outside of my Kubernetes cluster. I am using the Helm chart to deploy only the Consul client DaemonSet.

In my values file:

client:
  enabled: true
  image: null
  join: null

  # grpc should be set to true if the gRPC listener should be enabled.
  # This should be set to true if connectInject is enabled.
  grpc: true
  exposeGossipPorts: true

This sets the following ports on the DaemonSet:

- containerPort: 8301
  hostPort: 8301
  name: serflan-tcp
  protocol: TCP
- containerPort: 8301
  hostPort: 8301
  name: serflan-udp
  protocol: UDP

However, only TCP traffic gets through (confirmed via netcat). If I edit the spec to set hostNetwork: true, then UDP works as expected.
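
This is a rough sketch of how that toggle can be applied for testing; the DaemonSet name and namespace below are assumptions and depend on the Helm release:

# Switch the client DaemonSet onto the host network and roll the pods.
# ClusterFirstWithHostNet keeps in-cluster DNS working for host-network pods.
kubectl -n default patch daemonset consul --type merge -p \
  '{"spec":{"template":{"spec":{"hostNetwork":true,"dnsPolicy":"ClusterFirstWithHostNet"}}}}'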

I'm not sure if this is a Consul issue or a Kubernetes issue. I'm running Kubernetes 1.15 on AWS EKS with version 1.5.5 of the VPC CNI plugin. I'm happy to provide more information if it's useful.

ishustava commented 4 years ago

Hey @dschaaff, thanks for creating this issue!

Is this something you're seeing on a new installation or on an upgrade?

My initial thought is that it most likely depends on the specific CNI implementation, although the AWS VPC CNI just uses the portmap plugin (aws/amazon-vpc-cni-k8s#153).
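
As a quick sanity check of that assumption, it can be worth confirming that portmap is actually in the CNI chain on one of the nodes; a sketch, assuming a default AWS VPC CNI install (the config file name can vary by CNI version):

# The VPC CNI writes a chained CNI config; portmap should appear in the plugins list.
cat /etc/cni/net.d/10-aws.conflist
# The portmap binary should also be present on the node.
ls /opt/cni/bin | grep portmap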

There is a known issue around upgrading (aws/amazon-vpc-cni-k8s#373), which is why I'm curious whether this happens on a clean install.

dschaaff commented 4 years ago

The issue occurs with both a clean install (brand new cluster and helm deployment) and an upgrade of an existing deployment.

I have also seen issues with upgrading that look like the VPC CNI issue you linked, but that is unrelated to what is happening here. When that issue occurs, it only affects a subset of nodes, and I rotate them out.

dschaaff commented 4 years ago

I may switch to running with hostNetwork: true for now as a workaround, but it'd be nice to avoid forking the chart to make the change.

Would you be open to a pull request that makes this an optional config item?

ishustava commented 4 years ago

Hey @dschaaff, sorry for the delay.

I'm having trouble reproducing this issue. Here is the list of things I've done:

  1. Created an EKS cluster:
    Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.10-eks-bac369", GitCommit:"bac3690554985327ae4d13e42169e8b1c2f37226", GitTreeState:"clean", BuildDate:"2020-02-26T01:12:54Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
  2. Created an EC2 instance with a security group that allows all TCP and UDP traffic from EKS nodes. Started running consul server on it with the following command consul agent -server -bootstrap-expect=1 -bootstrap -data-dir=./consul-data
  3. Updated EKS node security group to accept all TCP and UDP traffic from the security group I've just created.
  4. Installed the Consul helm chart (v0.18.0), setting client.exposeGossipPorts to true, server.enabled to false, and client.join to the private IP of my EC2 instance. Everything else in the values.yaml file was left as default. Here is my Helm config.yaml file:
    server:
      enabled: false
      replicas: 1
      bootstrapExpect: 1
    client:
      enabled: true
      exposeGossipPorts: true
      join: ["192.168.90.156"]

    After the install, everything looked healthy. The client agents on EKS are able to join my EC2 server instance. I have run helm test <my-release-name> to confirm that the basic functionality is working.

  5. I've confirmed that I can reach port 8301 on one of the EKS nodes from my EC2 instance using both TCP and UDP:
    [ec2-user@ip-192-168-90-156 ~]$ nc -vz 192.168.26.161 8301
    Ncat: Version 7.50 ( https://nmap.org/ncat )
    Ncat: Connected to 192.168.26.161:8301.
    Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
    [ec2-user@ip-192-168-90-156 ~]$ nc -vz -u 192.168.26.161 8301
    Ncat: Version 7.50 ( https://nmap.org/ncat )
    Ncat: Connected to 192.168.26.161:8301.
    Ncat: UDP packet sent successfully
    Ncat: 1 bytes sent, 0 bytes received in 2.01 seconds.
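
The server-side view of gossip health can also be checked with the members list; a small sketch, assuming the CLI can reach the local agent on the EC2 server:

# Client agents stuck in a failed or left state here would indicate the UDP
# probes are not making it through even though the TCP checks pass.
consul members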

Let me know if I'm missing something.

I saw your PR, and thank you for making a contribution 🙏 I'd like to understand the problem first if at all possible.

dschaaff commented 4 years ago

Let me pull in some more info on the cluster and the setup. I have 3 separate EKS clusters that all exhibit this behavior and currently need hostNetwork: true as a workaround. I'll drill in on one for troubleshooting.

Server Setup

I have 3 Consul servers running directly on EC2 outside of the Kubernetes cluster.

Here is the config file:

{
    "acl": {
        "default_policy": "deny",
        "down_policy": "extend-cache",
        "enabled": true,
        "token_ttl": "30s",
        "tokens": {
            "agent": "redacted",
            "default": "redacted",
            "master": "redacted",
            "replication": "redacted"
        }
    },
    "addresses": {
        "dns": "0.0.0.0",
        "grpc": "0.0.0.0",
        "http": "0.0.0.0",
        "https": "0.0.0.0"
    },
    "advertise_addr": "10.20.202.203",
    "advertise_addr_wan": "10.20.202.203",
    "autopilot": {
        "cleanup_dead_servers": false,
        "last_contact_threshold": "200ms",
        "max_trailing_logs": 250,
        "server_stabilization_time": "10s"
    },
    "bind_addr": "10.20.202.203",
    "bootstrap": false,
    "bootstrap_expect": 3,
    "ca_file": "/etc/consul/ssl/ca.crt",
    "cert_file": "/etc/consul/ssl/server.crt",
    "client_addr": "0.0.0.0",
    "data_dir": "/var/consul",
    "datacenter": "stg-us-west-2",
    "disable_update_check": false,
    "domain": "consul",
    "enable_local_script_checks": false,
    "enable_script_checks": false,
    "encrypt": "redacted",
    "key_file": "/etc/consul/ssl/server.key",
    "log_file": "/var/log/consul/consul.log",
    "log_level": "INFO",
    "log_rotate_bytes": 0,
    "log_rotate_duration": "24h",
    "log_rotate_max_files": 0,
    "node_name": "ip-10-20-202-203.us-west-2.compute.internal",
    "performance": {
        "leave_drain_time": "5s",
        "raft_multiplier": 1,
        "rpc_hold_timeout": "7s"
    },
    "ports": {
        "dns": 8600,
        "grpc": 8502,
        "http": 8500,
        "https": 8501,
        "serf_lan": 8301,
        "serf_wan": 8302,
        "server": 8300
    },
    "primary_datacenter": "stg-us-west-2",
    "raft_protocol": 3,
    "retry_interval": "30s",
    "retry_interval_wan": "30s",
    "retry_join": [
        "provider=aws tag_key=consul-datacenter tag_value=stg-us-west-2"
    ],
    "retry_max": 0,
    "retry_max_wan": 0,
    "server": true,
    "tls_min_version": "tls12",
    "tls_prefer_server_cipher_suites": false,
    "translate_wan_addrs": false,
    "ui": true,
    "verify_incoming": true,
    "verify_outgoing": true,
    "verify_server_hostname": true
}
Here is the output of `consul info` from one of the servers:

agent:
    check_monitors = 0
    check_ttls = 0
    checks = 2
    services = 2
build:
    prerelease =
    revision = 9ea1a204
    version = 1.7.2
consul:
    acl = enabled
    bootstrap = false
    known_datacenters = 1
    leader = false
    leader_addr = 10.20.208.125:8300
    server = true
raft:
    applied_index = 9465976
    commit_index = 9465976
    fsm_pending = 0
    last_contact = 28.911239ms
    last_log_index = 9465976
    last_log_term = 30
    last_snapshot_index = 9460381
    last_snapshot_term = 30
    latest_configuration = [{Suffrage:Voter ID:a26ff7d8-307a-d4b6-1e5e-db0f3c32a2c6 Address:10.20.208.125:8300} {Suffrage:Voter ID:ccf4d7e4-d254-c6ad-77c5-da8851c19117 Address:10.20.211.8:8300} {Suffrage:Voter ID:c290e06e-46ec-8342-c204-ed06287d8f9c Address:10.20.202.203:8300}]
    latest_configuration_index = 0
    num_peers = 2
    protocol_version = 3
    protocol_version_max = 3
    protocol_version_min = 0
    snapshot_version_max = 1
    snapshot_version_min = 0
    state = Follower
    term = 30
runtime:
    arch = amd64
    cpu_count = 2
    goroutines = 281
    max_procs = 2
    os = linux
    version = go1.13.7
serf_lan:
    coordinate_resets = 0
    encrypted = true
    event_queue = 0
    event_time = 25
    failed = 1
    health_score = 0
    intent_queue = 0
    left = 16
    member_time = 457229
    members = 47
    query_queue = 0
    query_time = 1
serf_wan:
    coordinate_resets = 0
    encrypted = true
    event_queue = 0
    event_time = 1
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 22
    members = 3
    query_queue = 0
    query_time = 1

Client Setup

Here is the content of my values file for the helm chart.

fullnameOverride: consul
# Available parameters and their default values for the Consul chart.

global:
  # enabled is the master enabled switch. Setting this to true or false
  # will enable or disable all the components within this chart by default.
  # Each component can be overridden using the component-specific "enabled"
  # value.
  enabled: false

  # Domain to register the Consul DNS server to listen for.
  domain: consul

  # Image is the name (and tag) of the Consul Docker image for clients and
  # servers below. This can be overridden per component.
  #
  # Examples:
  #   image: "consul:1.5.0"
  #   image: "hashicorp/consul-enterprise:1.5.0-ent"   # Enterprise Consul image
  image: "consul:1.7.2"

  # imageK8S is the name (and tag) of the consul-k8s Docker image that
  # is used for functionality such as the catalog sync. This can be overridden
  # per component below.
  # Note: support for the catalog sync's liveness and readiness probes was added
  # to consul-k8s v0.6.0. If using an older consul-k8s version, you may need to
  # remove these checks to make the sync work.
  imageK8S: "hashicorp/consul-k8s:0.12.0"

  # Datacenter is the name of the datacenter that the agents should register
  # as. This shouldn't be changed once the Consul cluster is up and running
  # since Consul doesn't support an automatic way to change this value
  # currently: https://github.com/hashicorp/consul/issues/1858
  datacenter: stg-us-west-2

  # enablePodSecurityPolicies is a boolean flag that controls whether pod
  # security policies are created for the consul components created by this
  # chart. See https://kubernetes.io/docs/concepts/policy/pod-security-policy/
  enablePodSecurityPolicies: false

  # Gossip encryption key. To enable gossip encryption, provide the name of
  # a Kubernetes secret that contains a gossip key. You can create a gossip
  # key with the "consul keygen" command. 
  # See https://www.consul.io/docs/commands/keygen.html 
  gossipEncryption:
    secretName: consul-secrets
    secretKey: gossip-encryption-key

  # bootstrapACLs will automatically create and assign ACL tokens within 
  # the Consul cluster. This currently requires enabling both servers and
  # clients within Kubernetes. Additionally requires Consul v1.4+ and
  # consul-k8s v0.8.0+.
  bootstrapACLs: false

# Server, when enabled, configures a server cluster to run. This should
# be disabled if you plan on connecting to a Consul cluster external to
# the Kube cluster.
server:
  enabled: false

# Client, when enabled, configures Consul clients to run on every node
# within the Kube cluster. The current deployment model follows a traditional
# DC where a single agent is deployed per node.
client:
  enabled: true
  image: null
  join: null

  # grpc should be set to true if the gRPC listener should be enabled.
  # This should be set to true if connectInject is enabled.
  grpc: true
  exposeGossipPorts: true
  # enable host network mode; see https://github.com/hashicorp/consul-helm/pull/392
  enableHostNetworkMode: true
  # Resource requests, limits, etc. for the client cluster placement. This
  # should map directly to the value of the resources field for a PodSpec,
  # formatted as a multi-line string. By default no direct resource request
  # is made.
  resources:  |
    requests:
      memory: "256Mi"
      cpu: "200m"
    limits:
      memory: "256Mi"

  # extraConfig is a raw string of extra configuration to set with the
  # server. This should be JSON.
  extraConfig: |
    {
      "verify_incoming": true,
      "verify_outgoing": true,
      "verify_server_hostname": true,
      "ca_file": "/consul/userconfig/consul-secrets/ca.crt",
      "cert_file": "/consul/userconfig/consul-secrets/client.pem",
      "key_file": "/consul/userconfig/consul-secrets/client-key.pem",
      "ports": {
        "http": 8500,
        "https": 8501,
        "server": 8300
      },
      "retry_join": [
        "provider=aws tag_key=consul-datacenter tag_value=stg-us-west-2"
      ],
      "telemetry": {
        "disable_hostname": true,
        "prometheus_retention_time": "6h"
      }
    }

  # extraVolumes is a list of extra volumes to mount. These will be exposed
  # to Consul in the path `/consul/userconfig/<name>/`. The value below is
  # an array of objects, examples are shown below.
  extraVolumes:
    - type: secret
      name: consul-secrets
      load: false
    - type: secret
      name: consul-acl-config
      load: true # if true, will add to `-config-dir` to load by Consul

  # Toleration Settings for Client pods
  # This should be a multi-line string matching the Toleration array
  # in a PodSpec.
  # The example below will allow Client pods to run on every node
  # regardless of taints
  # tolerations: |
  #   - operator: "Exists"
  tolerations: ""

  # nodeSelector labels for client pod assignment, formatted as a multi-line string.
  # ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  # Example:
  # nodeSelector: |
  #   beta.kubernetes.io/arch: amd64
  nodeSelector: null

  # used to assign priority to client pods
  # ref: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/
  priorityClassName: ""

  # Extra annotations to attach to the client pods
  # This should be a multi-line string mapping directly to a map of
  # the annotations to apply to the client pods.
  annotations: null

  # extraEnvironmentVars is a list of extra environment variables to set with the pod. These could be
  # used to include proxy settings required for the cloud auto-join feature,
  # in case the Kubernetes cluster is behind egress HTTP proxies. Additionally, they could be used to configure
  # custom Consul parameters.
  extraEnvironmentVars:
    CONSUL_CACERT: /consul/userconfig/consul-secrets/ca.crt
    CONSUL_HTTP_TOKEN_FILE: /consul/userconfig/consul-secrets/consul.token
    CONSUL_CLIENT_CERT: /consul/userconfig/consul-secrets/client.pem
    CONSUL_CLIENT_KEY: /consul/userconfig/consul-secrets/client-key.pem
    # http_proxy: http://localhost:3128,
    # https_proxy: http://localhost:3128,
    # no_proxy: internal.domain.com

# Configuration for DNS configuration within the Kubernetes cluster.
# This creates a service that routes to all agents (client or server)
# for serving DNS requests. This DOES NOT automatically configure kube-dns
# today, so you must still manually configure a `stubDomain` with kube-dns
# for this to have any effect:
# https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/#configure-stub-domain-and-upstream-dns-servers
dns:
  enabled: true

ui:
  # True if you want to enable the Consul UI. The UI will run only
  # on the server nodes. This makes UI access via the service below (if
  # enabled) predictable rather than "any node" if you're running Consul
  # clients as well.
  enabled: false

# syncCatalog will run the catalog sync process to sync K8S with Consul
# services. This can run bidirectional (default) or unidirectionally (Consul
# to K8S or K8S to Consul only).
#
# This process assumes that a Consul agent is available on the host IP.
# This is done automatically if clients are enabled. If clients are not
# enabled then set the node selection so that it chooses a node with a
# Consul agent.
syncCatalog:
  # True if you want to enable the catalog sync. "-" for default.
  enabled: true
  image: null
  default: true # true will sync by default, otherwise requires annotation

  # toConsul and toK8S control whether syncing is enabled to Consul or K8S
  # as a destination. If both of these are disabled, the sync will do nothing.
  toConsul: true
  toK8S: true

  # k8sPrefix is the service prefix to prepend to services before registering
  # with Kubernetes. For example "consul-" will register all services
  # prepended with "consul-". (Consul -> Kubernetes sync)
  k8sPrefix: null

  # consulPrefix is the service prefix which prepends itself
  # to Kubernetes services registered within Consul.
  # For example, "k8s-" will register all services prepended with "k8s-".
  # (Kubernetes -> Consul sync)
  consulPrefix: null

  # k8sTag is an optional tag that is applied to all of the Kubernetes services
  # that are synced into Consul. If nothing is set, defaults to "k8s".
  # (Kubernetes -> Consul sync)
  k8sTag: null

  # syncClusterIPServices syncs services of the ClusterIP type, which may
  # or may not be broadly accessible depending on your Kubernetes cluster.
  # Set this to false to skip syncing ClusterIP services.
  syncClusterIPServices: true

  # nodePortSyncType configures the type of syncing that happens for NodePort
  # services. The valid options are: ExternalOnly, InternalOnly, ExternalFirst.
  # - ExternalOnly will only use a node's ExternalIP address for the sync
  # - InternalOnly uses the node's InternalIP address
  # - ExternalFirst will preferentially use the node's ExternalIP address, but
  #   if it doesn't exist, it will use the node's InternalIP address instead.
  nodePortSyncType: ExternalFirst

  # aclSyncToken refers to a Kubernetes secret that you have created that contains
  # an ACL token for your Consul cluster which allows the sync process the correct
  # permissions. This is only needed if ACLs are enabled on the Consul cluster.
  aclSyncToken:
    secretName: consul-secrets
    secretKey: consul-k8s-sync.token

  # nodeSelector labels for syncCatalog pod assignment, formatted as a multi-line string.
  # ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  # Example:
  # nodeSelector: |
  #   beta.kubernetes.io/arch: amd64

# ConnectInject will enable the automatic Connect sidecar injector.
connectInject:
  enabled: false

  # Requires Consul v1.5+ and consul-k8s v0.8.1+
  centralConfig:
    enabled: false

Security Groups

I have a Consul security group that is attached to all nodes participating in the Consul cluster; in this case that is both the servers and the EKS nodes. I have confirmed network communication is open as expected.
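
For reference, the security group dump below can be reproduced with the AWS CLI (the group ID is the one shown in the output):

aws ec2 describe-security-groups --group-ids sg-056f1f0147a614203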

{
    "SecurityGroups": [
        {
            "Description": "tf: stg consul client security group",
            "GroupName": "stg-consul-client-sg",
            "IpPermissions": [
                {
                    "FromPort": 8500,
                    "IpProtocol": "tcp",
                    "IpRanges": [],
                    "Ipv6Ranges": [],
                    "PrefixListIds": [],
                    "ToPort": 8502,
                    "UserIdGroupPairs": [
                        {
                            "Description": "eks",
                            "GroupId": "sg-01197f8e4ab4e793d",
                            "UserId": "00000000000"
                        },
                        {
                            "GroupId": "sg-056f1f0147a614203",
                            "UserId": "00000000000"
                        }
                    ]
                },
                {
                    "FromPort": 8300,
                    "IpProtocol": "tcp",
                    "IpRanges": [],
                    "Ipv6Ranges": [],
                    "PrefixListIds": [],
                    "ToPort": 8300,
                    "UserIdGroupPairs": [
                        {
                            "Description": "eks",
                            "GroupId": "sg-01197f8e4ab4e793d",
                            "UserId": "00000000000"
                        },
                        {
                            "GroupId": "sg-056f1f0147a614203",
                            "UserId": "00000000000"
                        }
                    ]
                },
                {
                    "FromPort": 8301,
                    "IpProtocol": "udp",
                    "IpRanges": [],
                    "Ipv6Ranges": [],
                    "PrefixListIds": [],
                    "ToPort": 8302,
                    "UserIdGroupPairs": [
                        {
                            "GroupId": "sg-056f1f0147a614203",
                            "UserId": "00000000000"
                        }
                    ]
                },
                {
                    "FromPort": 8600,
                    "IpProtocol": "udp",
                    "IpRanges": [],
                    "Ipv6Ranges": [],
                    "PrefixListIds": [],
                    "ToPort": 8600,
                    "UserIdGroupPairs": [
                        {
                            "GroupId": "sg-056f1f0147a614203",
                            "UserId": "00000000000"
                        }
                    ]
                },
                {
                    "FromPort": 8301,
                    "IpProtocol": "tcp",
                    "IpRanges": [],
                    "Ipv6Ranges": [],
                    "PrefixListIds": [],
                    "ToPort": 8302,
                    "UserIdGroupPairs": [
                        {
                            "GroupId": "sg-056f1f0147a614203",
                            "UserId": "00000000000"
                        }
                    ]
                },
                {
                    "FromPort": 8300,
                    "IpProtocol": "tcp",
                    "IpRanges": [],
                    "Ipv6Ranges": [],
                    "PrefixListIds": [],
                    "ToPort": 8302,
                    "UserIdGroupPairs": [
                        {
                            "Description": "eks",
                            "GroupId": "sg-01197f8e4ab4e793d",
                            "UserId": "00000000000"
                        }
                    ]
                },
                {
                    "FromPort": 8600,
                    "IpProtocol": "tcp",
                    "IpRanges": [],
                    "Ipv6Ranges": [],
                    "PrefixListIds": [],
                    "ToPort": 8600,
                    "UserIdGroupPairs": [
                        {
                            "GroupId": "sg-056f1f0147a614203",
                            "UserId": "00000000000"
                        }
                    ]
                }
            ],
            "OwnerId": "00000000000",
            "GroupId": "sg-056f1f0147a614203",
            "IpPermissionsEgress": [],
            "Tags": [
                {
                    "Key": "Name",
                    "Value": "stg-consul-sg"
                },
                {
                    "Key": "environment",
                    "Value": "stg"
                },
                {
                    "Key": "src",
                    "Value": "terraform"
                },
                {
                    "Key": "terraform",
                    "Value": "true"
                },
                {
                    "Key": "TFManaged",
                    "Value": "true"
                }
            ],
            "VpcId": "vpc-00000000000"
        }
    ]
}

Troubleshooting

I just updated the daemonset config to remove hostNetwork: true in this cluster. As soon as the updated pods roll out, these messages resurface:

Mar 20 17:03:47 ip-10-20-208-125 consul[7946]:     2020-03-20T17:03:47.226Z [WARN]  agent.server.memberlist.lan: memberlist: Was able to connect to ip-10-20-47-144.us-west-2.compute.internal but other probes failed, network may be misconfigured

The logs show this warning for each EKS host.

This is interesting because netcat shows no issues connecting from the server to the client.

nc -vvz ip-10-20-23-225.us-west-2.compute.internal 8301
Connection to ip-10-20-23-225.us-west-2.compute.internal 8301 port [tcp/*] succeeded!
root@ip-10-20-208-125:/home/danielschaaff# nc -vvz -u ip-10-20-23-225.us-west-2.compute.internal 8301
Connection to ip-10-20-23-225.us-west-2.compute.internal 8301 port [udp/*] succeeded!
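
Worth noting that nc -vz -u only confirms a datagram was sent (UDP has no handshake), so it can look healthy even when replies never come back. A rough way to dig deeper, assuming root access on an EKS node, is to watch whether the gossip probes actually arrive on the node and get forwarded into the pod:

# On the EKS node: watch for UDP gossip traffic on port 8301
# (interface names vary with the VPC CNI, so capture on all of them).
sudo tcpdump -ni any udp port 8301

# From the Consul server: push a test datagram at the node.
echo ping | nc -u -w1 ip-10-20-23-225.us-west-2.compute.internal 8301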

If I then switch hostNetwork: true back on for the daemonset, the error messages listed above go away once the pods update.

I'm happy to collect any additional information that would be helpful.

ishustava commented 4 years ago

Thanks for all this detailed info @dschaaff! That's super helpful.

To confirm, are you experiencing any errors? Does your sync process sync services from Kube? In other words, other than those warning messages, is the Consul cluster operational?

I can dig into the warning messages. It looks like I'm seeing them after a while on my cluster too.

dschaaff commented 4 years ago

As far as I have been able to tell, the cluster functions while in that state. The sync service is successfully registering services with Consul, and I have a number of containers pulling config items from the Consul K/V store through the local agent without issue. If I use consul monitor -log-level=trace I get an additional log line:

2020-03-20T17:45:16.726Z [DEBUG] agent.server.memberlist.lan: memberlist: Failed ping: ip-10-20-59-47.us-west-2.compute.internal (timeout reached)
2020-03-20T17:45:17.226Z [WARN]  agent.server.memberlist.lan: memberlist: Was able to connect to ip-10-20-59-47.us-west-2.compute.internal but other probes failed, network may be misconfigured

dschaaff commented 4 years ago

I just wanted to drop in and see if there are any updates on this or the corresponding PR. Thanks so much!

dschaaff commented 4 years ago

Quick update. I feel like a bit of a ding dong for not thinking about this earlier, but pod IPs are directly routable when using the VPC CNI in EKS. This means I don't need exposeGossipPorts or hostNetwork mode at all. I've updated the client daemonset to remove those settings and am no longer seeing the ping errors that I was with those settings enabled. I think all that's really needed is a note in the docs, if anything.
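
For reference, the simplified rollout is roughly the following; the release name, chart path, and values file name here are assumptions:

# Drop the hostPort gossip mapping and rely on routable pod IPs instead.
helm upgrade consul ./consul-helm \
  -f values.yaml \
  --set client.exposeGossipPorts=false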

ishustava commented 4 years ago

Hey @dschaaff, thanks for the update!

Yes, if allowing traffic between the pod network and the Consul server network is an option, it's definitely a better solution. The exposeGossipPorts setting is more of an option for folks who can't or don't want to use the VPC CNI.

I'll take a look at this behavior again today. It's definitely strange, and I'm curious whether this is something specific to AWS or something that shows up on other clouds too.

ishustava commented 4 years ago

@dschaaff,

After looking a bit more into this, I'm fairly certain this problem is related to the portmap CNI plugin, which is what various CNIs, including the VPC CNI, use for port mapping when you're using hostPort. There are a few issues reporting problems with UDP connections and hostPort, for example containernetworking/plugins#123. hostNetwork doesn't go through CNI, and that's probably why you're not seeing these warnings when setting hostNetwork to true.
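
If anyone wants to confirm that on an affected node, this is a rough sketch of where to look; the chain name is the one the upstream portmap plugin creates, and the conntrack tool may need to be installed separately:

# Inspect the DNAT rules portmap installs for hostPort mappings on 8301.
sudo iptables -t nat -L CNI-HOSTPORT-DNAT -n -v | grep 8301

# Stale UDP conntrack entries pinning 8301 to an old pod IP are a commonly
# reported culprit in the linked issues.
sudo conntrack -L -p udp --dport 8301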

Given that you're using the pod network through VPC CNI, do you feel like #392 is still necessary?

dschaaff commented 4 years ago

It’s not necessary for my particular use, no. I’m not sure if it’s been requested before or not, but I’m good if you’d like to close it.

Thanks for all the help!

ishustava commented 4 years ago

Ok, I'll close both this issue and the PR for now, but definitely let us know if this comes up again.

Thanks for staying engaged on this issue 😀 💯