mesosphere / kubernetes-mesos

A Kubernetes Framework for Apache Mesos
637 stars 92 forks source link

current port mapping requires service.spec.port.name #322

Open jdef opened 9 years ago

jdef commented 9 years ago

Last tested against k8s v0.17+.

When the upstream's endpoints controller build endpoints specs it makes assumptions about the sameness of the destination port of a service endpoint for service specs that only define a single port. This assumption does not hold for this project since our custom endpoints controller implementation uses host-port endpoints (allocated from resource offers) instead of pod-port endpoints.

As such, to avoid confusing the endpoints controller (and generating tons of port uniqueness constraint violations) users are strongly urged to specify a service.spec.port.name for each every service port, even for services that only expose a single port.

jdef commented 9 years ago

@sttts is this still a problem now that upstream 10390 has merged?

sttts commented 9 years ago

If I read the type ServicePort definition correctly, now there is always a TargetPort (which might default to Port or a name) in a service spec. Hence, our endpoint controller can map that to the container port of a task. This sounds like this was different in a former k8s version?

I think this is even unrelated to https://github.com/GoogleCloudPlatform/kubernetes/pull/10390. It somehow relates to https://github.com/GoogleCloudPlatform/kubernetes/pull/10395 because our endpoint controller needs explicit port definition in the pod spec in order to do Mesos port allocation.

jdef commented 9 years ago

I ran into this problem when running the guestbook examples when those services did not name ports. Multiple redis-slaves and multiple frontend instances came up just fine. But k8s complained when it tried to make sense of the endpoints -- and it only stopped complaining when I named the service ports.

On Wed, Aug 12, 2015 at 1:16 PM, Dr. Stefan Schimanski < notifications@github.com> wrote:

If I read the type ServicePort definition correctly, now there is always a TargetPort (which might default to Port or a name) in a service spec. Hence, our endpoint controller can map that to the container port of a task. This sounds like this was different in a former k8s version?

I think this is even unrelated to GoogleCloudPlatform/kubernetes#10390 https://github.com/GoogleCloudPlatform/kubernetes/pull/10390. It somehow relates to GoogleCloudPlatform/kubernetes#10395 https://github.com/GoogleCloudPlatform/kubernetes/pull/10395 because our endpoint controller needs explicit port definition in the pod spec in order to do Mesos port allocation.

— Reply to this email directly or view it on GitHub https://github.com/mesosphere/kubernetes-mesos/issues/322#issuecomment-130378748 .

sttts commented 9 years ago

In fact you are writing about the actual name of the service port, not the targetPort (which can hold a port name).

I don't see in the endpoint controller code how this value influences whether a port can be found, i.e. that name is not even compared to anything in the pod/task. It's only used as name of the endpoint to be found by the respective service.

In addition I did some tests with service ports, with and without name, only one service port and multiple. Seems to work fine.

jdef commented 9 years ago

Did you test an unnamed service port that's backed by a replication controller running multiple instances of a pod on the same slave?

On Wed, Aug 12, 2015 at 6:08 PM, Dr. Stefan Schimanski < notifications@github.com> wrote:

In fact you are writing about the actual name of the service port, not the targetPort (which can hold a port name).

I don't see in the endpoint controller code how this value influences whether a port can be found, i.e. that name is not even compared to anything in the pod/task. It's only used as name of the endpoint to be found by the respective service.

In addition I did some tests with service ports, with and without name, only one service port and multiple. Seems to work fine.

— Reply to this email directly or view it on GitHub https://github.com/mesosphere/kubernetes-mesos/issues/322#issuecomment-130461892 .

sttts commented 9 years ago

Yes, have started the guestbook with the redis-slave-controller set to 3 instances on 2 nodes. All looks fine, 3 endpoints come up, nothing special in the endpoint controller log.

Which wrong behavior do you expect?

jdef commented 9 years ago

the endpoint controller used to generate lots of warnings about port uniqueness constraint violations.

sttts commented 9 years ago

Maybe it was also an upstream bug. Endpoint controller log is clean. I tend to close this issue until we notice anything strange again in this direction.

jdef commented 9 years ago

if it's no longer an issue then we should update the "known issues" list accordingly

karlkfi commented 9 years ago

My understanding is that pod container ports still need names, but service ports do not.

sttts commented 9 years ago

Don't think so. The guestbook has no port names for the containers.

sttts commented 9 years ago

The on-call merged faster than expected. Will close this one as well. IMO it looks like the issue is gone in master. If you find a way to trigger this again, feel free to reopen.

jdef commented 9 years ago

pretty sure that this is still a problem/known-issue for k8s-mesos:

E0817 19:14:51.689768     378 endpoints_controller.go:369] Error updating endpoints: Endpoints "frontend" is invalid: [subsets[2].ports[0].name: required value, subsets[2].ports[1].name: required value, subsets[3].ports[0].name: required value, subsets[3].ports[1].name: required value, subsets[3].ports[2].name: required value, subsets[2].ports[0].name: required value, subsets[2].ports[1].name: required value, subsets[3].ports[0].name: required value, subsets[3].ports[1].name: required value, subsets[3].ports[2].name: required value]
E0817 19:14:51.789487     378 endpoints_controller.go:369] Error updating endpoints: Endpoints "frontend" is invalid: [subsets[2].ports[0].name: required value, subsets[2].ports[1].name: required value, subsets[2].ports[2].name: required value, subsets[3].ports[0].name: required value, subsets[3].ports[1].name: required value, subsets[2].ports[0].name: required value, subsets[2].ports[1].name: required value, subsets[2].ports[2].name: required value, subsets[3].ports[0].name: required value, subsets[3].ports[1].name: required value]
E0817 19:14:51.891507     378 endpoints_controller.go:369] Error updating endpoints: Endpoints "frontend" is invalid: [subsets[2].ports[0].name: required value, subsets[2].ports[1].name: required value, subsets[3].ports[0].name: required value, subsets[3].ports[1].name: required value, subsets[3].ports[2].name: required value, subsets[2].ports[0].name: required value, subsets[2].ports[1].name: required value, subsets[3].ports[0].name: required value, subsets[3].ports[1].name: required value, subsets[3].ports[2].name: required value]
E0817 19:14:52.039477     378 endpoints_controller.go:369] Error updating endpoints: Endpoints "frontend" is invalid: [subsets[2].ports[0].name: required value, subsets[2].ports[1].name: required value, subsets[2].ports[2].name: required value, subsets[3].ports[0].name: required value, subsets[3].ports[1].name: required value, subsets[2].ports[0].name: required value, subsets[2].ports[1].name: required value, subsets[2].ports[2].name: required value, subsets[3].ports[0].name: required value, subsets[3].ports[1].name: required value]
E0817 19:14:52.139668     378 endpoints_controller.go:369] Error updating endpoints: Endpoints "frontend" is invalid: [subsets[0].ports[0].name: required value, subsets[0].ports[1].name: required value, subsets[0].ports[2].name: required value, subsets[3].ports[0].name: required value, subsets[3].ports[1].name: required value, subsets[0].ports[0].name: required value, subsets[0].ports[1].name: required value, subsets[0].ports[2].name: required value, subsets[3].ports[0].name: required value, subsets[3].ports[1].name: required value]
E0817 19:14:52.239867     378 endpoints_controller.go:369] Error updating endpoints: Endpoints "frontend" is invalid: [subsets[2].ports[0].name: required value, subsets[2].ports[1].name: required value, subsets[2].ports[2].name: required value, subsets[3].ports[0].name: required value, subsets[3].ports[1].name: required value, subsets[2].ports[0].name: required value, subsets[2].ports[1].name: required value, subsets[2].ports[2].name: required value, subsets[3].ports[0].name: required value, subsets[3].ports[1].name: required value]

from pkg/api/validation/validation.go:

func validateEndpointSubsets(subsets []api.EndpointSubset) errs.ValidationErrorList {
        allErrs := errs.ValidationErrorList{}

        for i := range subsets {
                ss := &subsets[i]

                ssErrs := errs.ValidationErrorList{}

                if len(ss.Addresses) == 0 {
                        ssErrs = append(ssErrs, errs.NewFieldRequired("addresses"))
                }
                if len(ss.Ports) == 0 {
                        ssErrs = append(ssErrs, errs.NewFieldRequired("ports"))
                }
                for addr := range ss.Addresses {
                        ssErrs = append(ssErrs, validateEndpointAddress(&ss.Addresses[addr]).PrefixIndex(addr).Prefix("addresses")...)
                }
                for port := range ss.Ports {
                        ssErrs = append(ssErrs, validateEndpointPort(&ss.Ports[port], len(ss.Ports) > 1).PrefixIndex(port).Prefix("ports")...)
                }

                allErrs = append(allErrs, ssErrs.PrefixIndex(i)...)
        }

        return allErrs
}
jdef commented 9 years ago

and...

func validateEndpointPort(port *api.EndpointPort, requireName bool) errs.ValidationErrorList {
        allErrs := errs.ValidationErrorList{}
        if requireName && port.Name == "" {
                allErrs = append(allErrs, errs.NewFieldRequired("name"))
sttts commented 9 years ago

We add annotation to the endpoints. It might be that the necessary name can be replaced by them.

jdef commented 9 years ago

replacing name with something that we cook up is probably going to break kube-proxy

sttts commented 9 years ago

Still wondering why the error happens only with the mesos endpoint controller. The name constraint should always apply.

sttts commented 9 years ago

I think I know: with non-Mesos setups the ports are all equal => len(ss.Ports)>1 is never true. For Mesos it is.

jdef commented 9 years ago

correct

sttts commented 9 years ago

sttts [21:43] I guess it would be enough if validateEndpointSubsets would check for len(ss.Ports)>1 && len(service.Ports)>1

sttts [21:47] but of course we don’t know the service at that position

sttts [21:50] @jdef: what about using distinct endpoint subsets for each port of a service?

sttts [21:53] wondering what happens if you have two services without service port names which apply to a number of pods. should be the same issue.

sttts [21:55] I think that’s the point: different services lead to different endpoint subsets

jdef commented 9 years ago

this is the commit that introduced it: https://github.com/kubernetes/kubernetes/commit/c20efb6bae5e7e5ca685d7d1475e73e14e80ac6e

it looks like the unit test is checking for a mix of endpoint ports, some of which have names and others that don't (see Multiple ports, one without name). I think I see where the author was going: don't allow a mix of named and unnamed things.

I think the resulting validation implementation is too aggressive. Instead perhaps it could take care to validate the following cases:

  1. none of the endpoint ports in the subset are named
  2. all of the endpoint ports in the subset are named (the same)

Thoughts?