mesosphere / marathon

Deploy and manage containers (including Docker) on top of Apache Mesos at scale.
https://mesosphere.github.io/marathon/
Apache License 2.0
4.07k stars 845 forks source link

Cannot reference a port of a resident application to define a readiness check #5137

Closed hantuzun closed 7 years ago

hantuzun commented 7 years ago

I can't set "readinessCheck" of an application of mine. I'm following the readinessCheck documentation while working with Marathon version 1.1.2-dcos-1 and DCOS v. 1.7.3.


Here's an example response I get while trying to set "readinessChecks":

Request
$ curl -XPUT http://marathon.mesos:8080/v2/apps/elasticsearch --header "Content-type: application/json" --data '
{
    "readinessChecks": [
      {
        "path": "/"
      }
    ]
}'
Response
{"message":"Object is not valid","details":[{"path":"/value(0)/portName","errors":["is not one of ()"]}]}

The same error is persisted however many combinations I've tried. The same error is returned on Marathon .UI as well.


How can I set readinessCheck?

jdef commented 7 years ago

Please provide additional context. For example, if you're updating an existing app definition, what does that definition look like?

On Wed, Feb 8, 2017 at 9:11 AM, Han Tuzun notifications@github.com wrote:

I can't set "readinessCheck" of an application of mine. I'm following the readinessCheck documentation https://mesosphere.github.io/marathon/docs/health-checks.html while working with Marathon version 1.1.2-dcos-1 and DCOS v. 1.6, which uses Mesos v. 0.27.

Here's an example response I get while trying to set "readinessChecks": Request

$ curl -XPUT http://marathon.mesos:8080/v2/apps/elasticsearch --header "Content-type: application/json" --data '{ "readinessChecks": [ { "path": "/" } ]}'

Response

{"message":"Object is not valid","details":[{"path":"/value(0)/portName","errors":["is not one of ()"]}]}

The same error is persisted however many combinations I've tried. The same error is returned on Marathon .UI as well.

How can I set readinessCheck?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/mesosphere/marathon/issues/5137, or mute the thread https://github.com/notifications/unsubscribe-auth/ACPVLPCVXfAy_hyaHuqOQV8a-CykS3PLks5rac0qgaJpZM4L63WJ .

hantuzun commented 7 years ago

Hi @jdef,

We want to add readiness check for a resident task of Elasticsearch.

Our issue is turned out to be about setting portDefinitions. Readiness

Health checks API on ports is like this:

For MESOS_HTTP, MESOS_HTTPS, MESOS_TCP, TCP, HTTP and HTTPS health checks, either port or portIndex may be used. If none is provided, portIndex is assumed. If port is provided, it takes precedence overriding any portIndex option.

  • portIndex (Optional. Default: 0): Index in this app’s ports or portDefinitions array to be used for health requests. An index is used so the app can use random ports, like [0, 0, 0] for example, and tasks could be started with port environment variables like $PORT1.
  • port (Optional. Default: None): Port number to be used for health requests. https://mesosphere.github.io/marathon/docs/health-checks.html

Readiness check API on ports is just the below:

Our Elasticsearch task have the following portDefinitions now:

"portDefinitions":[
   {
      "port":10104,
      "protocol":"tcp",
      "labels":{

      }
   },
   {
      "port":10105,
      "protocol":"tcp",
      "labels":{

      }
   }
]

Marathon is generated them and we're interested in the first one of them (port 10104). The issue is we cannot update them since our task is resident. We cannot add these ports names neither. Note that they don't have default names as well.

Therefore, we're only left with the option of having Elasticsearch downtime for about 5 minutes where we will add http-api name to the first port definitions. Could we have another solution?

This could have been prevented by having the health check port API for readiness checks as well. Having a mechanism for updating port definitions of resident tasks could also have helped.

aquamatthias commented 7 years ago

@hantuzun Marathon needs to know on which port to run the readiness check. Therefore you need to define a name in portDefinition and reference this port in the readiness check definition - eg:

{
  "portDefinitions": [
    {
      "port": 0,
      "protocol": "tcp",
      "name": "myport"
    }
  ],
  "readinessChecks": [
    {
      "path": "/v1/plan",
      "portName": "myport"
    }
  ]
}
jdef commented 7 years ago

please don't use mixed case port names. port names for readiness checks (according to the app schema) should be all lowercase

On Thu, Feb 9, 2017 at 7:05 AM, Matthias Veit notifications@github.com wrote:

@hantuzun https://github.com/hantuzun Marathon needs to know on which port to run the readiness check. Therefore you need to define a name in portDefinition and reference this port in the readiness check definition - eg:

{ "portDefinitions": [ { "port": 0, "protocol": "tcp", "name": "myHttpPort" } ], "readinessChecks": [ { "path": "/v1/plan", "portName": "myHttpPort" } ] }

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mesosphere/marathon/issues/5137#issuecomment-278623269, or mute the thread https://github.com/notifications/unsubscribe-auth/ACPVLLAtf7Runct1U3sEjg0m23gdDqZlks5rawD0gaJpZM4L63WJ .

aquamatthias commented 7 years ago

@jdef added the definition - thanks for the hint.

hantuzun commented 7 years ago

Hi @aquamatthias ,

Thanks for the response. Have you read the second post of mine on this thread? I didn't say there's no need to provide a port for readiness checks.

What I am saying is:

Is the issue clear? I'm kindly asking the members to reopen the issue.

jdef commented 7 years ago

Sounds like you want to add a name to a portDefinition of a running Marathon app that uses resident resources. But Marathon is rejecting the update because it sees that there is a change to the port-definition that is "incompatible" with the original app spec. You don't want to destroy the resident app, you simply want to update it: decorate an existing portDefinition declaration with a name, and then add a readiness check that references that portDefinition.

Does that sum it up?

On Fri, Feb 10, 2017 at 1:47 PM, Han Tuzun notifications@github.com wrote:

Hi @aquamatthias https://github.com/aquamatthias ,

Thanks for the response. Have you read the second post of mine on this thread? I didn't say there's no need to provide a port for readiness checks.

What I am saying is:

  • We can't just modify the port definitions of a resident task. We need to stop it.
  • These port definitions Marathon created does not come up with a default name. Therefore, I can't refer to them.
  • The readiness check API does not accept portDefinitions index like the healthcheck API. Therefore, I can't refer to the first one of these ports defined.
  • The readiness check API does not accept a port number just like the healthcheck API. Therefore, I can't just write the readiness check with the port.

Is the issue clear? I'm kindly asking the members to reopen the issue.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mesosphere/marathon/issues/5137#issuecomment-279030835, or mute the thread https://github.com/notifications/unsubscribe-auth/ACPVLFxjmXVXBXyXOFyAwKFfi8m_zKJgks5rbLCqgaJpZM4L63WJ .

hantuzun commented 7 years ago

Yes @jdef, that's all right!

The solution you propose would have solve our problem :)

i just want to point out the case we're having right now. The readiness API seems to be not looked after for a while.

We'll have some down time anyway since we use DC/OS and it'd take too long to have an update to fix this. We could have updated our Marathon version with a hot-fix or we could have created a second Elasticsearch for this transition, but no need for our case.

However, not being able to add readiness checks to resident tasks could cause more trouble for others.

aquamatthias commented 7 years ago

@hantuzun I see. The problem arises with resident tasks since it is not allowed to change resources after they are created. You actually do not want to change resources, but only assign a name.

There are 2 things to improve:

I will change the title of this ticket to reflect this more clearly.

aquamatthias commented 7 years ago

Patch is here: https://phabricator.mesosphere.com/D511

jdef commented 7 years ago

There's a fly in the ointment w/ respect to adding/changing a port name: while it doesn't change the resources allocated for the task, it does change the discovery-info generated for the task. Mesos doesn't allow dynamic updates for task discovery info. If Marathon allowed such a change on it's end, without restarting the task, then Marathon would be able to use the port name for readiness checks, but the Mesos task wouldn't have the port name in it's discovery info. We should actively avoid this type of dissonance.

As a compromise, we could allow the port name addition/change (because it's not a resource change) for resident tasks, but we'd also have to restart the task on the cluster so that the discovery information was properly advertised.

On Mon, Feb 13, 2017 at 4:02 AM, Matthias Veit notifications@github.com wrote:

Reopened #5137 https://github.com/mesosphere/marathon/issues/5137.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mesosphere/marathon/issues/5137#event-959217063, or mute the thread https://github.com/notifications/unsubscribe-auth/ACPVLNDSq7xk8nA2ohcWTdXQLmRzLZm-ks5rcBwvgaJpZM4L63WJ .

aquamatthias commented 7 years ago

@jdef D511 refines the validation for not changing resources. Adding/Changing/Removing a name to the port definition results in a restart of the task.

hantuzun commented 7 years ago

Thanks a lot @aquamatthias! That's a helpful fix.