Open cdennison opened 6 years ago
I'm not sure there is a generic solution here. The underlying assumption that Docker is the container runtime may not always be true. ExecSyncRequest will work with any Kubernetes-compliant CRI runtime (cri-o, rkt, ...).
Docker in docker sounds like a good solution for docker swarm - where we know for certain that docker is used.
As for Mesos, they have their own container runtime as well. Do you know if Mesos/Marathon use the Docker binary in their runtime, or just Docker images? What other metadata can we get from the Mesos tools at runtime? Perhaps we can find another way to extract the relevant PID?
Good point about different containers. Looks like Mesos is pushing people toward what they call their Universal Container Runtime which will support Docker and their original container solution (and have CLI support) but it's not released yet.
I'll see if I can get 1.9 to work locally to try it out.
@cdennison Yes, using privileged mode to get a container's PID might not be comfortable for people using a third-party client. Within the existing Docker setup, I don't see many alternatives. Some existing monitoring providers fall back to the same option of mounting the Docker socket (/var/run/docker.sock) into the container to access the host's Docker daemon.
@yuval-k I think the Kubernetes squash client uses CRI, for which you mount /var/run/cri.sock and also enable a privileged securityContext. How is this different from Docker's privileged mode?
I don't think it's different. If you mount the kubelet socket, you don't have to install the Kubernetes CLI inside your container; you can use the one on the node.
It's similar to the idea of running Docker in Docker versus Docker outside of Docker (see the links below). In one case you have to install everything inside the container; in the other you use what's already on the node. There are pros and cons to each.
https://applatix.com/case-docker-docker-kubernetes-part/ http://container-solutions.com/running-docker-in-jenkins-in-docker/
Thanks, will check the references. The current squash client for Kubernetes mounts /var/run/cri.sock inside the pod. This is not to access the Kubernetes CLI inside the pod, but to be able to invoke the Kubernetes CRI API from the pod. This is similar to how you mount the Docker socket into a container for Docker-in-Docker use cases.
The CRI interface only helps with uniform access for things like listing pods or getting the container IDs for a given pod. But to get the actual PID, the existing squash client for Kubernetes uses the host's PID namespace (hostPID: true). I just want to point out that even with CRI on Kubernetes, the pod or container needs host privileges to actually get the container's PID on the host.
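To make the hostPID point concrete, here is a minimal sketch of one common way to map a container ID to host PIDs once the host's /proc is visible. It is not necessarily what the squash client does. It assumes the runtime puts the full container ID into each process's cgroup path (typical for Docker, but not guaranteed for every runtime or cgroup setup), and the function name and `proc_root` parameter are made up for illustration.

```python
import os


def find_container_pids(container_id, proc_root="/proc"):
    """Return host PIDs whose cgroup file mentions container_id.

    Only meaningful when proc_root is the *host's* /proc, i.e. the caller
    runs with hostPID: true (or docker run --pid=host).
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        cgroup_path = os.path.join(proc_root, entry, "cgroup")
        try:
            with open(cgroup_path) as f:
                content = f.read()
        except OSError:
            continue  # process exited, or we lack permission to read it
        # Assumption: the cgroup path embeds the container ID,
        # e.g. ".../docker/<container_id>".
        if container_id in content:
            pids.append(int(entry))
    return sorted(pids)
```

Without access to the host PID namespace, the same scan only sees the processes of the container it runs in, which is why the privileged setup is needed in the first place.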
Here's my current analysis of how feasible it is to use squash on Mesos. TL;DR: there are a lot of differences from Kubernetes that make it a much more difficult integration.
Proposed Limitations: assume no DCOS security and only allow 1 debugger to run at a time.
So even if you have HOST1:5555 and HOST2:5555, if the DNS name for both is example.marathon.mesos, then calling example.marathon.mesos:5555 will randomly give you one or the other.
Solution - for initial proof of concept I would just hard code the debugger port and only allow 1 container at a time to be debugged. That port can then easily be forwarded in (no matter what host) using MesosDNS+proxy or SSH.
The Mesos/DCOS CLI cannot do an "exec" on non-UCR containers (UCR is the Universal Container Runtime, their own container runtime). https://dcos.io/docs/1.9/monitoring/debugging/task-exec/ The issue here is that UCR is still very new and I suspect most people aren't using it yet (and it doesn't even support runtimes beyond Docker yet, so it's not very "universal").
Mesos currently supports running containers in privileged mode and adding arbitrary arguments such as --pid="host", which lets you see other containers' processes.
Mesos does not support this for its own container runtime (UCR), only for Docker, so the squash client has to use the Docker runtime. https://jira.mesosphere.com/browse/MARATHON-7752
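As a rough illustration of the privileged Docker setup described above, the sketch below builds a Marathon app definition that asks for privileged mode and passes `--pid=host` through as a Docker parameter. The app ID and image name are hypothetical placeholders, and the field names (`privileged`, `parameters` with `key`/`value` pairs) are as I understand Marathon's Docker container settings; check them against the Marathon version you run.

```python
import json


def squash_client_app(image, app_id="/squash-client"):
    """Build a Marathon app definition for a privileged, host-PID Docker task.

    image and app_id are illustrative placeholders, not real squash artifacts.
    """
    return {
        "id": app_id,
        "instances": 1,
        "cpus": 0.1,
        "mem": 64,
        "container": {
            "type": "DOCKER",  # must be the Docker runtime, not UCR
            "docker": {
                "image": image,
                "privileged": True,
                # Each entry becomes an extra "docker run" flag: --pid=host
                "parameters": [{"key": "pid", "value": "host"}],
            },
        },
    }


if __name__ == "__main__":
    # Print JSON that could be submitted to Marathon as an app definition.
    print(json.dumps(squash_client_app("example/squash-client:latest"), indent=2))
```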
There is no way to use the CLI to talk to Mesos agents directly (the process running on each node) and bypass security like you can in Kubernetes by doing “mountPath: /var/run/cri.sock.” To deal with security you would have to let users pass in a security token when calling the squash server. The security is very annoying because it’s OAuth2 Auth Flow (which is hard to automate because it requires HTTP redirects) and the token expires after only a few days. In my experience most people disable the OAuth security (because they’re behind a firewall anyways) and just use Basic Auth over SSL.
Proposed Solution: for initial implementation assume no security.
https://github.com/dcos/dcos-cli/issues/588 https://medium.com/@richardgirges/authenticating-open-source-dc-os-with-third-party-services-125fa33a5add
I researched ways of getting around this, and it's true that by default most of the authentication is off from inside the data center, but there are no guarantees.
http://mesos.apache.org/documentation/latest/authentication/
Hi @cdennison,
Thanks so much for tackling this. Sounds like you have a great plan. Maybe it makes sense to add @benh to the thread and ask for his help; maybe they can help us by adding the features that are missing for a seamless integration.
Cheers, Idit
Hi everyone,
I was thinking about adding Mesos/Marathon support, but there is a technical limitation on the tooling side - they don't support "docker exec" yet from their CLI - for doing things like "getting the pid."
It looks like Docker Swarm has the same issue (@crackerplace) - there are third party tools like this but nothing bundled with the official tool.
Here are a couple ideas I had for how to achieve that functionality, but none of them are ideal so I'd love your thoughts. I can also jump on slack to discuss further.
Goal is to achieve this:
docker exec -it $containerid ls -l /proc/self/ns/
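For reference, the entries under /proc/<pid>/ns/ are symlinks whose targets look like `pid:[4026531836]`; two processes share a namespace when the numbers match. The sketch below reads those links directly instead of parsing `ls -l` output. The function names and the `proc_root` parameter are made up for illustration and are not part of squash.

```python
import os
import re

# Link targets under /proc/<pid>/ns/ have the form "<type>:[<inode>]".
NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def read_namespaces(pid="self", proc_root="/proc"):
    """Return {entry name: namespace inode} for the given process."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    result = {}
    for name in os.listdir(ns_dir):
        try:
            target = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # not a symlink, or not readable without privileges
        match = NS_LINK.match(target)
        if match:
            result[name] = int(match.group(2))
    return result


def same_namespace(a, b, kind="pid"):
    """True if both namespace maps contain `kind` with the same inode."""
    return kind in a and a.get(kind) == b.get(kind)
```

Comparing `read_namespaces()` for the debugger process with the result for a target PID shows whether the two share, for example, a PID or mount namespace.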
This also has the benefit of making squash more generic in terms of how it handles the container-to-container interactions - in other words every platform could use the same code to get the pid and only have to integrate at the "find the container host" level.
The big downside is that a lot of people will already be anxious about the security implications of deploying a third-party tool in privileged mode, and this will probably make that worse.
ssh user@mesos-cluster docker exec -it $containerid ls -l /proc/self/ns/
Again - you're just printing a helper because you wouldn't have access to their ssh keys.
You could also skip the helper and just allow an optional pid when you call
squash debug-container --pid <PID> ...
(End users, for example, could opt to just print their PID to their application logs and get it that way.)
What are your thoughts?
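To illustrate the optional-PID idea, here is a small sketch of the fallback logic: use the PID the user passed in if there is one, otherwise fall back to a platform-specific lookup. It only mirrors the `--pid` flag proposed above; the `--container` flag, the `squash-sketch` program name, and the `lookup` callback are hypothetical and are not squash's actual CLI. One caveat: a PID that an application logs from inside its container is the PID in the container's namespace, which differs from the host PID unless the container shares the host PID namespace.

```python
import argparse


def build_parser():
    """Hypothetical CLI mirroring the proposed 'debug-container --pid <PID>'."""
    parser = argparse.ArgumentParser(prog="squash-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    debug = sub.add_parser("debug-container")
    debug.add_argument("--pid", type=int, default=None,
                       help="PID to attach to; skips the platform lookup")
    debug.add_argument("--container", default=None,
                       help="container ID used when --pid is not given")
    return parser


def resolve_pid(args, lookup):
    """Prefer the user-supplied PID; otherwise ask the platform lookup."""
    if args.pid is not None:
        if args.pid <= 0:
            raise ValueError("--pid must be a positive integer")
        return args.pid
    return lookup(args.container)
```

With this shape, each platform only has to supply its own `lookup`, and users on platforms without one can still debug by passing the PID themselves.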