rnestler opened this issue 5 years ago
There is currently no support for kaniko.
I don't have any experience with kaniko, so it's hard to judge how much work it would be. I think it's feasible, but step one would be to look at the current code and work out how to introduce a level of abstraction so that dockerd and kaniko can be swapped out for building images.
For those who are interested, I've created a bootstrapped workaround for BinderHub that runs an unprivileged Kaniko builder in place of repo2docker. Since this skips repo2docker, the BinderHub will only work with GitHub or GitLab repositories that have a Dockerfile in the repository root. It uses Docker Hub to push, and requires a push secret in the namespace called dockerhub-secret that was configured with these instructions from GCR.
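The push secret is just a standard Docker registry credential mounted where Kaniko looks for it (`/kaniko/.docker/config.json`). As a sketch (the registry URL, username, and password below are placeholders, and `docker_config_json` is a hypothetical helper, not part of BinderHub), the `config.json` payload such a secret carries can be generated like this:

```python
import base64
import json


def docker_config_json(registry: str, username: str, password: str) -> str:
    """Build the Docker config.json content that Kaniko reads from
    /kaniko/.docker/config.json to authenticate registry pushes."""
    # Docker's auth entry is base64("username:password")
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    return json.dumps({"auths": {registry: {"auth": auth}}}, indent=2)


# Example with placeholder credentials:
payload = docker_config_json("https://index.docker.io/v1/", "myuser", "mypass")
print(payload)
```

The resulting JSON is what ends up under the secret's `config.json` key; creating the secret from such a file with `kubectl create secret generic dockerhub-secret --from-file=config.json=...` produces the same structure the GCR instructions describe.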
Add the following to the config.yaml for BinderHub:
```yaml
config:
  BinderHub:
    push_secret: dockerhub-secret
extraConfig:
  zz-swap-kaniko-for-docker: |
    import warnings

    from binderhub.build import Build, ProgressEvent
    from binderhub.utils import KUBE_REQUEST_TIMEOUT
    from kubernetes import client, watch
    from tornado.log import app_log

    class KanikoBuilder(Build):
        def get_cmd(self):
            """Get the cmd to run to build the image"""
            cmd = self.get_r2d_cmd_options()
            # repo_url comes at the end, since otherwise our arguments
            # might be mistaken for commands to run.
            # see https://github.com/jupyter/repo2docker/pull/128
            # cmd.append(self.repo_url)
            print('repo building command args are: %s' % ' '.join(cmd), flush=True)
            return cmd

        def get_r2d_cmd_options(self):
            if "gitlab" in self.repo_url:
                dockerfile_url = self.repo_url + '/-/raw/main/Dockerfile'
            elif "github" in self.repo_url:
                dockerfile_url = (
                    "https://raw.githubusercontent.com"
                    + self.repo_url.split('github.com')[-1]
                    + '/master/Dockerfile'
                )
            else:
                raise NotImplementedError(
                    "Only GitLab.com, custom GitLabs, or GitHub repositories with "
                    "Dockerfiles in the repository's root directory are supported"
                )
            r2d_options = [
                "--use-new-run",
                "--snapshotMode=redo",
                "--dockerfile",
                dockerfile_url,
                "--destination",
                self.image_name,
            ]
            return r2d_options

        def submit(self):
            """
            Submit a build pod to create the image for the repository.

            Progress of the build can be monitored by listening for items in
            the Queue passed to the constructor as `q`.
            """
            self.name = 'kaniko-' + self.name[7:]
            self.build_image = 'gcr.io/kaniko-project/executor:debug'
            volume_mounts = []
            volumes = []
            if True:  # self.push_secret:
                volume_mounts.append(
                    client.V1VolumeMount(mount_path="/kaniko/.docker", name="dockerhub-config")
                )
                volumes.append(
                    client.V1Volume(
                        name="dockerhub-config",
                        secret=client.V1SecretVolumeSource(secret_name=self.push_secret),
                    )
                )
            env = []
            if self.git_credentials:
                env.append(
                    client.V1EnvVar(name="GIT_CREDENTIAL_ENV", value=self.git_credentials)
                )
            self.pod = client.V1Pod(
                metadata=client.V1ObjectMeta(
                    name=self.name,
                    labels={
                        "name": self.name,
                        "component": self._component_label,
                    },
                    annotations={
                        "binder-repo": self.repo_url,
                    },
                ),
                spec=client.V1PodSpec(
                    containers=[
                        client.V1Container(
                            image=self.build_image,
                            name="builder",
                            args=self.get_cmd(),
                            volume_mounts=volume_mounts,
                            resources=client.V1ResourceRequirements(
                                limits={"memory": self.memory_limit},
                                requests={"memory": self.memory_request},
                            ),
                            env=env,
                        )
                    ],
                    tolerations=[
                        client.V1Toleration(
                            key="hub.jupyter.org/dedicated",
                            operator="Equal",
                            value="user",
                            effect="NoSchedule",
                        ),
                        # GKE currently does not permit creating taints on a node pool
                        # with a `/` in the key field
                        client.V1Toleration(
                            key="hub.jupyter.org_dedicated",
                            operator="Equal",
                            value="user",
                            effect="NoSchedule",
                        ),
                    ],
                    node_selector=self.node_selector,
                    volumes=volumes,
                    restart_policy="Never",
                    affinity=self.get_affinity(),
                ),
            )
            try:
                _ = self.api.create_namespaced_pod(
                    self.namespace,
                    self.pod,
                    _request_timeout=KUBE_REQUEST_TIMEOUT,
                )
            except client.rest.ApiException as e:
                if e.status == 409:
                    # Someone else created it!
                    app_log.info("Build %s already running", self.name)
                else:
                    raise
            else:
                app_log.info("Started build %s", self.name)

            app_log.info("Watching build pod %s", self.name)
            while not self.stop_event.is_set():
                w = watch.Watch()
                try:
                    for f in w.stream(
                        self.api.list_namespaced_pod,
                        self.namespace,
                        label_selector=f"name={self.name}",
                        timeout_seconds=30,
                        _request_timeout=KUBE_REQUEST_TIMEOUT,
                    ):
                        try:
                            print(
                                self.api.read_namespaced_pod_log(
                                    name=self.name, namespace=self.namespace
                                ),
                                flush=True,
                            )
                        except Exception:
                            pass
                        if f["type"] == "DELETED":
                            # Assume this is a successful completion
                            self.progress(
                                ProgressEvent.Kind.BUILD_STATUS_CHANGE,
                                ProgressEvent.BuildStatus.COMPLETED,
                            )
                            return
                        self.pod = f["object"]
                        if not self.stop_event.is_set():
                            # Account for all the phases kubernetes pods can be in:
                            # Pending, Running, Succeeded, Failed, Unknown
                            # https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase
                            phase = self.pod.status.phase
                            if phase == "Pending":
                                self.progress(
                                    ProgressEvent.Kind.BUILD_STATUS_CHANGE,
                                    ProgressEvent.BuildStatus.PENDING,
                                )
                            elif phase == "Running":
                                self.progress(
                                    ProgressEvent.Kind.BUILD_STATUS_CHANGE,
                                    ProgressEvent.BuildStatus.RUNNING,
                                )
                            elif phase == "Succeeded":
                                # Do nothing! We will clean this up, and send a
                                # 'Completed' progress event when the pod has been deleted
                                pass
                            elif phase == "Failed":
                                self.progress(
                                    ProgressEvent.Kind.BUILD_STATUS_CHANGE,
                                    ProgressEvent.BuildStatus.FAILED,
                                )
                            elif phase == "Unknown":
                                self.progress(
                                    ProgressEvent.Kind.BUILD_STATUS_CHANGE,
                                    ProgressEvent.BuildStatus.UNKNOWN,
                                )
                            else:
                                # This shouldn't happen, unless k8s introduces new phase types
                                warnings.warn(
                                    f"Found unknown phase {phase} when building {self.name}"
                                )
                        if self.pod.status.phase in ("Succeeded", "Failed"):
                            self.cleanup()
                except Exception:
                    app_log.exception("Error in watch stream for %s", self.name)
                    raise
                finally:
                    w.stop()
                if self.stop_event.is_set():
                    app_log.info("Stopping watch of %s", self.name)
                    return

    if hasattr(c, 'BinderHub'):
        c.BinderHub.build_class = KanikoBuilder
    else:
        raise NameError("Kaniko build class cannot find BinderHub configuration")
```
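To see what the URL rewriting in `get_r2d_cmd_options` does in isolation, and why it only works for repositories whose Dockerfile sits on a hard-coded default branch, here is the same mapping as a standalone sketch (`raw_dockerfile_url` is an illustrative helper, not part of the class above):

```python
def raw_dockerfile_url(repo_url: str) -> str:
    """Map a repository URL to the raw URL of its root Dockerfile,
    mirroring the logic in KanikoBuilder.get_r2d_cmd_options above.
    Note the branch names ('main' for GitLab, 'master' for GitHub)
    are hard-coded, which is a real limitation of the workaround."""
    if "gitlab" in repo_url:
        return repo_url + "/-/raw/main/Dockerfile"
    if "github" in repo_url:
        return (
            "https://raw.githubusercontent.com"
            + repo_url.split("github.com")[-1]
            + "/master/Dockerfile"
        )
    raise NotImplementedError("Only GitLab and GitHub URLs are handled")


print(raw_dockerfile_url("https://github.com/user/repo"))
# → https://raw.githubusercontent.com/user/repo/master/Dockerfile
```

Any repository whose Dockerfile lives on a differently named default branch, or in a subdirectory, will fail this lookup even though Kaniko itself could build it.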
I'd like to draw more attention to this issue due to a few developments in the past couple of years. Kubernetes has deprecated the Docker runtime (dockershim), and using Docker-in-Docker or exposing the Docker socket is a security risk. Unprivileged Kaniko containers are one of the few ways to build securely on Kubernetes: https://kurtmadel.com/posts/native-kubernetes-continuous-delivery/building-container-images-with-kubernetes/
It looks like this was attempted a few years ago (https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/1225), but there were concerns it would be too slow. Kaniko's run flags (--use-new-run and --snapshotMode=redo) and replacing conda with mamba could get Kaniko to build about as fast as Docker, and BinderHub has configurations that make labs launch faster (prepuller, sticky_builds), so time-to-build isn't a bottleneck.
Hoping to see Kaniko integration soon!
repo2docker added support for alternative container engines last year: https://github.com/jupyterhub/repo2docker/pull/848
For example, I've written a daemonless, rootless Podman backend: https://github.com/manics/repo2podman (note: if you're happy to run a rootless daemon, you should be able to use rootless Podman with its Docker-compatible socket instead).
BinderHub also gained support for pluggable backends; for example, you can now run it with Docker or Podman without a Kubernetes cluster: https://github.com/jupyterhub/binderhub/tree/master/testing/local-binder-local-hub
The registry component of BinderHub can also be overridden, so in principle, if your builder and spawner wanted to use some other type of arbitrary storage instead of a container registry, that would also be possible.
This should mean the basic framework is in place to support other container builders and run-times. To start with I think someone needs to implement Kaniko as an alternative builder for repo2docker.
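As a rough illustration of what such a Kaniko builder boils down to, the following sketch assembles a Kaniko executor invocation from the same ingredients the workaround above uses (a build context, a destination image, and the speed-related flags). The `KanikoInvocation` class and its method names are hypothetical, not repo2docker's actual engine API:

```python
from dataclasses import dataclass, field


@dataclass
class KanikoInvocation:
    """Hypothetical helper assembling a Kaniko executor command line.

    This is NOT repo2docker's engine interface -- just a sketch of the
    arguments a Kaniko-based builder would need to pass to the executor
    image (gcr.io/kaniko-project/executor).
    """
    context: str            # e.g. a git:// context or tarball URL
    destination: str        # image reference to push, e.g. user/repo:tag
    dockerfile: str = "Dockerfile"
    # Flags recommended above to speed up Kaniko builds
    extra_flags: list = field(
        default_factory=lambda: ["--use-new-run", "--snapshotMode=redo"]
    )

    def argv(self) -> list:
        return (
            ["/kaniko/executor",
             "--context", self.context,
             "--dockerfile", self.dockerfile,
             "--destination", self.destination]
            + self.extra_flags
        )


cmd = KanikoInvocation(
    context="git://github.com/user/repo",
    destination="registry.example.com/user/repo:latest",
).argv()
print(" ".join(cmd))
```

A real repo2docker engine plugin would additionally have to stream build logs back and handle registry credentials, but the command line itself is this simple.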
I am really interested in running BinderHub with your Kaniko builder workaround. However, I can't find any specific steps to follow. What I've done so far: I downloaded (wget) and configured BinderHub according to Zero-to-BinderHub with the following configuration file (config.yaml):
```yaml
config:
  BinderHub:
    hub_url: http://10.16.63.179
    use_registry: true
    image_prefix: spectraes/binder-dev-
    push_secret: dockerhub-secret
extraConfig:
  zz-swap-kaniko-for-docker: |
    # ... identical to the KanikoBuilder extraConfig posted above ...
```
But the build fails due to a failed volume mount on the build pod:

```
Warning  FailedMount  13s (x7 over 45s)  kubelet  MountVolume.SetUp failed for volume "docker-socket" : hostPath type check failed: /var/run/docker.sock is not a socket file
```

This, I assume, is expected, because when building with Kaniko the Docker socket should not be used at all. I think I am missing an option to select your Kaniko workaround as the builder, but I can't find where to set it. Maybe this option in the values.yaml file:
```yaml
imageBuilderType: "host"
```

?
For those who would like to reproduce the workaround from @MatthewBM and hit the same issue as me, I found a solution. The KanikoBuilder class inherits from Build, which (I assume) is no longer supported. Instead of Build, one needs to import KubernetesBuildExecutor. There are a few compatibility issues after that, but they can be quickly resolved by looking at the binder pod logs and require only small changes to the workaround script.
```yaml
...
extraConfig:
  zz-swap-kaniko-for-docker: |
    from binderhub.build import KubernetesBuildExecutor, ProgressEvent
    from binderhub.utils import KUBE_REQUEST_TIMEOUT
    from kubernetes import client, watch
    from tornado.log import app_log

    class KanikoBuilder(KubernetesBuildExecutor):
        ...
```
I've written a repo2docker extension to use Kaniko: https://github.com/manics/repo2kaniko/
If you use the latest BinderHub that includes https://github.com/jupyterhub/binderhub/pull/1766 and https://github.com/jupyterhub/binderhub/pull/1795 this config should work:
```yaml
config:
  KubernetesBuildExecutor:
    docker_host:
    build_image: "quay.io/manics/repo2kaniko:0.1.0"
    repo2docker_extra_args:
      - --engine=kaniko
      - --debug
imageCleaner:
  enabled: false
```
Unfortunately, Kaniko doesn't build all repositories; so far I've noticed problems with https://github.com/manics/jupyter-remote-desktop-proxy/ and some RStudio repos.
I'd like to build container images during CI inside an unprivileged Docker container where DIND is not available.
Is it possible to use kaniko instead of Docker directly to build the container image with repo2docker?
If not, would it be feasible to add support for building with kaniko in repo2docker?