kubernetes / website

Kubernetes website and documentation repo:
https://kubernetes.io

Confusing and possibly inaccurate docs about node-level ephemeral storage #42260

Closed nnlkcncff closed 1 month ago

nnlkcncff commented 1 year ago

Pods use ephemeral local storage for scratch space, caching, and for logs. The kubelet can provide scratch space to Pods using local ephemeral storage to mount emptyDir volumes into containers. The kubelet also uses this kind of storage to hold node-level container logs, container images, and the writable layers of running containers. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#local-ephemeral-storage

The information about container images and writable layers doesn't seem quite correct.

Storage of container images is the responsibility of the CRI, not the kubelet. For a new, empty cluster this is easy to check by comparing the sizes of the CRI and kubelet directories, e.g. (with containerd as the CRI):

# Compare the total size of the CRI and kubelet state directories:
du \
    --max-depth 0 \
    --human-readable \
    /var/lib/containerd \
    /var/lib/kubelet

You will see that the CRI directory is larger than the kubelet directory, mainly because of the container images.


The kubelet places only some files in some directories, notably under /var/lib/kubelet/pods, and some of those paths are then mounted into containers by the CRI (including ephemeral storage volumes backed by emptyDir).
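
For instance, emptyDir contents live under the kubelet's pods directory. A minimal illustration, assuming the default kubelet root directory; POD_UID here is a placeholder for a real pod UID:

# List a pod's emptyDir volumes under the kubelet's state directory:
ls /var/lib/kubelet/pods/"${POD_UID}"/volumes/kubernetes.io~empty-dir/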

But there is only one writable layer, the upper one, and it seems to be the responsibility of the storage driver (overlayfs by default), so this layer is located in the CRI directory tree; for Kubernetes with containerd as the CRI it's /run/containerd/io.containerd.runtime.v2.task/k8s.io/.../rootfs. For example:

# Find some container ID:
crictl ps

# Find the upper layer ID:
crictl inspect \
    --output go-template \
    --template '{{ .info.snapshotKey }}' \
    "${CONTAINER_ID}"

# Find the upper layer directory:
mount | grep "${UPPER_LAYER_ID}"
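
If the container is backed by overlayfs, the writable layer path also shows up directly in the mount options. A hedged sketch, assuming the overlayfs snapshotter (the upperdir= option names the writable directory):

# Extract just the writable-layer path from the overlay mount options:
mount | grep "${UPPER_LAYER_ID}" | grep --only-matching 'upperdir=[^,]*'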

Isn't that right?

k8s-ci-robot commented 1 year ago

This issue is currently awaiting triage.

SIG Docs takes a lead on issue triage for this website, but any Kubernetes member can accept issues by applying the triage/accepted label.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
dipesh-rawat commented 1 year ago

Page related to issue: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

/language en

nnlkcncff commented 1 year ago

Sorry, I think I misunderstood the part about writable layers. The documentation implies a single writable layer, but one per container; that is why "writable layers" is plural. But the directory that holds these layers is still in question.

sftim commented 1 year ago

There are running containers, which the container runtime cares about, and then there are container images.

Plausibly there's a CRI implementation where the kubelet and CRI listening socket are on server B, and then each Pod is on a separate hardware server (C, D, E, F) with a special OS, and the container image data is copied onto those other servers during Pod launch.

Container images have layers; they just do. Running containers might use OverlayFS, but they could also use another technology so long as the app in the container sees the same files and directories.

We can document that better, but IMO https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ isn't where we need to make the improvement.

sftim commented 1 year ago

> Storage of container images is the responsibility of CRI, not kubelet.

CRI is an interface between the kubelet and the container runtime. It's defined by the Kubernetes project. The kubelet tells the container runtime where to store the pulled image data (and the runtime does the actual fetch).
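
A quick way to see this relationship in practice: the kubelet's Summary API reports the filesystem it accounts image data against. A sketch, assuming API server access, jq installed, and NODE_NAME as a placeholder:

# Compare the node filesystem stats with the runtime's image filesystem stats:
kubectl get --raw "/api/v1/nodes/${NODE_NAME}/proxy/stats/summary" \
    | jq '{nodeFs: .node.fs.usedBytes, imageFs: .node.runtime.imageFs.usedBytes}'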

nnlkcncff commented 1 year ago

Earlier, everywhere I wrote CRI I actually meant the container runtime (CR); sorry for the confusion.

I understand there could be different implementations, e.g. the setup you described above.

I also understand that OverlayFS is not the only option for the storage driver.

I also realize that I don't understand enough :)

In the case of containerd, I did some research based on running container configurations and image manifests and saw the following:

The CR seems to be the one that controls image placement and layers. Most data under /var/lib/kubelet/pods/ looks like volumes (which may or may not be writable); there are no writable layers or images there.
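
A quick way to eyeball this, assuming containerd's default overlayfs snapshotter and default state paths:

# Extracted image layers and writable snapshots live under containerd's tree:
ls /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/
# whereas the kubelet's pods directory mostly holds volumes:
ls /var/lib/kubelet/pods/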

Because of all that, the phrase "The kubelet also uses this kind of storage to hold node-level container logs, container images, and the writable layers of running containers." doesn't look quite accurate to me.

You can close this issue at any time if you think that's the right thing to do, because this time I opened the issue without being completely sure about it. Maybe I should have discussed it somewhere first.

sftim commented 1 year ago

/sig node

Let's triage this.

sftim commented 1 year ago

(BTW, and only as far as I know: if you don't configure imagefs, the kubelet allows the container runtime to put the image wherever it likes)
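
That choice surfaces in the runtime's own configuration; for containerd, a sketch (the top-level root and state keys decide where image and runtime data land):

# Show where containerd keeps persistent (root) and runtime (state) data:
containerd config default | grep --extended-regexp '^(root|state)'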

/var/lib/containerd/io.containerd.content.v1.content/blobs is part of what Kubernetes considers:

> node-level ephemeral storage

The node level bit is obvious, but maybe the “ephemeral” isn't clear enough. A related problem is that the whole concept is hard to explain and is much more of a node concept than a storage concept or a resource management thing.

I'd welcome a new page covering kubelet and what goes in local node filesystems; the new page could be part of https://kubernetes.io/docs/reference/node/

/retitle Confusing and possibly inaccurate docs about node-level ephemeral storage

nnlkcncff commented 1 year ago

From what I understand, /var/lib/containerd/io.containerd.content.v1.content/blobs shouldn't be considered node-level (unless you mount this directory directly via hostPath) nor ephemeral, because on Linux the container layer blobs are just tar.gz archives, and no blobs are mounted directly.

During a pull operation all kinds of blobs (index, manifest, config, and layers) are downloaded into the blobs directory, and every layer blob is extracted into the snapshots directory.
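
A sketch of that flow, assuming containerd's default paths and an arbitrary test image:

# Pull an image, then look at where the blobs and extracted layers land:
crictl pull registry.k8s.io/pause:3.9
ls /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/ | head
ls /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/ | head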

When a container starts:

All volume mount operations happen on top of the writable layer or somewhere at the sandbox level (due to volume propagation policies, tmpfs mount specifics, and maybe something else).

Directory paths in the case of containerd, OverlayFS, and Kubernetes (defaults; placeholders in angle brackets):

- image blobs: /var/lib/containerd/io.containerd.content.v1.content/blobs
- extracted layers and writable snapshots: /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots
- pod volumes, including emptyDir: /var/lib/kubelet/pods/<pod-uid>/volumes
- mounted rootfs of running containers: /run/containerd/io.containerd.runtime.v2.task/k8s.io/<container-id>/rootfs

That's the way I see it.

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 6 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

sftim commented 6 months ago

/sig storage
/remove-lifecycle rotten

We should triage this.

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 month ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes/website/issues/42260#issuecomment-2248914876):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
>
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
>
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.