kubevirt / containerized-data-importer

Data Import Service for kubernetes, designed with kubevirt in mind.
Apache License 2.0

Why does cdi-importer need to consume so much memory? #2110

Closed: iswbm closed this issue 9 months ago

iswbm commented 2 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

I wanted to use CDI to create a virtual machine, but the importer pod was OOMKilled:

[root@master01 ~]# kubectl get po 
NAME                               READY   STATUS      RESTARTS   AGE
importer-wbm-test--fs-top-old-vda          0/1     OOMKilled   1          76s

I had to modify the CDI configuration to make the import succeed:

kubectl patch cdi cdi --patch '{"spec": {"config": {"podResourceRequirements": {"limits": {"memory": "5G"}}}}}' --type merge
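The new limit can be verified afterwards with a jsonpath query against the CDI resource (a quick sketch; the resource name cdi matches the patch above):

kubectl get cdi cdi -o jsonpath='{.spec.config.podResourceRequirements.limits.memory}'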

I used kubectl top to check resource usage and found that the importer consumes up to 4G of memory (see the attached screenshot).

What I'm wondering is why it needs to consume so much memory.

What you expected to happen:

cdi-importer should be able to work with less memory.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

DataVolume spec:

spec:
  pvc:
    accessModes:
    - ReadWriteOnce
    resources:
      limits:
        storage: 20Gi
      requests:
        storage: 20Gi
    storageClassName: xxx
    volumeMode: Block
  source:
    http:
      url: http://xxx

Environment:

mhenriks commented 2 years ago

Hi @iswbm can you tell us a little more about the target image? What is the size and format (qcow/raw) and is it compressed?

iswbm commented 2 years ago

> Hi @iswbm can you tell us a little more about the target image? What is the size and format (qcow/raw) and is it compressed?

Hi, @mhenriks

The image I'm using is in qcow2 format, about 800M, and it is compressed.

The problem was solved when I changed the qemu-img cache mode from writeback to directsync, so I guess it was a page-cache problem.
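For context, this is roughly how the cache mode shows up in a standalone qemu-img conversion (an illustration only, not necessarily the exact command CDI runs internally; the paths are placeholders):

# -t selects the destination cache mode:
#   writeback  -> writes go through the host page cache, and those cached pages
#                 count against the pod's memory cgroup
#   directsync -> bypasses the page cache, so memory usage stays flat at the cost of speed
qemu-img convert -p -t directsync -O raw source.qcow2 /dev/target-block-device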

mhenriks commented 2 years ago

Hi @iswbm, what compression was used? Gzip? xz? qcow2's built-in compression?

kubevirt-bot commented 2 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubevirt-bot commented 2 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

kubevirt-bot commented 2 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

kubevirt-bot commented 2 years ago

@kubevirt-bot: Closing this issue.

In response to [this](https://github.com/kubevirt/containerized-data-importer/issues/2110#issuecomment-1171773440).

adiroiban commented 1 year ago

I ran into a similar issue:

$ virtctl image-upload pvc iso-win-2019 --access-mode=ReadWriteOnce --size=8G --uploadproxy-url=https://palmita:32001 --force-bind --insecure --wait-secs=60 --image-path ~/Downloads/SW_DVD9_Win_Server_STD_CORE_2019_1909.4_64Bit_English_DC_STD_MLF_X22-29333.ISO 
PVC default/iso-win-2019 not found 
PersistentVolumeClaim default/iso-win-2019 created
Waiting for PVC iso-win-2019 upload pod to be ready...
Pod now ready
Uploading data to https://palmita:32001

 563.06 MiB / 5.01 GiB [==============>----------------------------------------------------------------------------------------------------------------------]  10.98% 17s

unexpected return value 502,

I was using local-storage backed by tmpfs, which is why I ended up with this error.

As soon as I switched to non-tmpfs storage, I was able to import it.

ydcool commented 11 months ago

Same issue here.

/reopen

kubevirt-bot commented 11 months ago

@ydcool: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/kubevirt/containerized-data-importer/issues/2110#issuecomment-1837074439).

NeverTeaser commented 10 months ago

Same issue here. My DataVolume definition:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: "ubuntu"
spec:
  storage:
    resources:
      requests:
        storage: 5Gi
  source:
    http:
      url: "https://mirrors.tuna.tsinghua.edu.cn/ubuntu-cloud-images/bionic/20230607/bionic-server-cloudimg-amd64-root.tar.xz"

When I watch the DataVolumes, the terminal shows:

kubectl get datavolumes  -w 

ubuntu   ImportInProgress   33.38%                28s
ubuntu   ImportInProgress   38.01%                30s
ubuntu   ImportInProgress   42.42%                32s
ubuntu   ImportInProgress   46.55%                34s
ubuntu   ImportInProgress   50.82%                36s
ubuntu   ImportInProgress   55.40%                38s
ubuntu   ImportInProgress   60.19%                40s
ubuntu   ImportInProgress   64.79%                42s
ubuntu   ImportInProgress   64.79%                43s
ubuntu   ImportInProgress   64.79%     1          44s
ubuntu   ImportInProgress   4.44%      1          46s
ubuntu   ImportInProgress   8.33%      1          48s
ubuntu   ImportInProgress   11.34%     1          50s
ubuntu   ImportInProgress   15.55%     1          52s

The import keeps restarting from the beginning.

kubectl describe output for the import-xxx pod:

 importer:
    Container ID:    docker://79e6dc41b90bd86e59f3f481219eae8b53885146cb630b586e35ca0d805145bb
    Image:           quay.io/kubevirt/cdi-importer:v1.58.0
    Image ID:        docker-pullable://quay.io/kubevirt/cdi-importer@sha256:72ee3dd7073a8c25c016f29d2075afe220a607e6081c50f48b0b0a16ddb516f6
    Port:            8443/TCP
    Host Port:       0/TCP
    SeccompProfile:  RuntimeDefault
    Args:
      -v=1
    State:          Running
      Started:      Fri, 29 Dec 2023 15:53:04 +0800
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Fri, 29 Dec 2023 15:52:19 +0800
      Finished:     Fri, 29 Dec 2023 15:52:50 +0800
    Ready:          True
    Restart Count:  2
    Limits:
      cpu:     750m
      memory:  600M
    Requests:
      cpu:     100m
      memory:  60M

And after I patched the CDI podResourceRequirements, the import succeeded:

kubectl patch cdi cdi --patch '{"spec": {"config": {"podResourceRequirements": {"limits": {"memory": "1G"}}}}}' --type merge
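For anyone who prefers editing the resource directly, the patch above roughly corresponds to this section of the CDI custom resource (only the relevant fields are shown):

apiVersion: cdi.kubevirt.io/v1beta1
kind: CDI
metadata:
  name: cdi
spec:
  config:
    podResourceRequirements:
      limits:
        memory: 1G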

mhenriks commented 10 months ago

/reopen
/remove-lifecycle rotten

Thanks for adding a solution @NeverTeaser

@adiroiban image-upload is a totally different flow from import. Can you open a separate issue with logs from the "cdi-uploadproxy" and "cdi-upload-" pods?

@akalenyu any thoughts on this? I think you've looked into mem consumption before

kubevirt-bot commented 10 months ago

@mhenriks: Reopened this issue.

In response to [this](https://github.com/kubevirt/containerized-data-importer/issues/2110#issuecomment-1896653985).

akalenyu commented 9 months ago

> any thoughts on this? I think you've looked into mem consumption before

Sure. CDI writes to the target using cache=writeback (which utilizes the page cache), so there is likely a brief (or not so brief) period of time where a significant chunk of your image sits essentially in memory before being flushed to disk, triggering the OOM kill. There have been several bugs about that, specifically when the nodes are on cgroupsv1; cgroupsv2 behaves differently. Are your nodes using cgroupsv1?

For a detailed explanation and ways to mitigate on cgroupsv1, see https://bugzilla.redhat.com/show_bug.cgi?id=2196072#c14
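In case it helps, a quick way to check which cgroup version a node is running (cgroup2fs means v2, tmpfs means v1):

stat -fc %T /sys/fs/cgroup/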

adiroiban commented 9 months ago

My report is not a bug; it's a misconfiguration on my side. The default pod created by KubeVirt has a memory limit of 600M.

And I was using tmpfs storage, so whatever was written to that tmpfs counted towards the pod's memory usage.

As soon as I replaced my volume with an OpenEBS-backed one, it worked.
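If someone is unsure whether a local PV path is backed by tmpfs, something like the following on the node shows the filesystem type (the path here is just a placeholder):

# FSTYPE of tmpfs means the volume's data lives in RAM and counts against the pod's memory
findmnt -T /var/local-storage/pv-1 -o TARGET,SOURCE,FSTYPE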

alromeros commented 9 months ago

I'm closing this since the issue has been addressed; feel free to reopen if necessary. Thanks!