kubevirt / containerized-data-importer

Data Import Service for kubernetes, designed with kubevirt in mind.
Apache License 2.0

tar extraction fails when tarfile has relative links #2982

Open dropte opened 11 months ago

dropte commented 11 months ago

What happened: When extracting a tarball with top-level relative links, extraction fails. Changing the tarball to contain only absolute links makes it succeed.

What you expected to happen: The DataVolume should be created and report success.

How to reproduce it (as minimally and precisely as possible): Create two tarballs from a directory, one with a leading ./ entry and one without:

mkdir example
touch example/example
tar -cf example.tar -C example .
cd example
tar -cf ../example_norel.tar *
cd ..
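
To see how the two archives differ, it can help to list their members. The following is a minimal sketch, assuming GNU tar and a scratch directory (the `work`, `rel_list`, and `norel_list` names are only for illustration). It repeats the steps above and prints each archive's entries:

```shell
# Rebuild both archives in a scratch directory (hypothetical location).
work=$(mktemp -d)
cd "$work"
mkdir example
touch example/example
tar -cf example.tar -C example .                 # archives "." itself plus its contents
(cd example && tar -cf ../example_norel.tar *)   # archives only the globbed names

# Capture and print the member names of each archive.
rel_list=$(tar -tf example.tar)
norel_list=$(tar -tf example_norel.tar)
echo "example.tar:";       echo "$rel_list"
echo "example_norel.tar:"; echo "$norel_list"
```

The first archive carries an entry for `./` itself, which on extraction maps onto the target directory; the second contains only `example`.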

Host the files via HTTP, then create the DataVolumes:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: import-archive-datavolume-rel
spec:
  source:
    http:
      url: "https://webhost/example.tar"
  contentType: archive
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 100Mi
---
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: import-archive-datavolume-norel
spec:
  source:
    http:
      url: "https://webhost/example_norel.tar"
  contentType: archive
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 100Mi

Output from the importer for import-archive-datavolume-rel:

I1113 14:07:31.425317       1 importer.go:103] Starting importer
I1113 14:07:31.425360       1 importer.go:172] begin import process
I1113 14:07:31.895947       1 data-processor.go:356] Calculating available size
I1113 14:07:31.895972       1 data-processor.go:368] Checking out file system volume size.
I1113 14:07:31.895987       1 data-processor.go:376] Request image size not empty.
I1113 14:07:31.895998       1 data-processor.go:381] Target size 96112640.
I1113 14:07:31.896031       1 data-processor.go:255] New phase: TransferDataDir
I1113 14:07:31.896045       1 util.go:207] begin untar to /data...
I1113 14:07:31.896050       1 util.go:213] running untar cmd: [/usr/bin/tar --preserve-permissions --no-same-owner -xvC /data]
E1113 14:07:31.897677       1 util.go:222] exit status 2
E1113 14:07:31.897695       1 data-processor.go:251] exit status 2
unable to untar files from endpoint
kubevirt.io/containerized-data-importer/pkg/importer.(*HTTPDataSource).Transfer
    pkg/importer/http-datasource.go:169
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
    pkg/importer/data-processor.go:191
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
    pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
    pkg/importer/data-processor.go:157
main.handleImport
    cmd/cdi-importer/importer.go:178
main.main
    cmd/cdi-importer/importer.go:144
runtime.main
    GOROOT/src/runtime/proc.go:250
runtime.goexit
    GOROOT/src/runtime/asm_amd64.s:1594
Unable to transfer source data to target directory
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
    pkg/importer/data-processor.go:193
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
    pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
    pkg/importer/data-processor.go:157
main.handleImport
    cmd/cdi-importer/importer.go:178
main.main
    cmd/cdi-importer/importer.go:144
runtime.main
    GOROOT/src/runtime/proc.go:250
runtime.goexit
    GOROOT/src/runtime/asm_amd64.s:1594
E1113 14:07:31.897773       1 importer.go:181] exit status 2
unable to untar files from endpoint
kubevirt.io/containerized-data-importer/pkg/importer.(*HTTPDataSource).Transfer
    pkg/importer/http-datasource.go:169
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
    pkg/importer/data-processor.go:191
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
    pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
    pkg/importer/data-processor.go:157
main.handleImport
    cmd/cdi-importer/importer.go:178
main.main
    cmd/cdi-importer/importer.go:144
runtime.main
    GOROOT/src/runtime/proc.go:250
runtime.goexit
    GOROOT/src/runtime/asm_amd64.s:1594
Unable to transfer source data to target directory
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
    pkg/importer/data-processor.go:193
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
    pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
    pkg/importer/data-processor.go:157
main.handleImport
    cmd/cdi-importer/importer.go:178
main.main
    cmd/cdi-importer/importer.go:144
runtime.main
    GOROOT/src/runtime/proc.go:250
runtime.goexit
    GOROOT/src/runtime/asm_amd64.s:1594

kubectl get DataVolume

NAME                              PHASE              PROGRESS   RESTARTS   AGE
import-archive-datavolume-norel   Succeeded          100.0%                3m33s
import-archive-datavolume-rel     ImportInProgress   N/A        5          3m25s

kubectl describe DataVolume

Name:         import-archive-datavolume-norel
Namespace:    default
Labels:       <none>
Annotations:  cdi.kubevirt.io/storage.usePopulator: true
API Version:  cdi.kubevirt.io/v1beta1
Kind:         DataVolume
Metadata:
  Creation Timestamp:  2023-11-13T14:05:41Z
  Generation:          1
  Managed Fields:
    API Version:  cdi.kubevirt.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:cdi.kubevirt.io/storage.usePopulator:
    Manager:      cdi-controller
    Operation:    Update
    Time:         2023-11-13T14:05:41Z
    API Version:  cdi.kubevirt.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:contentType:
        f:pvc:
          .:
          f:accessModes:
          f:resources:
            .:
            f:requests:
              .:
              f:storage:
        f:source:
          .:
          f:http:
            .:
            f:url:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2023-11-13T14:05:41Z
    API Version:  cdi.kubevirt.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:claimName:
        f:conditions:
        f:phase:
        f:progress:
    Manager:         cdi-controller
    Operation:       Update
    Subresource:     status
    Time:            2023-11-13T14:06:26Z
  Resource Version:  8338299
  UID:               46d5dea5-8a3e-425b-af40-8150244423da
Spec:
  Content Type:  archive
  Pvc:
    Access Modes:
      ReadWriteOnce
    Resources:
      Requests:
        Storage:  100Mi
  Source:
    Http:
      URL:  https://webhost/example_norel.tar
Status:
  Claim Name:  import-archive-datavolume-norel
  Conditions:
    Last Heartbeat Time:   2023-11-13T14:06:26Z
    Last Transition Time:  2023-11-13T14:06:26Z
    Message:               PVC import-archive-datavolume-norel Bound
    Reason:                Bound
    Status:                True
    Type:                  Bound
    Last Heartbeat Time:   2023-11-13T14:06:26Z
    Last Transition Time:  2023-11-13T14:06:26Z
    Status:                True
    Type:                  Ready
    Last Heartbeat Time:   2023-11-13T14:06:26Z
    Last Transition Time:  2023-11-13T14:06:26Z
    Message:               Import Complete
    Reason:                Completed
    Status:                False
    Type:                  Running
  Phase:                   Succeeded
  Progress:                100.0%
Events:
  Type    Reason            Age    From                          Message
  ----    ------            ----   ----                          -------
  Normal  Pending           4m8s   datavolume-import-controller  PVC import-archive-datavolume-norel Pending
  Normal  ImportInProgress  3m23s  datavolume-import-controller  Import into import-archive-datavolume-norel in progress
  Normal  ImportSucceeded   3m23s  datavolume-import-controller  Successfully imported into PVC import-archive-datavolume-norel
  Normal  Bound             3m23s  datavolume-import-controller  PVC import-archive-datavolume-norel Bound

Name:         import-archive-datavolume-rel
Namespace:    default
Labels:       <none>
Annotations:  cdi.kubevirt.io/storage.usePopulator: true
API Version:  cdi.kubevirt.io/v1beta1
Kind:         DataVolume
Metadata:
  Creation Timestamp:  2023-11-13T14:05:49Z
  Generation:          1
  Managed Fields:
    API Version:  cdi.kubevirt.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:cdi.kubevirt.io/storage.usePopulator:
    Manager:      cdi-controller
    Operation:    Update
    Time:         2023-11-13T14:05:49Z
    API Version:  cdi.kubevirt.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:contentType:
        f:pvc:
          .:
          f:accessModes:
          f:resources:
            .:
            f:requests:
              .:
              f:storage:
        f:source:
          .:
          f:http:
            .:
            f:url:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2023-11-13T14:05:49Z
    API Version:  cdi.kubevirt.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:claimName:
        f:conditions:
        f:phase:
        f:progress:
        f:restartCount:
    Manager:         cdi-controller
    Operation:       Update
    Subresource:     status
    Time:            2023-11-13T14:08:58Z
  Resource Version:  8351981
  UID:               1c970368-e14b-4132-ba77-31b3a3ee70f6
Spec:
  Content Type:  archive
  Pvc:
    Access Modes:
      ReadWriteOnce
    Resources:
      Requests:
        Storage:  100Mi
  Source:
    Http:
      URL:  https://webhost/example.tar
Status:
  Claim Name:  import-archive-datavolume-rel
  Conditions:
    Last Heartbeat Time:   2023-11-13T14:05:49Z
    Last Transition Time:  2023-11-13T14:05:49Z
    Message:               PVC import-archive-datavolume-rel Pending
    Reason:                Pending
    Status:                False
    Type:                  Bound
    Last Heartbeat Time:   2023-11-13T14:08:58Z
    Last Transition Time:  2023-11-13T14:05:49Z
    Status:                False
    Type:                  Ready
    Last Heartbeat Time:   2023-11-13T14:08:58Z
    Last Transition Time:  2023-11-13T14:08:58Z
    Message:               Unable to process data: Unable to transfer source data to target directory: unable to untar files from endpoint: exit status 2
    Reason:                Error
    Status:                False
    Type:                  Running
  Phase:                   ImportInProgress
  Progress:                N/A
  Restart Count:           5
Events:
  Type     Reason            Age                  From                          Message
  ----     ------            ----                 ----                          -------
  Normal   Pending           4m1s                 datavolume-import-controller  PVC import-archive-datavolume-rel Pending
  Normal   ImportInProgress  3m24s                datavolume-import-controller  Import into import-archive-datavolume-rel in progress
  Warning  Error             52s (x5 over 3m24s)  datavolume-import-controller  Unable to process data: Unable to transfer source data to target directory: unable to untar files from endpoint: exit status 2

Additional context: We use Rook-Ceph as the storage provider, but the failure also reproduces when running the importer container against local storage via Docker. This seems to have worked in 1.52.0. It is possibly related to permissions when running as a non-root user in the container.

akalenyu commented 11 months ago

Hey, thanks for reporting this!

I think if you increase the CDI log verbosity with something like

kubectl set env deployment cdi-operator \
        --namespace="${cdi_namespace}" \
        --containers='cdi-operator' \
        VERBOSITY="3"

we should get the actual stdout/stderr of the untar command.

dropte commented 11 months ago

Lightly redacted output:

I1114 14:05:16.309950       1 importer.go:103] Starting importer
I1114 14:05:16.309997       1 importer.go:172] begin import process
I1114 14:05:16.310038       1 http-datasource.go:392] Attempting to HEAD "https://<url>/example.tar" via http client
I1114 14:05:16.598432       1 http-datasource.go:424] Content length: 2048
I1114 14:05:16.598446       1 http-datasource.go:327] Attempting to get object "https://<url>/example.tar" via http client
I1114 14:05:16.671028       1 data-processor.go:356] Calculating available size
I1114 14:05:16.671075       1 data-processor.go:368] Checking out file system volume size.
I1114 14:05:16.671103       1 data-processor.go:376] Request image size not empty.
I1114 14:05:16.671114       1 data-processor.go:381] Target size 96112640.
I1114 14:05:16.671149       1 format-readers.go:112] constructReaders: checking compression and archive formats
I1114 14:05:16.671163       1 format-readers.go:121] found header of type "tar"
I1114 14:05:16.671171       1 data-processor.go:255] New phase: TransferDataDir
I1114 14:05:16.671180       1 util.go:207] begin untar to /data...
I1114 14:05:16.671187       1 util.go:213] running untar cmd: [/usr/bin/tar --preserve-permissions --no-same-owner -xvC /data]
I1114 14:05:16.672780       1 util.go:220] STDOUT
./
./example

I1114 14:05:16.672787       1 util.go:221] STDERR
/usr/bin/tar: .: Cannot utime: Operation not permitted
/usr/bin/tar: .: Cannot change mode to rwxr-xr-x: Operation not permitted
/usr/bin/tar: Exiting with failure status due to previous errors

E1114 14:05:16.672793       1 util.go:222] exit status 2
E1114 14:05:16.672806       1 data-processor.go:251] exit status 2
unable to untar files from endpoint
kubevirt.io/containerized-data-importer/pkg/importer.(*HTTPDataSource).Transfer
    pkg/importer/http-datasource.go:169
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
    pkg/importer/data-processor.go:191
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
    pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
    pkg/importer/data-processor.go:157
main.handleImport
    cmd/cdi-importer/importer.go:178
main.main
    cmd/cdi-importer/importer.go:144
runtime.main
    GOROOT/src/runtime/proc.go:250
runtime.goexit
    GOROOT/src/runtime/asm_amd64.s:1594
Unable to transfer source data to target directory
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
    pkg/importer/data-processor.go:193
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
    pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
    pkg/importer/data-processor.go:157
main.handleImport
    cmd/cdi-importer/importer.go:178
main.main
    cmd/cdi-importer/importer.go:144
runtime.main
    GOROOT/src/runtime/proc.go:250
runtime.goexit
    GOROOT/src/runtime/asm_amd64.s:1594
E1114 14:05:16.672884       1 importer.go:181] exit status 2
unable to untar files from endpoint
kubevirt.io/containerized-data-importer/pkg/importer.(*HTTPDataSource).Transfer
    pkg/importer/http-datasource.go:169
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
    pkg/importer/data-processor.go:191
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
    pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
    pkg/importer/data-processor.go:157
main.handleImport
    cmd/cdi-importer/importer.go:178
main.main
    cmd/cdi-importer/importer.go:144
runtime.main
    GOROOT/src/runtime/proc.go:250
runtime.goexit
    GOROOT/src/runtime/asm_amd64.s:1594
Unable to transfer source data to target directory
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
    pkg/importer/data-processor.go:193
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
    pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
    pkg/importer/data-processor.go:157
main.handleImport
    cmd/cdi-importer/importer.go:178
main.main
    cmd/cdi-importer/importer.go:144
runtime.main
    GOROOT/src/runtime/proc.go:250
runtime.goexit
    GOROOT/src/runtime/asm_amd64.s:1594

Note that this is on nodes with device_ownership_from_security_context set to true at the containerd level.

akalenyu commented 11 months ago

I see. Maybe as non-root it would make sense for us to use these tar options:

-m, --touch
    Don't extract file modified time.

--no-overwrite-dir
    Preserve metadata of existing directories.
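
As a rough illustration of that suggestion, the sketch below extracts an archive containing a `./` entry using the importer's flags from the log plus `--touch` and `--no-overwrite-dir`. It assumes GNU tar; the `src`, `dest`, and `archive` names are made up for the example, and `-f -` is spelled out here even though the logged command relies on tar's default of reading stdin. It only shows what the invocation could look like, not that these flags resolve the failure in a non-root importer pod.

```shell
# Build a small archive whose first member is "./" (hypothetical paths).
src=$(mktemp -d)
dest=$(mktemp -d)
touch "$src/example"
archive="$(mktemp -d)/example.tar"
tar -cf "$archive" -C "$src" .

# Extract from stdin with the flags seen in the importer log, plus the two
# proposed options: --touch skips restoring modification times and
# --no-overwrite-dir keeps the metadata of directories that already exist.
tar --preserve-permissions --no-same-owner \
    --touch --no-overwrite-dir \
    -xv -f - -C "$dest" < "$archive"

ls -A "$dest"
```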

akalenyu commented 11 months ago

/assign akalenyu

aglitke commented 10 months ago

Is this still an issue for you?

kubevirt-bot commented 7 months ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

akalenyu commented 7 months ago

/remove-lifecycle stale

ianb-mp commented 6 months ago

I've encountered the same issue.

Content of TAR that causes the error:

$ tar -tv --numeric-owner -f archive.tar 
drwxr-xr-x 2009/2000         0 2024-04-05 05:33 ./
drwxr-xr-x 2009/2000         0 2024-04-05 05:33 ./blah/
-rw-r--r-- 2009/2000        12 2024-04-05 05:33 ./blah/README
drwxr-xr-x 2009/2000         0 2024-04-05 05:33 ./foo/

Content of the TAR that doesn't cause an error:

$ tar -tv --numeric-owner -f archive2.tar 
drwxr-xr-x 2009/2000         0 2024-04-05 05:33 blah/
-rw-r--r-- 2009/2000        12 2024-04-05 05:33 blah/README
drwxr-xr-x 2009/2000         0 2024-04-05 05:33 foo/
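
Until the importer copes with the `./` entry, one possible workaround is to build the archive without it. The sketch below is an example under stated assumptions: it needs GNU tar and GNU find, and the directory layout and the `src` and `archive2` names are invented to mirror the listing above. It feeds the top-level names to tar so that no `./` member is written, and unlike a shell `*` glob it also picks up dotfiles.

```shell
# Recreate a layout similar to the listing above (hypothetical content).
src=$(mktemp -d)
mkdir -p "$src/blah" "$src/foo"
echo "hello world" > "$src/blah/README"
archive2="$(mktemp -d)/archive2.tar"

# %P prints each top-level name without the leading "./"; NUL separators
# keep unusual file names intact when tar reads them from stdin (-T -).
(cd "$src" && find . -mindepth 1 -maxdepth 1 -printf '%P\0' \
    | tar -cf "$archive2" --null -T -)

# List the result; no member should start with "./".
tar -tf "$archive2"
```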

kubevirt-bot commented 3 months ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

akalenyu commented 3 months ago

/remove-lifecycle stale

The issue is definitely still around.

tux-o-matic commented 3 weeks ago

The issue isn't limited to archives containing links. The parameters CDI uses when calling tar won't work with every combination of PVC/StorageClass and Pod security context, because of ownership. It looks like the current importer Pod is tailored for importing disk images meant to be consumed by libvirt/qemu. That may be a limiting factor when you just want to import an archive of arbitrary files to be mounted as a VirtIO disk rather than used as a VM disk.