tinkerbell / actions

Suite of Tinkerbell Actions for use in Tinkerbell Workflows
Apache License 2.0

archive2disk does not properly untar symbolic links #72

Closed alienninja closed 2 years ago

alienninja commented 3 years ago

After archive2disk runs, symbolic links in the root directory show up as zero-length files. This makes the partition unusable, and the actions after archive2disk fail. I discovered this issue while trying to use cexec.

Expected Behaviour

archive2disk should check the header type of each file and process the files per the respective header type.

Current Behaviour

Symlinks get created as zero-length files.

Possible Solution

Review all of the header types and process each entry according to its type, including TypeSymlink.

Steps to Reproduce (for bugs)

  1. Set up a docker-compose instance of Tinkerbell
  2. Create a hardware json as outlined on https://docs.tinkerbell.org/hardware-data/
  3. Create a template yaml as outlined on the bottom of https://docs.tinkerbell.org/deploying-operating-systems/examples-ubuntu/
  4. Run a pxeboot on the machine to be provisioned; the cexec action will fail. Manually mount the partition (in this case /dev/sda3) in the LinuxKit environment and look at the root directory: bin, lib, lib32, lib64, libx32, and sbin all show up as zero-length files instead of symlinks. This is what was giving me the cexec 'exec format error'.

Context

I was trying to use archive2disk to untar an ubuntu_rootfs.tar.gz image. Since archive2disk does not process the symlinks in the tar file correctly, the actions after it fail.

Manually extracting the file onto the partition works, and allows a cexec container to run. Looking at other uses of Go's tar package, files are processed based on header.Typeflag, which includes TypeSymlink:


const (
    // Type '0' indicates a regular file.
    TypeReg  = '0'
    TypeRegA = '\x00' // Deprecated: Use TypeReg instead.

    // Type '1' to '6' are header-only flags and may not have a data body.
    TypeLink    = '1' // Hard link
    TypeSymlink = '2' // Symbolic link
    TypeChar    = '3' // Character device node
    TypeBlock   = '4' // Block device node
    TypeDir     = '5' // Directory
    TypeFifo    = '6' // FIFO node

    // Type '7' is reserved.
    TypeCont = '7'

    // Type 'x' is used by the PAX format to store key-value records that
    // are only relevant to the next file.
    // This package transparently handles these types.
    TypeXHeader = 'x'

    // Type 'g' is used by the PAX format to store key-value records that
    // are relevant to all subsequent files.
    // This package only supports parsing and composing such headers,
    // but does not currently support persisting the global state across files.
    TypeXGlobalHeader = 'g'

    // Type 'S' indicates a sparse file in the GNU format.
    TypeGNUSparse = 'S'

    // Types 'L' and 'K' are used by the GNU format for a meta file
    // used to store the path or link name for the next file.
    // This package transparently handles these types.
    TypeGNULongName = 'L'
    TypeGNULongLink = 'K'
)

Your Environment

#notes: using a locally modified version of rootio which can process the metadata correctly as a Tinkerbell hardware json file.
version: "0.1"
name: ubuntu_provisioning
global_timeout: 1800
tasks:
  - name: "os-installation"
    worker: "{{.device_1}}"
    volumes:
      - /dev:/dev
      - /dev/console:/dev/console
      - /lib/firmware:/lib/firmware:ro
    actions:
      - name: "disk-wipe-partition"
        image: thebsdbox/rootio:1.0
        timeout: 90
        command: ["partition"]
        environment:
            MIRROR_HOST: [redacted]
      - name: "format"
        image: thebsdbox/rootio:1.0
        timeout: 90
        command: ["format"]
        environment:
            MIRROR_HOST: [redacted]
      - name: "expand-ubuntu-filesystem-to-root"
        image: quay.io/tinkerbell-actions/archive2disk:v1.0.0
        timeout: 90
        environment:
            ARCHIVE_URL: http://[redacted]:8080/ubuntu_rootfs.tar.gz
            ARCHIVE_TYPE: targz
            DEST_DISK: /dev/sda3
            FS_TYPE: ext4
            DEST_PATH: /
      - name: "apt-update"
        image: quay.io/tinkerbell-actions/cexec:v1.0.0
        timeout: 90
        environment:
            BLOCK_DEVICE: /dev/sda3
            FS_TYPE: ext4
            DEFAULT_INTERPRETER: "/usr/bin/sh -c"
            CHROOT: y
            CMD_LINE: "apt-get -y update"
alienninja commented 2 years ago

Looking at this further, I found a more advanced example of tar extraction and implemented it in a fork, linked below. It is working for me. The fork also has a checksum capability, where you can optionally specify a checksum via an environment variable; I am still testing that feature. https://github.com/alienninja/hub/tree/main/actions/archive2disk/v1

The example I used was: https://github.com/shuveb/containers-the-hard-way/blob/85013e2ebdc028a846b749cc0acc540085ff19cd/tarfile.go (I didn't see a license file in that repo.)

I also referenced this example: https://github.com/hashicorp/terraform-provider-helm/blob/main/vendor/oras.land/oras-go/pkg/content/utils.go However, I was unable to get it working with the "prefix" parameter, and I wasn't sure whether it was needed. It checks that none of the tar entries target paths outside the destination directory (I think), but some of the hard links start with "/", so it wasn't working as expected. That example also included the checksum code, which seems like a good security feature.

My fork works as expected. Note that for the end image to be bootable, the Docker container image needs to include a lot more than the default ubuntu:latest; I plan to post my example Dockerfile as well.

swills commented 2 years ago

Maybe libarchive via go-libarchive could be useful here?

nshalman commented 2 years ago

@alienninja I see your branch is under active development. Do you think it would be possible to open a PR and get your improvements upstreamed? Thank you for reporting this and finding the time to work on a solution!

alienninja commented 2 years ago

@nshalman Sounds good, I'll submit a pull request. Thanks