kubernetes-csi / csi-driver-host-path

A sample (non-production) CSI Driver that creates a local directory as a volume on a single node
Apache License 2.0

clone volume: cp doesn't support sparse file #279

stoneshi-yunify closed this issue 3 years ago

stoneshi-yunify commented 3 years ago

Hostpath: v1.6.2

Cloning a volume calls cp -a <src-vol> <dest_vol>; see https://github.com/kubernetes-csi/csi-driver-host-path/blob/e4d72e308439144cdd996cbb7e22f8ca0a965474/pkg/hostpath/hostpath.go#L480.

The cp in the Alpine-based hostpath image (BusyBox cp) does not support sparse files: it copies a sparse file as a regular file, writing out every hole as zeros. Therefore, if the source volume contains a large sparse file, cloning it is extremely slow.

QEMU/VM disk images are a common example of sparse files, and projects like KubeVirt use them heavily.

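GNU coreutils cp handles sparse files by default (--sparse=auto detects a sparse source and makes the copy sparse too), which cuts the copy time dramatically: from over two and a half minutes to under a tenth of a second in the test below. So the hostpath image could simply install coreutils.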

Timing both cp implementations inside the hostpath container:

root@kubevm:~# kubectl -n kube-system exec -it csi-hostpathplugin-0 -c hostpath -- sh
/csi-data-dir/24c4a772-a3f7-11eb-bea3-7e7563fe7b85 # ls -l
total 17056
-rw-rw---- 1 root root 20293720064 Apr 23 05:46 disk.img
/csi-data-dir/24c4a772-a3f7-11eb-bea3-7e7563fe7b85 # du -sh *
17M disk.img
/csi-data-dir/24c4a772-a3f7-11eb-bea3-7e7563fe7b85 # time cp -a /csi-data-dir/24c4a772-a3f7-11eb-bea3-7e7563fe7b85 /csi-data-dir/old-cp
real    2m 35.27s
user    0m 0.02s
sys 1m 58.11s
/csi-data-dir/24c4a772-a3f7-11eb-bea3-7e7563fe7b85 # cp --help
BusyBox v1.32.1 () multi-call binary.

Usage: cp [OPTIONS] SOURCE... DEST

Copy SOURCE(s) to DEST

    -a  Same as -dpR
    -R,-r   Recurse
    -d,-P   Preserve symlinks (default if -R)
    -L  Follow all symlinks
    -H  Follow symlinks on command line
    -p  Preserve file attributes if possible
    -f  Overwrite
    -i  Prompt before overwrite
    -l,-s   Create (sym)links
    -T  Treat DEST as a normal file
    -u  Copy only newer files
/csi-data-dir/24c4a772-a3f7-11eb-bea3-7e7563fe7b85 # apk add coreutils
fetch https://mirrors.aliyun.com/alpine/v3.13/main/x86_64/APKINDEX.tar.gz
fetch https://mirrors.aliyun.com/alpine/v3.13/community/x86_64/APKINDEX.tar.gz
(1/6) Installing libacl (2.2.53-r0)
(2/6) Installing libattr (2.4.48-r0)
(3/6) Installing skalibs (2.10.0.0-r0)
(4/6) Installing s6-ipcserver (2.10.0.0-r0)
(5/6) Installing utmps (0.1.0.0-r0)
Executing utmps-0.1.0.0-r0.pre-install
(6/6) Installing coreutils (8.32-r2)
Executing busybox-1.32.1-r6.trigger
OK: 14 MiB in 39 packages
/csi-data-dir/24c4a772-a3f7-11eb-bea3-7e7563fe7b85 #
/csi-data-dir/24c4a772-a3f7-11eb-bea3-7e7563fe7b85 #
/csi-data-dir/24c4a772-a3f7-11eb-bea3-7e7563fe7b85 # cp --help
Usage: cp [OPTION]... [-T] SOURCE DEST
  or:  cp [OPTION]... SOURCE... DIRECTORY
  or:  cp [OPTION]... -t DIRECTORY SOURCE...
Copy SOURCE to DEST, or multiple SOURCE(s) to DIRECTORY.

Mandatory arguments to long options are mandatory for short options too.
  -a, --archive                same as -dR --preserve=all
      --attributes-only        don't copy the file data, just the attributes
      --backup[=CONTROL]       make a backup of each existing destination file
  -b                           like --backup but does not accept an argument
      --copy-contents          copy contents of special files when recursive
  -d                           same as --no-dereference --preserve=links
  -f, --force                  if an existing destination file cannot be
                                 opened, remove it and try again (this option
                                 is ignored when the -n option is also used)
  -i, --interactive            prompt before overwrite (overrides a previous -n
                                  option)
  -H                           follow command-line symbolic links in SOURCE
  -l, --link                   hard link files instead of copying
  -L, --dereference            always follow symbolic links in SOURCE
  -n, --no-clobber             do not overwrite an existing file (overrides
                                 a previous -i option)
  -P, --no-dereference         never follow symbolic links in SOURCE
  -p                           same as --preserve=mode,ownership,timestamps
      --preserve[=ATTR_LIST]   preserve the specified attributes (default:
                                 mode,ownership,timestamps), if possible
                                 additional attributes: context, links, xattr,
                                 all
      --no-preserve=ATTR_LIST  don't preserve the specified attributes
      --parents                use full source file name under DIRECTORY
  -R, -r, --recursive          copy directories recursively
      --reflink[=WHEN]         control clone/CoW copies. See below
      --remove-destination     remove each existing destination file before
                                 attempting to open it (contrast with --force)
      --sparse=WHEN            control creation of sparse files. See below
      --strip-trailing-slashes  remove any trailing slashes from each SOURCE
                                 argument
  -s, --symbolic-link          make symbolic links instead of copying
  -S, --suffix=SUFFIX          override the usual backup suffix
  -t, --target-directory=DIRECTORY  copy all SOURCE arguments into DIRECTORY
  -T, --no-target-directory    treat DEST as a normal file
  -u, --update                 copy only when the SOURCE file is newer
                                 than the destination file or when the
                                 destination file is missing
  -v, --verbose                explain what is being done
  -x, --one-file-system        stay on this file system
  -Z                           set SELinux security context of destination
                                 file to default type
      --context[=CTX]          like -Z, or if CTX is specified then set the
                                 SELinux or SMACK security context to CTX
      --help     display this help and exit
      --version  output version information and exit

By default, sparse SOURCE files are detected by a crude heuristic and the
corresponding DEST file is made sparse as well.  That is the behavior
selected by --sparse=auto.  Specify --sparse=always to create a sparse DEST
file whenever the SOURCE file contains a long enough sequence of zero bytes.
Use --sparse=never to inhibit creation of sparse files.

When --reflink[=always] is specified, perform a lightweight copy, where the
data blocks are copied only when modified.  If this is not possible the copy
fails, or if --reflink=auto is specified, fall back to a standard copy.
Use --reflink=never to ensure a standard copy is performed.

The backup suffix is '~', unless set with --suffix or SIMPLE_BACKUP_SUFFIX.
The version control method may be selected via the --backup option or through
the VERSION_CONTROL environment variable.  Here are the values:

  none, off       never make backups (even if --backup is given)
  numbered, t     make numbered backups
  existing, nil   numbered if numbered backups exist, simple otherwise
  simple, never   always make simple backups

As a special case, cp makes a backup of SOURCE when the force and backup
options are given and SOURCE and DEST are the same name for an existing,
regular file.

GNU coreutils online help: <https://www.gnu.org/software/coreutils/>
Report any translation bugs to <https://translationproject.org/team/>
Full documentation <https://www.gnu.org/software/coreutils/cp>
or available locally via: info '(coreutils) cp invocation'
/csi-data-dir/24c4a772-a3f7-11eb-bea3-7e7563fe7b85 # time cp -a /csi-data-dir/24c4a772-a3f7-11eb-bea3-7e7563fe7b85 /csi-data-dir/coreutils-cp
real    0m 0.08s
user    0m 0.00s
sys 0m 0.02s
pohly commented 3 years ago

Looks like a useful enhancement. Care to prepare a PR?

One thing to watch out for is how much larger images become when installing cp from coreutils.

/help

stoneshi-yunify commented 3 years ago

/assign

stoneshi-yunify commented 3 years ago

/remove-help

stoneshi-yunify commented 3 years ago

The tar that hostpath uses for snapshot/restore doesn't support sparse files either. It treats a sparse file as a regular file, so the process is very slow and the resulting archive is usually huge. We could use GNU tar with the --sparse flag instead.

stoneshi-yunify commented 3 years ago

However, GNU tar may not be as easy to adopt as GNU cp. When testing with a sparse file under active I/O (e.g. the disk of a running KubeVirt virtual machine), GNU tar often returns exit code 1 instead of 0, which the GNU tar man page describes as follows:

RETURN VALUE
       Tar  exit  code  indicates  whether  it was able to successfully perform the requested operation, and if not, what kind of error
       occurred.

       0      Successful termination.

       1      Some files differ.  If tar was invoked with the --compare (--diff, -d) command line option, this means that some files in
              the  archive  differ  from  their disk counterparts.  If tar was given one of the --create, --append or --update options,
              this exit code means that some files were changed while being archived and so the resulting archive does not contain  the
              exact copy of the file set.

       2      Fatal error.  This means that some fatal, unrecoverable error occurred.

       If a subprocess that had been invoked by tar exited with a nonzero exit code, tar itself exits with that code as well.  This can
       happen, for example, if a compression option (e.g. -z) was used and the external compressor program failed.  Another example  is
       rmt failure during backup to a remote device.

Even when we bypassed exit code 1 (treating it as success), the sparse file produced by a snapshot restore (tar -Szf) failed to start a KubeVirt virtual machine. BusyBox tar does not have this issue, so some important metadata must be lost during snapshotting/restoring with GNU tar.

For this reason, I don't think we're ready to switch tar to GNU tar yet.