nim-lang / RFCs

A repository for your Nim proposals.
135 stars 26 forks source link

Add a flexible way to handle individual items while copying with sane defaults (stdlib/os) #319

Open rominf opened 3 years ago

rominf commented 3 years ago

Add a flexible way to handle individual items while copying (stdlib/os)

Abstract

I propose to add an additional optional (to keep source-level backward compatibility) argument to copy functions in os module, which will control all aspects of copying items (be they files, dirs, or symlinks). The default value for all these optional arguments should be sane (to follow The Principle of Least Astonishment).

Motivation

The need for copying symlinks

Currently, copyDir do not copy symlinks (see copyDir implementation, case is not exhaustive). This causes problems. For example, I'm developing a wrapper around Guix. Guix comes in a binary form, packed into the archive. I use nimarchive library. It extracts the archive into a temporary directory first, and only then it moves files into the required destination. moveDir fallback to copyDir and this results in a broken Guix distribution, as Guix heavily relies on symlinks.

The need for specifying copying options

Sometimes there is a need to resolve symlinks and copy files/dirs themselves instead of copying symlinks. Examples include copying dir with symlinks (that point to the files/dirs that are not copied) to another computer.

Sometimes there is a need to create hardlinks instead of actual files. Examples include creating a version control system (see Mercurial wiki).

Another problem that arises with copying is "How to handle conflicts?" (in other words, what to do when the file already exists in destination dir). Options include: report an error, keep the existing file, without reporting an error, replace the existing file, or replace the existing file only if it is older than the file being copied.

Description

There was already an attempt to change os module copy/move functions, see https://github.com/nim-lang/Nim/pull/16709.

For the summary on how do different programming languages handle symlinks on dir copy, see https://github.com/nim-lang/Nim/pull/16709#issuecomment-761132187. I think we can copy good design from C++ filesystem::copy (see copy_options) and from Go copy library.

I propose to define copyFile, copyDir as follows:


type SymlinkAction* = enum ## Action to perform on symlink while copying.
  saFollow,        ## Copy the files symlinks point to
  saCopyAsIs,      ## Copy symlinks as symlinks
  saSkip           ## Ignore symlinks

type ExistingFileAction* = enum ## Action to perform on existing file while copying.
  efaError,              ## Report an error
  efaSkipExisting,       ## Keep the existing file, without reporting an error
  efaOverwriteExisting,  ## Replace the existing file
  efaUpdateExisting      ## Replace the existing file only if it is older than the file being copied 

type CopyItemOptions* = object ## Options controlling the files and symlinks copying.
  existingFileAction*: ExistingFileAction
  recursive*: bool  ## Recursively copy subdirectories and their content, used only for dir-like items
  symlinkAction*: SymlinkAction
  # Could be expanded in the future

proc copyFile*(source, dest: string, options = CopyItemOptions(existingFileAction: efaError, recursive: true, symlinkAction: saFollow)) =
  ... # Implentation is omitted, see previous attempt: https://github.com/nim-lang/Nim/pull/16709

type CopyOptionsCallback* = proc(source, dest: string, path: string, kind: PathComponent): CopyItemOptions {.closure.}

type CopyFunction* = proc(source, dest: string, path: string, kind: PathComponent, options: CopyOptions, itemOptions: CopyItemOptions) {.closure.}

type CopyOptions* = object ## Options controlling the dirs copying.
  copyOptionsCallback*: CopyOptionsCallback
  copyFunction*: CopyFunction
  # Could be expanded in the future

proc copyOptionsCallbackDefault*(source, dest: string, path: string, kind: PathComponent): CopyItemOptions {.closure.} =
  CopyItemOptions(existingFileAction: efaError, recursive: true, symlinkAction: saCopyAsIs)

proc copyFunctionDefault*(source, dest: string, path: string, kind: PathComponent, options: CopyOptions, itemOptions: CopyItemOptions) {.closure.} =
  if kind == pcDir:
    copyDir(source = source / path, dest = dest / path, options = options)
  else:
    copyFile(source = source / path, dest = dest / path, options = itemOptions)

proc copyDir*(source, dest: string, options = CopyOptions(copyOptionsCallback: copyOptionsCallbackDefault, copyFunction: copyFunctionDefault)) =
  createDir(dest)
  for kind, path in walkDir(source):
    var noSource = splitPath(path).tail
    let itemOptions = options.copyOptionsCallback(source = source, dest = dest, path = path, kind = kind)
    if kind == pcLinkToDir and itemOptions.symlinkAction == saFollow:
      let kind = pcDir
    if kind == pcDir and not itemOptions.recursive:
        createDir(dest / noSource)
    else:
        options.copyFunction(source = source, dest = dest, path = noSource, kind = kind, options = options, itemOptions = itemOptions)

Examples

Before

# 1
[copying symlinks as symlinks was not possible, should call createSymlink]

# 2
[not possible without manual loop with `walkDir`]

# 3
[not possible without manual loop with `walkDir`]

After

# 1
copyFile("/usr/bin/vi", "/home/rominf/bin/vi", CopyItemOptions(symlinkAction: saCopyAsIs))

# 2
proc createHardlinks(source, dest: string, path: string, kind: PathComponent, options: CopyOptions, itemOptions: CopyItemOptions) {.closure.} =
  if kind == pcFile:
    createHardlink(source / path, dest / path)
  else:
    options.copyFunction(source = source, dest = dest, path = path, kind = kind, options = options, itemOptions = itemOptions)

copyDir("a", "b", CopyOptions(copyOptionsCallback: copyOptionsCallbackDefault, copyFunction: createHardlinks))

# 3 
proc updatePhotos(source, dest: string, path: string, kind: PathComponent): CopyItemOptions {.closure.} =
  result = CopyItemOptions(existingFileAction: efaError, recursive: true, symlinkAction: saCopyAsIs)
  if kind == pcFile and path.lastPathPart.endsWith(".jpg"):
    result.existingFileAction = efaUpdateExisting

copyDir("data", "dataArchive", CopyOptions(copyOptionsCallback: updatePhotos, copyFunction: copyFunctionDefault))

Backward incompatibility

This RFC changes the default behavior of copy functions: symlinks are copied now. In my opinion, we can assume that not copying them was a bug, so no additional actions are required.

Araq commented 3 years ago

In order to fix copyDir, we only have to fix copyDir...

timotheecour commented 3 years ago

In order to fix copyDir, we only have to fix copyDir...

can we at least agree on the proposal I wrote in https://github.com/nim-lang/Nim/pull/16709#issuecomment-762411231 to unblock that PR? (ie, allowing copyFile(src, dst, CopyOptions(symlinkAction: saCopyAsIs)) and making copyDir call saCopyAsIs so that symlinks are copied as symlinks instead of being ignored).

We can address remaining points later, but I don't see anything controversial with that proposal

Araq commented 3 years ago

that comment isn't helpful as you're not specifying how copyDir should be fixed.

It should copy symlinks as symlinks.

can we at least agree on the proposal I wrote in nim-lang/Nim#16709 (comment) to unblock that PR? (ie, allowing copyFile(src, dst, CopyOptions(symlinkAction: saCopyAsIs)) and making copyDir call saCopyAsIs so that symlinks are copied as symlinks instead of being ignored).

No, because the CopyOptions(symlinkAction: saCopyAsIs) part is controversial, I'd rather have a set[Flag] optional parameter instead.

timotheecour commented 3 years ago

No, because the CopyOptions(symlinkAction: saCopyAsIs) part is controversial, I'd rather have a set[Flag] optional parameter instead.

how would you use set[Flag]? this is under-specified so I can only guess:

type CopyFlag* = enum
  cfSymlinkAsIs
  cfSymlinkFollow
  cfSymlinkIgnore # maybe
proc copyFile*(src, dst: string, options = {cfSymlinkFollow})
proc copyDir*(src, dst: string, options = {cfSymlinkAsIs}) # use {cfSymlinkIgnore} for previous behavior

this API can be misused (proc copyFile*(src, dst, options = {cfSymlinkFollow, cfSymlinkAsIs}) makes no sense), whereas copyFile(src, dst: string, options = CopyOptions(symlinkAction: saCopyAsIs)) could not be similarly mis-used.

so instead this would have to be without set:

proc copyFile*(src, dst: string, symlinkOpt = cfSymlinkFollow)
proc copyDir*(src, dst: string, symlinkOpt = cfSymlinkAsIs)

if so, future additions would have to use a different optional param, whereas copyFile(src, dst, CopyOptions(symlinkAction: saCopyAsIs)) could reuse the same one by extending CopyOptions.

I'm ok with either CopyOptions(symlinkAction: saCopyAsIs) or the non-set version, but the set[Flag] option is worse IMO because it allows invalid states (handling these with assert/exceptions is worse than not allowing them in the 1st place)

Araq commented 3 years ago

so instead this would have to be without set:

In this case, yes.

if so, future additions would have to use a different optional param, whereas copyFile(src, dst, CopyOptions(symlinkAction: saCopyAsIs)) could reuse the same one by extending CopyOptions.

I know, but I prefer to add more optional parameters to the proc directly rather than via "builder" objects which is a workaround from other languages for named parameters.

Also, I don't want too many options for copyDir, the idea is that os gives you the common case in an easy to use fashion and the primitives (Iterate over the directory) so that you can roll your own for the uncommon cases.

Otherwise soon enough every operation grows an onProgress callback type just so that you can use copyDir in a UI setting where you want to drive a progress bar -- or maybe not because other languages don't have it so it never occured to us that it would be a useful thing...

timotheecour commented 3 years ago

Also, I don't want too many options for copyDir

we're on the same page then, see my comment in https://github.com/nim-lang/Nim/pull/16709#issuecomment-761858948

What if we use a callback? seems the easiest to customize for end users, instead of having a bunch of options (eg only follow for pcLinkToFile, not pcLinkToDir, or only follow if target of symlink is external, or if it doesn't create a cycle, or only copy at most N files etc...)

I've argued there for a single optional param proc copyDir*(source, dest: string, callback: CopyDirCallback = copyDirCallbackDefault) which gives user the same sane default we've agreed to (copy symlinks as symlinks), yet also allows for almost arbitrary user customizations without having to rewrite copyDir from scratch for end users.

The way this callback is used makes it simple for users to customize: all the callback must do is:

this should cover 99% of use cases (including, yes, a progress, early stopping after N files, a policy for symlinks, skipping hidden folders etc; all this is simply up to user code). For the remaining cases (eg if you need copying files in a certain order), user would need to call walkPaths (https://github.com/nim-lang/fusion/pull/32) directly, eg as shown in https://github.com/nim-lang/Nim/pull/16709#issuecomment-762411231

But we can discuss whether to add an optional param in future work, for now simply providing proc copyDir*(source, dest: string) that copies symlinks as symlinks would be progress

Araq commented 3 years ago

for now simply providing proc copyDir*(source, dest: string) that copies symlinks as symlinks would be progress

Exactly.

rominf commented 3 years ago

OK, I've started the development of what @Araq wants :-)