xproc / 3.0-steps

Repository for change requests to the standard step library and for official extension steps

What does p:file-copy do with special files? #477

Closed: ndw closed this issue 4 months ago

ndw commented 3 years ago

What should p:file-copy do with symbolic links, block devices, character devices, etc.?

Is that implementation defined?

Do we expect <p:file-copy href="/dev" target="/tmp/dev"/> to succeed on a Unix box?

Symbolic links are both the more likely problem and the more complicated one. Given:

/tmp/x/copytest/a/fail -> /path/to/nowhere
/tmp/x/copytest/a/file
/tmp/x/copytest/b/a -> ../a
/tmp/x/copytest/b/up -> ../..

What is <p:file-copy href="/tmp/x/copytest" target="/tmp/x/result"/> supposed to produce, assuming /tmp/x/result does not exist before the operation begins?
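To experiment with this layout, it can be reproduced in a scratch directory. The sketch below is only an illustration: the helper names `make_copytest` and `describe_links` are made up here, and a temporary directory stands in for `/tmp`. It builds the four entries and reports each link's text and whether it resolves.

```python
import os
import tempfile


def make_copytest(base):
    """Create <base>/x/copytest with the four entries from the question."""
    root = os.path.join(base, "x", "copytest")
    os.makedirs(os.path.join(root, "a"))
    os.makedirs(os.path.join(root, "b"))
    # Absolute link to a location that is assumed not to exist (dangling).
    os.symlink("/path/to/nowhere", os.path.join(root, "a", "fail"))
    with open(os.path.join(root, "a", "file"), "w") as f:
        f.write("data\n")
    # Relative link that stays inside copytest.
    os.symlink("../a", os.path.join(root, "b", "a"))
    # Relative link that escapes copytest and points at <base>/x.
    os.symlink("../..", os.path.join(root, "b", "up"))
    return root


def describe_links(root):
    """Map each symlink (path relative to root) to (link text, resolves?)."""
    links = {}
    # os.walk does not follow symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                rel = os.path.relpath(path, root)
                links[rel] = (os.readlink(path), os.path.exists(path))
    return links


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        found = describe_links(make_copytest(base))
        for rel, (target, ok) in sorted(found.items()):
            print(f"{rel} -> {target} ({'resolves' if ok else 'dangling'})")
```

Note that `b/up` resolves to the directory that would also contain `result`, so any copy that follows links has to deal with the destination appearing inside the tree being copied.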

(Extra credit: consider how this should work on Windows boxes with the various NTFS filesystem features.)

ndw commented 3 years ago

I wonder if the following is sufficient.

  1. If a symbolic link is to an absolute location or a location that does not exist, recreate that symbolic link
  2. If a symbolic link is to a relative location within the structure being copied, recreate it within that structure
  3. If a symbolic link is to a relative location outside the structure being copied, create an absolute symbolic link to that location
  4. Ignore special files

gimsieke commented 3 years ago
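
One possible reading of the four rules above, as a rough Python sketch rather than anything the spec prescribes. The function names are hypothetical, "within the structure" is interpreted as "the fully resolved target lies under the source root", and the destination is assumed not to exist and not to lie inside the source.

```python
import os
import shutil


def _new_link_text(link, root):
    """Decide what the recreated symlink should contain (rules 1 to 3)."""
    target = os.readlink(link)
    # Rule 1: absolute or dangling links are recreated verbatim.
    if os.path.isabs(target) or not os.path.exists(link):
        return target
    resolved = os.path.realpath(link)
    # Rule 2: a relative link that stays inside the copied structure keeps
    # its relative text, so the copy points at the copied counterpart.
    if os.path.commonpath([resolved, root]) == root:
        return target
    # Rule 3: a relative link that leaves the structure becomes absolute.
    return resolved


def copy_with_link_rules(src, dst):
    """Copy directory src to a new directory dst using the proposed rules."""
    src = os.path.realpath(src)
    os.makedirs(dst)
    # os.walk lists symlinked directories but does not descend into them.
    for dirpath, dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        out_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(out_dir, exist_ok=True)
        for name in dirnames + filenames:
            s = os.path.join(dirpath, name)
            d = os.path.join(out_dir, name)
            if os.path.islink(s):
                os.symlink(_new_link_text(s, src), d)
            elif os.path.isdir(s):
                continue  # real directory: created when the walk reaches it
            elif os.path.isfile(s):
                shutil.copy2(s, d)
            # Rule 4: anything else (FIFO, socket, device) is ignored.
```

Applied to the `copytest` example, this would keep `a/fail` and `b/a` unchanged and turn `b/up` into an absolute link to the parent of `copytest`.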

For symbolic links it’s maybe sufficient to have a boolean option dereference-links. If it is false(), symbolic links will be kept as they are (verbatim: a relative link remains the same relative link, an absolute link remains the same absolute link), regardless of whether the source link resolves and regardless of whether the copied link resolves in the target.

If true(), symlinks will be followed and the files/directories they point to will be copied. Processors should report loops as errors when copying with dereferencing on.

If a symlink is dangling and dereference-links is true, it’s an error that can be ignored with fail-on-error="false". In this case, the dangling link won’t be kept as a symlink either; it will simply be missing from the destination.
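
To make the proposal concrete, here is a rough Python sketch of how such an option might behave. Everything in it is an assumption for illustration: `copy_tree`, `CopyError`, and the parameter names are made up, loop detection is done by remembering the resolved paths of the directories currently being copied, and `fail_on_error` is only applied to dangling links because that is the case discussed above.

```python
import os
import shutil


class CopyError(Exception):
    """Hypothetical error type for dangling links and loops."""


def copy_tree(src, dst, dereference_links=False, fail_on_error=True):
    """Copy directory src to a new directory dst."""
    # Seeding the guard with dst also catches a destination that turns up
    # inside a directory being copied or followed.
    guard = (os.path.realpath(dst),)
    _copy_dir(src, dst, dereference_links, fail_on_error, guard)


def _copy_dir(src, dst, deref, fail_on_error, active):
    real = os.path.realpath(src)
    if real in active:
        raise CopyError(f"loop detected at {src}")
    os.makedirs(dst)
    for entry in os.scandir(src):
        s = entry.path
        d = os.path.join(dst, entry.name)
        if entry.is_symlink() and not deref:
            # Keep the link text verbatim, whether or not it resolves.
            os.symlink(os.readlink(s), d)
        elif entry.is_symlink() and not os.path.exists(s):
            if fail_on_error:
                raise CopyError(f"dangling symbolic link: {s}")
            # Otherwise the link is dropped from the destination.
        elif os.path.isdir(s):
            _copy_dir(s, d, deref, fail_on_error, active + (real,))
        elif os.path.isfile(s):
            shutil.copy2(s, d)
        # Special files: implementation-defined; skipped in this sketch.
```

With `dereference_links=True`, the `b/up` link from the original example leads back to an ancestor of `b`, so this sketch would report a loop for that tree.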

Hard links: Files they point to will be treated as regular files, as if there were no hard link. I’m not sure whether hard links for other items (directories, block devices, etc.) exist, but I think they should be treated as the items they point to.

Special files (devices etc.): implementation-defined

ndw commented 3 years ago

I don't think hard links are an issue. Those are implemented (at least in Unix) by having different directory entries point to the same inode on disk. I think there's a reference count, but I don't believe there's anything about one entry that can lead you to the other. (Short of reading the whole disk, naturally.)
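
This is easy to check on a POSIX-like filesystem. The snippet below is a small sketch (the helper name `hard_link_facts` is made up) that creates a file, adds a second directory entry for it, and compares what `stat` reports for the two names.

```python
import os
import tempfile


def hard_link_facts(directory):
    """Create a file plus a hard link to it and report what stat shows."""
    first = os.path.join(directory, "first")
    second = os.path.join(directory, "second")
    with open(first, "w") as f:
        f.write("data\n")
    os.link(first, second)  # second directory entry for the same inode
    a, b = os.stat(first), os.stat(second)
    return {
        # Both names refer to the same (device, inode) pair.
        "same_inode": (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino),
        # The reference count; nothing here names the other entry.
        "link_count": a.st_nlink,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(hard_link_facts(scratch))
```

A copy routine that treats each entry as a regular file would simply produce two independent files in the destination.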

I suppose we could add a dereference-links option. Defaults to...true?

ndw commented 3 years ago

Curiously, a naive application of cp -r with symbolic links that form a loop just blows up...

I'm not recommending that behavior, I'm just saying it's curious.