hpc / mpifileutils

File utilities designed for scalability and performance.
https://hpc.github.io/mpifileutils
BSD 3-Clause "New" or "Revised" License
162 stars 64 forks source link

mfu_param_path: fix cutting root component #536

Closed daltonbohning closed 1 year ago

daltonbohning commented 1 year ago

Even when ! copy_into_dir, we still still shouldn't cut the root component.

Signed-off-by: Dalton Bohning dalton.bohning@intel.com

adammoody commented 1 year ago

Thanks, @daltonbohning and @wiliamhuang . To help me work through the cases, would you mind also describing some example usage scenarios that this is fixing?

daltonbohning commented 1 year ago

To help me work through the cases, would you mind also describing some example usage scenarios that this is fixing?

@adammoody You probably wouldn't run into it with POSIX/Lustre unless you're copying the root filesystem and the root has files and directories, so I'll use a DAOS example:

dcp daos://pool/container1 daos://pool/container2/new_dir

mfu_file_t will store the type for the source as DAOS/DFS, but the "path" is technically "/" with respect to the container root. So we're effectively doing this:

dcp / /path/to/new_dir

If container1 has this structure:

daos://pool/container1/my_dir  # /my_dir
daos://pool/container1/my_file # /my_file

The simplified flow is something like:

mkdir daos://pool/container2/new_dir
cp daos://pool/container1/my_dir daos://pool/container2/new_dir
cp daos://pool/container1/my_file daos://pool/container2/new_dir
# Error - notice the path is wrong

Hence, we --cut if the source is root. The logic previously did this only when copy_into_dir (added by me/Danielle), but whether we're copying INTO or TO, we shouldn't cut the root.

daltonbohning commented 1 year ago

@adammoody Does this make sense?

adammoody commented 1 year ago

Yes, thanks. I wanted to review it again with that kind of example in mind, but I got pulled on to something else. I'll get back to it today.

adammoody commented 1 year ago

@daltonbohning , I finally found some time to dig into this. Sorry for the delay.

In your example above, it seems like the existing code would work if the destination was identified as a directory from this check:

https://github.com/hpc/mpifileutils/blob/6230ed293c69f818aac5a47a0ce3665fc3fafba6/src/common/mfu_param_path.c#L512-L520

Can you verify whether that is true if the destination container already exists?

I do see the problem if the destination container does not already exist. The copy_into_dest flag will not be set since the list of source paths is a single item ("/"). In this case, the user wants dcp to copy all children of the given directory ("/") rather than the directory itself.

Some have asked for this kind of feature to apply to all directories, so we may even want to generalize this. For example, from the rsync man page, a trailing slash on a source path means to copy the "contents of the directory" rather than to copy the "directory" itself.

A trailing slash on the source changes this behavior to avoid creating an additional directory level at the destination. You can think of a trailing / on a source as meaning "copy the contents of this directory" as opposed to "copy the directory by name", but in both cases the attributes of the containing directory are transferred to the containing directory on the destination.

We could delay that generalization to later, just thinking out loud ...

daltonbohning commented 1 year ago

Can you verify whether that is true if the destination container already exists?

@adammoody It doesn't seem to matter if the destination container exists, but rather if the destination directory (DAOS or POSIX) does.

1. dcp daos://pool/cont/ daos://pool/new_cont
# GOOD

2. dcp daos://pool/cont/ daos://pool/new_cont/new_dir
ERROR: Failed to copy `/file' to `/new_dir

3. dcp daos://pool/cont/ daos://pool/existing_cont/new_dir
ERROR: Failed to copy `/file' to `/new_dir

4. dcp daos://pool/cont/ new_posix_dir
ERROR: Failed to copy `/file' to `/tmp/dbohning/new_posix_dir

5. dcp daos://pool/cont/ existing_posix_dir
# GOOD
daltonbohning commented 1 year ago

We could delay that generalization to later, just thinking out loud ...

I'm not sure if we should or shouldn't do this :)

adammoody commented 1 year ago

We could delay that generalization to later, just thinking out loud ...

I'm not sure if we should or shouldn't do this :)

Let's not worry about generalizing here, since it would change the user interface on people.

adammoody commented 1 year ago

Thanks, @daltonbohning . Looks good.

daltonbohning commented 1 year ago

Thanks @adammoody !