hpc / mpifileutils

File utilities designed for scalability and performance.
https://hpc.github.io/mpifileutils
BSD 3-Clause "New" or "Revised" License

mfu: add --open-noatime to open files with O_NOATIME #561

Closed adammoody closed 9 months ago

adammoody commented 9 months ago

This adds an --open-noatime option to a number of tools, which adds the O_NOATIME flag when opening files to avoid updating the file last access time.

Many centers use last access time to filter files for purge operations, and they would prefer not to change file atime values when making backup copies with dsync or scanning the file system for duplicate files with ddup. Adding this flag may also improve read performance on some file systems.

The O_NOATIME flag is only allowed when the effective user id matches the owner of the file or when the process is running with the CAP_FOWNER capability. A normal user will encounter errors when using O_NOATIME when reading from a shared directory containing files owned by other users, even if the current user has read access to all files.

The following tools are affected:

  • ddup - when reading files to compute hash values
  • dcp and dsync - when reading source files during a copy
  • dcmp and dsync - when reading source and destination files while comparing their contents
  • dtar - while reading source files when creating an archive

When --open-noatime is specified with ddup, the tool checks the owner user id of each file and adds O_NOATIME only if it matches the effective user id of the process. This allows normal users to specify the --open-noatime option, even when running ddup on files that they don't own. The atime will still be updated on files that the user can read but does not own.
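The ownership check described for ddup can be sketched roughly as follows. This is only an illustration, not the code from this PR: the helper names are hypothetical, the owner id is taken from a fresh stat(2) call rather than from the tool's file list, and the CAP_FOWNER case is ignored.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes O_NOATIME in <fcntl.h> on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_NOATIME
#define O_NOATIME 0 /* assumption: treat the flag as a no-op where it is missing */
#endif

/* Hypothetical helper: pick open(2) flags for a read-only open.
 * O_NOATIME is added only when requested and the file owner matches euid. */
static int noatime_read_flags(uid_t file_uid, uid_t euid, int want_noatime)
{
    int flags = O_RDONLY;
    if (want_noatime && file_uid == euid) {
        flags |= O_NOATIME;
    }
    return flags;
}

/* Hypothetical helper: look up the owner, then open for reading.
 * Returns a file descriptor, or -1 with errno set. */
static int open_read_noatime_if_owner(const char* path, int want_noatime)
{
    assert(path != NULL);
    struct stat st;
    if (stat(path, &st) != 0) {
        return -1;
    }
    return open(path, noatime_read_flags(st.st_uid, geteuid(), want_noatime));
}
```

With this shape, a file owned by another user is simply opened with O_RDONLY, so its atime is updated as usual instead of the open failing with EPERM.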

For the remaining tools, the current algorithms do not expose the file owner id in a way to allow for an easy check. In this case, O_NOATIME is added when opening all files. Normal users will thus encounter an error if the tool attempts to open any file that they do not own.

Resolves: https://github.com/hpc/mpifileutils/issues/557 https://github.com/hpc/mpifileutils/pull/534

adammoody commented 9 months ago

@daltonbohning , I hope you're doing well.

We've had some requests to add O_NOATIME to some tools. I've opened this PR to do that. It'd be good to know whether these changes are valid for DAOS system.

Is there someone who can help us check that?

daltonbohning commented 9 months ago

> @daltonbohning , I hope you're doing well.
>
> We've had some requests to add O_NOATIME to some tools. I've opened this PR to do that. It'd be good to know whether these changes are valid for DAOS system.
>
> Is there someone who can help us check that?

Hey Adam. Yes, I'm doing well. Hope the same for you!

It looks like DAOS/DFS doesn't honor O_NOATIME when it is passed, so atime is still updated. I tested and it doesn't break anything. Extra bits set in flags are just ignored. So this change is safe for us, and I'll create an internal ticket to see if we want to handle that flag when passed.

Thanks for the heads up!

daltonbohning commented 9 months ago

DAOS JIRA for reference: https://daosio.atlassian.net/browse/DAOS-14479

daltonbohning commented 9 months ago

We discussed this in the context of DAOS, and we don't actually store atime with the file. It's only populated in the stat buf to the greater of mtime or ctime. So handling O_NOATIME wouldn't help anything because it would just get "reset" on the next file open.
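As a small illustration of the rule described in this comment, the reported atime would be derived like this. The function names are hypothetical and this is not DAOS code, just the "greater of mtime or ctime" rule written out in C.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Compare two timestamps: returns -1, 0, or 1. */
static int timespec_cmp(const struct timespec* a, const struct timespec* b)
{
    assert(a != NULL && b != NULL);
    if (a->tv_sec != b->tv_sec) {
        return (a->tv_sec < b->tv_sec) ? -1 : 1;
    }
    if (a->tv_nsec != b->tv_nsec) {
        return (a->tv_nsec < b->tv_nsec) ? -1 : 1;
    }
    return 0;
}

/* Hypothetical helper: atime is not stored, so it is reported as the
 * later of the modification time and the change time. */
static struct timespec derived_atime(struct timespec mtim, struct timespec ctim)
{
    return (timespec_cmp(&mtim, &ctim) >= 0) ? mtim : ctim;
}
```

Since the value is recomputed from mtime and ctime, nothing done at open time (with or without O_NOATIME) can change what a later stat reports.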

adammoody commented 9 months ago

TODO: we'll need to be a bit more clever when copying files that are readable but not owned by the user.

From man 2 open:

O_NOATIME (since Linux 2.6.8)
    Do not update the file last access time (st_atime in the inode) when the file is read(2).

    This flag can be employed only if one of the following conditions is true:

      • The effective UID of the process matches the owner UID of the file.
      • The calling process has the CAP_FOWNER capability in its user namespace and the owner UID of the file has a mapping in the namespace.

    This flag is intended for use by indexing or backup programs, where its use can significantly reduce the amount of disk activity. This flag may not be effective on all filesystems. One example is NFS, where the server maintains the access time.

and potential error:

EPERM
    The O_NOATIME flag was specified, but the effective user ID of the caller did not match the owner of the file and the caller was not privileged.
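One possible way to be "more clever" here, given the EPERM behavior quoted above, is to try O_NOATIME first and retry without it when the kernel refuses. This is a sketch only and is not necessarily what the PR ends up doing; the helper name and its out-parameter are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes O_NOATIME in <fcntl.h> on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#ifndef O_NOATIME
#define O_NOATIME 0 /* assumption: treat the flag as a no-op where it is missing */
#endif

/* Hypothetical helper, not part of mpifileutils.
 * Tries to open with O_NOATIME and retries without it on EPERM.
 * Returns a file descriptor, or -1 with errno set.
 * If used_noatime is non-NULL, it reports whether the returned
 * descriptor came from the open call that requested O_NOATIME. */
static int open_read_prefer_noatime(const char* path, int* used_noatime)
{
    assert(path != NULL);
    int with_flag = 1;
    int fd = open(path, O_RDONLY | O_NOATIME);
    if (fd < 0 && errno == EPERM) {
        /* Not the owner and not privileged: fall back to a plain open. */
        with_flag = 0;
        fd = open(path, O_RDONLY);
    }
    if (used_noatime != NULL) {
        *used_noatime = (fd >= 0 && with_flag);
    }
    return fd;
}
```

The trade-off is an extra open(2) call for each file the user does not own, in exchange for not needing the owner id ahead of time. Any other error, such as ENOENT or EACCES, is returned to the caller unchanged.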

adammoody commented 9 months ago

> We discussed this in the context of DAOS, and we don't actually store atime with the file. It's only populated in the stat buf to the greater of mtime or ctime. So handling O_NOATIME wouldn't help anything because it would just get "reset" on the next file open.

Thanks, @daltonbohning . And thanks for your super fast response!

adammoody commented 9 months ago

It sounds like tar updates source file atimes by default but one can attempt to preserve atime with an option:

https://www.gnu.org/software/tar/manual/html_section/Attributes.html

When tar reads files, it updates their access times. To avoid this, use the ‘--atime-preserve[=METHOD]’ option, which can either reset the access time retroactively or avoid changing it in the first place.
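For comparison with O_NOATIME, the "reset the access time retroactively" method can be sketched in C as below. This is not GNU tar's code; the helper name is hypothetical. It records the atime before reading and writes it back with futimens(2) afterwards. Two known drawbacks of this approach: restoring the atime changes the file's ctime, and another process reading the file in the meantime will have its atime update overwritten.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes st_atim, futimens, UTIME_OMIT on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: read a whole file, then restore its original atime.
 * Returns the number of bytes read, or -1 on error. */
static long read_then_restore_atime(const char* path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }

    /* Remember the access time before any data is read. */
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    char buf[4096];
    long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        total += (long)n;
    }

    /* Put the saved atime back and leave mtime untouched. */
    struct timespec times[2];
    times[0] = st.st_atim;
    times[1].tv_sec = 0;
    times[1].tv_nsec = UTIME_OMIT;
    int rc = futimens(fd, times);

    close(fd);
    return (n < 0 || rc != 0) ? -1 : total;
}
```

Like O_NOATIME, setting explicit timestamps requires owning the file (or having the appropriate privilege), so this method has the same limitation for files that are readable but owned by someone else.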

adammoody commented 9 months ago

For the ability to use O_NOATIME, we do a similar check in mfu_flist_chmod():

https://github.com/hpc/mpifileutils/blob/47918154ea0f4895623f36ccf8cbfe2df477c3ae/src/common/mfu_flist_chmod.c#L1258-L1260

https://github.com/hpc/mpifileutils/blob/47918154ea0f4895623f36ccf8cbfe2df477c3ae/src/common/mfu_flist_chmod.c#L1286-L1296

https://github.com/hpc/mpifileutils/blob/47918154ea0f4895623f36ccf8cbfe2df477c3ae/src/common/mfu_flist_chmod.c#L1069-L1075

TODO: this code doesn't really accomplish what it claims to do. I'll fix that later.

adammoody commented 9 months ago

Apparently, rsync v3.2.0 provides the following options for atime:

https://download.samba.org/pub/rsync/rsync.1

       --atimes, -U
              This tells rsync to set the access (use) times of the destination
              files to the same value as the source files.

              If repeated, it also sets the --open-noatime option, which can
              help you to make the sending and receiving systems have the same
              access times on the transferred files without needing to run
              rsync an extra time after a file is transferred.

              Note that some older rsync versions (prior to 3.2.0) may have
              been built with a pre-release --atimes patch that does not imply
              --open-noatime when this option is repeated.

       --open-noatime
              This tells rsync to open files with the O_NOATIME flag (on systems
              that support it) to avoid changing the access time of the
              files that are being transferred. If your OS does not support
              the O_NOATIME flag then rsync will silently ignore this option.
              Note also that some filesystems are mounted to avoid updating
              the atime on read access even without the O_NOATIME flag being
              set.

Tip from: https://unix.stackexchange.com/questions/630228/rsync-keep-access-time-atime-how