Closed markmoe19 closed 1 year ago
That's a good idea, and it's related to the O_NOATIME request in https://github.com/hpc/mpifileutils/pull/534 for reading files in dcp/dsync, which can be used to avoid modifying the atime value.
As a test, can you check whether adding the O_NOATIME flag to the open call in ddup works:
so that this line changes to
mfu_open(fname, O_RDONLY | O_NOATIME)
As noted in https://github.com/hpc/mpifileutils/pull/534, in addition to keeping atime the same, that could also improve read performance. It cuts out a bunch of atime updates going to the Lustre metadata server.
I like this as a general improvement for ddup, but I'll need to verify that this doesn't create problems for file systems that might not support the O_NOATIME flag. If nothing else, we could enable the flag by default and add a new command line option to drop it.
I got the below error. Looks like src/dcp1/compare.c has O_NOATIME in some commented out code, maybe it was considered there as well at one time?
/project/selene-admin/mpifileutils_debug/mpifileutils-v0.11.1/mpifileutils/src/ddup/ddup.c:102:41: error: ‘O_NOATIME’ undeclared (first use in this function)
102 | int fd = mfu_open(fname, O_RDONLY | O_NOATIME);
| ^~~~~
Oh, we probably also need to add a #define _GNU_SOURCE statement before any includes in order to pick up the definition for O_NOATIME. Can you try again after adding #define _GNU_SOURCE to the very top of the ddup.c file?
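For reference, a minimal sketch of the ordering described above: the feature macro goes first, then the includes, then the open call. The helper name open_readonly_noatime is hypothetical and not part of mpifileutils (ddup itself calls mfu_open), and the #ifndef fallback for platforms without O_NOATIME is an assumption about how one might guard this.

```c
/* Must appear before any #include so that glibc exposes O_NOATIME. */
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>   /* close(), for callers of the helper below */

/* Assumption: on platforms that lack O_NOATIME, degrade to a no-op flag. */
#ifndef O_NOATIME
#define O_NOATIME 0
#endif

/* Hypothetical helper (not an mpifileutils function): open a file
 * read-only and ask the kernel not to update its access time. */
static int open_readonly_noatime(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY | O_NOATIME);
}
```

The caller checks the returned descriptor as with a plain open() and closes it with close() when done.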
Ok, great, I got this to compile now, just need to test ...
Does dsync change atime? This made me think that O_NOATIME would be nice to add to ddup, dsync, and the other tools, as we don't want to change atime or mtime while looking for old files to clean up or move. :)
Yes, other tools currently change atime as well. After we confirm that this is working in ddup, I'll work to add O_NOATIME to dcmp, dcp, and dsync. dtar would be another potential target.
It works nicely. Atime is not changed and duplicate files are found, see attached snippet of text showing "ls -alu" output for atime. (I did cut out some of the file list to trim down the snippet size). Thanks!
Ok, good. Thanks for testing. I'll make that O_NOATIME change to ddup and start looking at the other tools.
Some adventures in atime:
Looks like newer versions of rsync can preserve atime on both source and target, which seems ideal for our use case. I think that would be a nice goal for dsync as well. :)
https://unix.stackexchange.com/questions/630228/rsync-keep-access-time-atime-how
Since rsync version 3.2.0, there are two flags that affect atimes:
--atimes, -U preserve access (use) times
--open-noatime avoid changing the atime on opened files
The full description of these is:
--atimes, -U
This tells rsync to set the access (use) times of the destination
files to the same value as the source files.
If repeated, it also sets the --open-noatime option, which can
help you to make the sending and receiving systems have the same
access times on the transferred files without needing to run
rsync an extra time after a file is transferred.
Note that some older rsync versions (prior to 3.2.0) may have
been built with a pre-release --atimes patch that does not imply
--open-noatime when this option is repeated.
--open-noatime
This tells rsync to open files with the O_NOATIME flag (on systems
that support it) to avoid changing the access time of the
files that are being transferred. If your OS does not support
the O_NOATIME flag then rsync will silently ignore this option.
Note also that some filesystems are mounted to avoid updating
the atime on read access even without the O_NOATIME flag being
set.
So my reading of these (which testing has borne out) is that the following will both keep rsync from updating the atime on src, and will copy the atime of src to dest:
rsync -UU
Thanks, @markmoe19. I just merged https://github.com/hpc/mpifileutils/pull/561 to add a new --open-noatime option to various tools to enable this.
Since adding O_NOATIME can lead to an error for normal users when reading files they don't own, enabling it via a new option is a good way to go.
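To illustrate the failure mode: on Linux, open() with O_NOATIME fails with EPERM when the caller neither owns the file nor is privileged. The sketch below shows one alternative way to handle that, retrying without the flag. It is not what the merged PR does (that uses an opt-in option), and the helper name open_prefer_noatime is hypothetical.

```c
/* Must appear before any #include so that glibc exposes O_NOATIME. */
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>   /* close(), for callers of the helper below */

/* Assumption: on platforms that lack O_NOATIME, degrade to a no-op flag. */
#ifndef O_NOATIME
#define O_NOATIME 0
#endif

/* Hypothetical helper: try O_NOATIME first; if the kernel rejects it with
 * EPERM (caller does not own the file), retry as a plain read-only open.
 * *used_noatime is set to 1 only when the flag was actually in effect. */
static int open_prefer_noatime(const char *path, int *used_noatime)
{
    assert(path != NULL && used_noatime != NULL);

    int fd = open(path, O_RDONLY | O_NOATIME);
    if (fd >= 0) {
        *used_noatime = (O_NOATIME != 0);
        return fd;
    }

    *used_noatime = 0;
    if (errno != EPERM) {
        return -1;              /* unrelated error, report it as-is */
    }
    return open(path, O_RDONLY); /* this open may update atime */
}
```

A silent fallback like this trades predictability for convenience: the read succeeds, but atime may change on files the user does not own, which is one reason an explicit option can be the safer choice.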
With this, dsync --open-noatime should avoid updating atime on source files. And as before, dsync by default copies atime from source to destination files when it sets the destination timestamps, so the destination atime should match the source.
--open-noatime option sounds great, thanks @adammoody !
Great. I'll close this one out as resolved by https://github.com/hpc/mpifileutils/pull/561
Could ddup be enhanced for allowing a restore of last accessed time (atime) on a file after it is read for de-duplication comparison purposes? We find atime useful for alerting users to unused files to clean-up. We would also like to use ddup but it changes atime.