jborg / attic

Deduplicating backup program

options to not compare inode numbers, and disable acl / xattr #89

Open ghost opened 10 years ago

ghost commented 10 years ago

I was thinking about this the other day when looking at the code - and the comment at end of #88 reminded me:

I think it would be worthwhile to have options to disable inode checking, as well as acl and xattr support.

regarding inodes (from rdiff-backup docs)

       --no-compare-inode
              This option prevents rdiff-backup  from  flagging  a  hardlinked
              file  as  changed  when  its device number and/or inode changes.
              This option is useful in situations where the source  filesystem
              lacks  persistent  device  and/or inode numbering.  For example,
              network filesystems may have mount-to-mount differences in their
              device  number  (but  possibly  stable  inode numbers); USB/1394
              devices may come up at different  device  numbers  each  remount
              (but  would  generally  have  same  inode number); and there are
              filesystems which don't even have the same  inode  numbers  from
              use to use.  Without the option rdiff-backup may generate
              unnecessary numbers of tiny diff files.
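To illustrate what such an option would change, here is a minimal sketch of a file-cache check with an optional inode comparison. The `CacheEntry` layout, the `is_unchanged` name and the `compare_inode` flag are all hypothetical and chosen for illustration; attic's actual cache format and checks may differ.

```python
from collections import namedtuple

# Hypothetical cache entry; attic's real files cache layout may differ.
CacheEntry = namedtuple("CacheEntry", "inode size mtime_ns")


def is_unchanged(entry, st, compare_inode=True):
    """Return True if the cached entry still matches the lstat result."""
    if entry is None:
        return False
    if entry.size != st.st_size or entry.mtime_ns != st.st_mtime_ns:
        return False
    # With compare_inode=False, a changed inode number alone does not
    # invalidate the entry (useful when inode numbers are not stable).
    return not compare_inode or entry.inode == st.st_ino
```

With `compare_inode=False`, a file whose size and mtime are unchanged would be treated as a cache hit even if the filesystem reports a different inode number after a remount.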

An option to disable xattr and acl support could also potentially help with performance. Perhaps only by an unnoticeable fraction, even over a large backup, but when checking the file cache it would remove the need for 3 additional system calls on top of the lstat. Tools like rdiff-backup (which I was using before; I am currently trialling attic) support this.
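As a rough sketch of the idea, the per-file metadata read could take a flag that skips the extra lookups. This is not attic's code: `read_metadata` and `with_xattrs` are hypothetical names, and the standard library's `os.listxattr`/`os.getxattr` (Linux only) stand in for attic's own xattr module. An ACL lookup could be gated by a similar flag.

```python
import os


def read_metadata(path, with_xattrs=True):
    """lstat the path and, optionally, read its extended attributes."""
    item = {"stat": os.lstat(path)}
    # os.listxattr/os.getxattr are only available on Linux builds of Python.
    if with_xattrs and hasattr(os, "listxattr"):
        try:
            names = os.listxattr(path, follow_symlinks=False)
            item["xattrs"] = {
                name: os.getxattr(path, name, follow_symlinks=False)
                for name in names
            }
        except OSError:
            # Filesystem without xattr support, or permission denied.
            item["xattrs"] = {}
    return item
```

Calling `read_metadata(path, with_xattrs=False)` performs only the `lstat`, which is the case the proposed option would enable.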

jborg commented 10 years ago

> An option to disable xattr and acl support could also potentially help with performance. Maybe by an unnoticeable fraction even over a large backup, but when checking the file cache it would remove the need for 3 additional system calls on top of the lstat. Tools like rdiff-backup (which I was using before and am trialling attic currently) support this.

I might consider that if the performance impact can be shown to be large enough. Are you able to provide some initial benchmarks? (Manually disabling acl and xattr should be fairly easy.)

ghost commented 10 years ago

I did some synthetic benchmarks traversing a directory, with and without the xattr/get_acl calls, and I admit it makes little difference on this machine (which has relatively fast cpu/io), although I still need to test on a slower one with a larger data set. It might also make more of a difference when the backup runs as a low priority background task on a server where other processes are doing io.

walking through a structure of ~360k files with just lstat

# echo 3 > /proc/sys/vm/drop_caches; python3 test.py
files= 363291 time=  73.48496890068054

and with xattr_getall and acl_get also

# echo 3 > /proc/sys/vm/drop_caches; python3 test.py
files= 363291 time=  80.05171489715576

the test script (test.py):

import os
import time

import xattr
from attic.platform import acl_get

path = '/something/'


def dostat(file):
    # Per-file work being measured: lstat plus the xattr and ACL lookups
    # (the second run above).
    item = {}
    st = os.lstat(file)
    xattrs = xattr.get_all(file, follow_symlinks=False)
    acl = acl_get(file, item, 0)


start = time.time()
c = 0
for dirname, dirnames, filenames in os.walk(path):
    for filename in filenames:
        dostat(os.path.join(dirname, filename))
        c += 1

elapsed = time.time() - start
print("files=", c, "time= ", elapsed)

I in no way consider this a good/worthy benchmark ;-)
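For repeat measurements, a small harness along these lines might make the two variants easier to compare. It is only a sketch: `time_walk` and `overhead` are hypothetical helper names, and the per-file callable is left to the caller (for example `os.lstat`, or a function that also reads xattrs and ACLs). The last lines apply the arithmetic to the two timings reported above.

```python
import os
import time


def time_walk(root, per_file):
    """Call per_file(path) for every file under root; return (count, seconds)."""
    start = time.perf_counter()
    count = 0
    for dirname, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            per_file(os.path.join(dirname, filename))
            count += 1
    return count, time.perf_counter() - start


def overhead(base_seconds, extra_seconds, files):
    """Return (relative overhead, added seconds per file)."""
    delta = extra_seconds - base_seconds
    return delta / base_seconds, delta / files


# Figures reported above: lstat only vs. lstat + xattr + ACL lookups.
relative, per_file_seconds = overhead(73.48496890068054, 80.05171489715576, 363291)
print(f"{relative:.1%} slower, {per_file_seconds * 1e6:.1f} microseconds per file")
```

On the numbers above this works out to roughly 9% more wall-clock time, or about 18 microseconds per file, for a single cold-cache run on one machine.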

ghost commented 10 years ago

And what were your thoughts regarding the inode option?

jborg commented 10 years ago

That option might be useful in some situations I guess, could you give some more details on why you would need this?

ghost commented 10 years ago

I don't actually need it currently, but I wanted to mention that it is offered by other solutions, and if someone wanted to back up a network filesystem that has non-static inodes, it would be needed for the cache to be of any use.

It would theoretically also allow the cache to be rebuilt from a remote repository if a user did not want to use inode comparison (as you said, the fact that inodes are not stored in the repository is why rebuilding is not possible to implement at the moment).

Regarding xattr/get_acl, I just think it's always a good idea to reduce the per-file overhead where possible (if the user doesn't need them, why make an additional 3 million calls etc).

ThomasWaldmann commented 9 years ago

In PR #235 there is a dummy xattr and acl implementation now (used for unsupported platforms right now).