Closed azraelxyz closed 6 years ago
Hi,
I tested the same a few years back, and for local filesystems this made things actually slower - as there's no bulk operation syscall, taking and releasing the GIL actually slowed down (non-trivially) the speed of large filesystem scans/changes.
Have you actually benchmarked what effect this has?
Hi iustin,
I did some micro benchmark in my environment. It shows non-trivial performance impact before and after this commit as you mentioned above. Would you please kindly check it in your environment?
Here are my configuration and test results in seconds of setxattr, listxattr, getxattr, removexattr for 100k files.
MacBook Pro (13-inch, Mid 2012) Processor: 2.5GHz Intel Core i5 Memory: 16 GB 1600 MHz DDR3 Disk: Intel Solid State Drive 545s Series 256GB OS: OS X El Capitan 10.11.3
Before commit 3179f0f setxattr: 3.78645086288 listxattr: 1.78046584129 getxattr: 1.13265991211 removexattr: 4.69510889053
After commit 3179f0f setxattr: 3.68556499481 listxattr: 1.76518297195 getxattr: 1.35534405708 removexattr: 4.56970596313
My benchmark script is available here.
# mkdir /tmp/pyxattr; python ./xattr_bench.py /tmp/pyxattr/ 100000
import xattr
import sys
import time
import os
path = sys.argv[1]
num = int(sys.argv[2])
key = 'user.testxattr'
for i in range(num):
filepath = os.path.join(path, str(i))
with open(filepath, 'w') as fp:
fp.write('x')
start = time.time()
for i in range(num):
filepath = os.path.join(path, str(i))
xattr.setxattr(filepath, key, 'test')
print 'setxattr:', time.time() - start
start = time.time()
for i in range(num):
filepath = os.path.join(path, str(i))
xattr.listxattr(filepath)
print 'listxattr:', time.time() - start
start = time.time()
for i in range(num):
filepath = os.path.join(path, str(i))
xattr.getxattr(filepath, key)
print 'getxattr:', time.time() - start
start = time.time()
for i in range(num):
filepath = os.path.join(path, str(i))
xattr.removexattr(filepath, key)
print 'removexattr:', time.time() - start
BTW, current pyxattr implementation is working perfectly on local file system such as ext4. However, on some file systems which are over network, it might be worthy to consider if we should release GIL when doing xattr operations.
FYI, this is what I get with a micro-benchmark in place (all numbers are microseconds), using Python 2.7 and multiple invocations to find best numbers - built-in variability is high:
Python 2.7:
Overall, the run-to-run variability (CPU scheduler, etc.) is higher or on the same order as before/after, so I think this can go in as is, irrespective of local-vs-remote filesystem. It could be possible that even on local filesystem with slow I/O, the gains would justify the trivial CPU increase.
Released in version 0.6.1.
Release GIL in _list_obj, _get_obj, _set_obj, and _remove_obj. These functions do I/O operations which take long time. We should release GIL to allow other threads work.