voron opened this issue 13 years ago (status: Open)
Hi Alex,
Indeed, get_wd() is, as stated in the documentation, notoriously inefficient; keeping track of WDs yourself is encouraged instead whenever possible. Although your patch makes perfect sense, my main objection is that it would increase the (already high) memory footprint. But I've not completely made up my mind yet; maybe if we indexed each wd by sha1(path) instead of by path, it could further limit the additional amount of memory needed.
+1 for indexing by sha1(path). Speaking from experience, this "normalizes" the footprint of the wd index, since the key length is now fixed (assuming that the average cost of computing sha1(path) is negligible).
Here is an updated patch with sha1(path): http://vorona.com.ua/pyinotify/pyinotify.py.2.patch It also adds indexing for directory deletion, so that is now fast too. The same test gave 12 seconds instead of 57 with the original patch:
user 0m11.753s sys 0m0.520s
That is ~24x faster than unpatched pyinotify on this synthetic test.
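For readers following along, the sha1(path)-keyed index being discussed could be sketched like this. The class and method names are hypothetical (this is not the actual patch), and hashlib is used purely for illustration:

```python
import hashlib

class WatchIndex:
    """Maps a fixed-length SHA-1 key of a watched path to its watch descriptor."""

    def __init__(self):
        self._wd_by_key = {}

    @staticmethod
    def _key(path):
        # 20-byte binary digest: constant key size regardless of path length
        return hashlib.sha1(path.encode('utf-8')).digest()

    def add(self, path, wd):
        self._wd_by_key[self._key(path)] = wd

    def get_wd(self, path):
        # O(1) average-case dict lookup instead of scanning every watch
        return self._wd_by_key.get(self._key(path))

    def remove(self, path):
        self._wd_by_key.pop(self._key(path), None)
```

The point of hashing is that very long paths no longer cost proportionally more memory as dict keys; every key is exactly 20 bytes.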
@voron I've updated your patch to use hashlib instead of the deprecated sha module: https://gist.github.com/1048041
@wayoutmind pyinotify supports Python 2.4+, while hashlib only appeared in 2.5. Of course hashlib can be added via OS packages or easy_install, but ... BTW, why did you use hexdigest() instead of digest()? In my tests, pyinotify with hexdigest() is a little slower and consumes more RAM compared to digest().
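For context on the RAM difference: digest() returns the raw 20-byte SHA-1 value, while hexdigest() returns a 40-character hex string, so hex keys roughly double the per-key size. A quick check with hashlib (shown only for illustration; the old 2.4-era sha module behaves the same way):

```python
import hashlib

h = hashlib.sha1(b'/tmp/tin/some/watched/dir')
raw = h.digest()       # 20 raw bytes
hexed = h.hexdigest()  # 40 ASCII characters encoding the same value

print(len(raw))    # 20
print(len(hexed))  # 40
```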
@voron, Excellent point, if we wish to maintain 2.4+ compatibility then sha would be the de facto choice. For some reason at the time I thought sha.digest() was equivalent to hashlib.hexdigest(), which I'm used to using for non-binary applications. Anyway, don't mind my carelessness, carry on!
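A common pattern for keeping both interpreters happy is a guarded import; this is only a sketch of the compatibility idea, not code from the actual patch or gist:

```python
try:
    # Python 2.5+ (and all of Python 3)
    from hashlib import sha1
except ImportError:
    # Python 2.4 fallback; the sha module was removed in Python 3
    from sha import new as sha1

# both variants expose the same object interface:
key = sha1(b'/tmp/tin/some/dir').digest()  # 20-byte binary key
```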
So, what happened? Was a proper solution found? Seems this is very relevant to my issue of setting up watches on lots of dirs: https://github.com/seb-m/pyinotify/issues/36
I'm quite interested to know if this was ever fixed.
Hello,
I installed pyinotify 0.9.2 via easy_install, took tutorial_notifier.py, enabled auto_add and started to test. Kernel: uname -r gives 2.6.38-8-generic.
# make a tmpfs so the test doesn't touch disks
mkdir /tmp/tin; mount -t tmpfs -o size=20000000,mode=1777,nr_inodes=200000,noatime,nodiratime none /tmp/tin
# enlarge the inotify event queue
echo 1000000 > /proc/sys/fs/inotify/max_queued_events
# start the notifier
time python ./tutorial_notifier.py > tmp.log
# start tmpflood (code: http://vorona.com.ua/pyinotify/tmpflood2.py);
# it creates 25000 dirs under /tmp/tin and then removes them
./tmpflood2.py
Creating dirs ... Dirs created. Wait till notifier stops eating CPU and press Enter to start removing
# waited ~5 minutes till the notifier processed all events, noted CPU time 4:45, pressed Enter
Removing dirs ...
# waited a couple of seconds till the notifier processed all events,
# then pressed Ctrl+C in the notifier console to get the CPU time:
user 4m50.130s sys 0m1.948s
So, I got 4:45 for processing creation and 5 secs for processing deletion.
After investigation, I found the main slowdown in get_wd(). So I simply added one more dictionary, "path -> watch descriptor", to speed up this lookup. Patch: http://vorona.com.ua/pyinotify/pyinotify.py.patch After that I ran the test again and got the following results:
user 0m57.088s sys 0m0.660s
Now it is 5 secs for processing creation and 52 secs for processing deletion. Total speedup with the patch was ~5x and can vary depending on dir names, dir count etc.; in my tests I saw a 10x speedup and more. Directory creation speedup is ~57x, directory deletion slowdown is ~10x. But there is no deletion without creation, so I do not see any drawbacks.
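The idea behind the patch can be sketched as follows; the class and attribute names here are illustrative stand-ins, not pyinotify's actual internals. A reverse dictionary is maintained alongside the wd-to-watch map, so get_wd() becomes a dict lookup instead of a linear scan:

```python
class WatchManagerSketch:
    """Toy model of watch bookkeeping (names are hypothetical)."""

    def __init__(self):
        self._wmd = {}         # wd -> path (stand-in for wd -> Watch object)
        self._wd_by_path = {}  # the extra index from the patch: path -> wd
        self._next_wd = 1

    def add_watch(self, path):
        wd = self._next_wd
        self._next_wd += 1
        self._wmd[wd] = path
        self._wd_by_path[path] = wd  # keep the reverse index in sync
        return wd

    def get_wd_slow(self, path):
        # original behavior: O(n) scan over all watches
        for wd, p in self._wmd.items():
            if p == path:
                return wd
        return None

    def get_wd(self, path):
        # patched behavior: O(1) average-case lookup
        return self._wd_by_path.get(path)

    def del_watch(self, wd):
        # both maps must be updated on removal, or the index goes stale
        path = self._wmd.pop(wd, None)
        if path is not None:
            self._wd_by_path.pop(path, None)
```

The trade-off discussed in the thread is exactly this second dictionary: it doubles the bookkeeping per watch (hence the memory-footprint objection, and the sha1(path) key idea) in exchange for constant-time lookups.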
Regards, Alex