seb-m / pyinotify

Monitoring filesystem events with inotify on Linux.
http://github.com/seb-m/pyinotify/wiki
MIT License

Degraded performance on large watch set #15

Open voron opened 13 years ago

voron commented 13 years ago

Hello,

I installed pyinotify 0.9.2 via easy_install, took tutorial_notifier.py, enabled auto_add, and started testing. Kernel: uname -r → 2.6.38-8-generic

Making a tmpfs so the test does not touch the disks:

mkdir /tmp/tin;mount -t tmpfs -o size=20000000,mode=1777,nr_inodes=200000,noatime,nodiratime none /tmp/tin

Making the inotify event queue large:

echo 1000000 > /proc/sys/fs/inotify/max_queued_events

Starting the notifier:

time python ./tutorial_notifier.py > tmp.log

Starting tmpflood; code: http://vorona.com.ua/pyinotify/tmpflood2.py

It creates 25000 dirs under /tmp/tin and then removes them.

./tmpflood2.py Creating dirs ... Dirs created. Wait till notifier stops eating CPU and press Enter to removing

Waiting ~5 minutes until the notifier has processed all events, noting the CPU time of 4:45, and pressing Enter.

Removing dirs ...

Waiting a couple of seconds until the notifier has processed all events.

Then pressing Ctrl+C in the notifier console gives the CPU time:

user 4m50.130s sys 0m1.948s

So, I got 4:45 for processing creation and 5 seconds for processing deletion.

After investigating, I found the main slowdown in get_wd(). So I simply added one more dictionary, "path → watch descriptor", to speed up this lookup; patch: http://vorona.com.ua/pyinotify/pyinotify.py.patch After that I ran the test again and got the following results:

user 0m57.088s sys 0m0.660s

I got 5 seconds for processing creation and 52 seconds for processing deletion. Total speedup with the patch was ~5x and can vary depending on dir names, dir count, etc. In my tests I saw speedups of 10x and more. Directory creation sped up ~57x; directory deletion slowed down ~10x. But there is no deletion without creation, so I do not see any drawbacks.

Regards, Alex
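Editor's note: the idea behind the patch can be sketched as follows. This is a hypothetical, simplified model of pyinotify's watch bookkeeping (class and attribute names are invented here, not pyinotify's actual internals): the original get_wd() scans every watch to map a path back to its watch descriptor, while the patch adds a reverse dictionary so the lookup is a single dict access.

```python
class WatchManagerSketch:
    """Hypothetical sketch of the path -> wd reverse-index idea."""

    def __init__(self):
        self._wd_to_path = {}   # wd -> watched path (the existing index)
        self._path_to_wd = {}   # path -> wd (the reverse index the patch adds)
        self._next_wd = 1

    def add_watch(self, path):
        wd = self._next_wd
        self._next_wd += 1
        self._wd_to_path[wd] = path
        self._path_to_wd[path] = wd
        return wd

    def get_wd_slow(self, path):
        # Original behaviour: linear scan over all watches, O(n) per lookup.
        for wd, p in self._wd_to_path.items():
            if p == path:
                return wd
        return None

    def get_wd_fast(self, path):
        # Patched behaviour: one dict lookup, O(1) per lookup.
        return self._path_to_wd.get(path)

    def rm_watch(self, wd):
        # Both indexes must be kept in sync on removal; this is the
        # extra work that makes deletion somewhat slower with the patch.
        path = self._wd_to_path.pop(wd, None)
        if path is not None:
            self._path_to_wd.pop(path, None)
```

With auto_add and 25000 directories, get_wd() is called once per event, so turning each O(n) scan into an O(1) lookup is where the ~5x overall speedup comes from.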

seb-m commented 13 years ago

Hi Alex,

Indeed, as stated in the documentation, get_wd() is notoriously inefficient; keeping track of WDs yourself is encouraged instead whenever possible. Although your patch makes perfect sense, my main objection is that it would increase the (already high) memory footprint. But I've not completely made up my mind yet; maybe if we indexed each wd by sha1(path) instead of path it could further limit the additional amount of memory needed.
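Editor's note: the sha1(path) keying can be sketched like this (using hashlib for illustration; as discussed below, the original patch used the older sha module for Python 2.4 compatibility). The point is that the dict key becomes a fixed 20-byte digest regardless of how long the watched path is.

```python
import hashlib

def path_key(path):
    # Key the reverse index by the 20-byte SHA-1 digest of the path
    # instead of the path itself: key size is constant no matter how
    # deep the directory tree gets.
    return hashlib.sha1(path.encode('utf-8')).digest()

# Hypothetical usage: the reverse index maps digest -> wd.
path_to_wd = {}
path_to_wd[path_key('/tmp/tin/some/deeply/nested/dir')] = 42
```

The trade-off is that with digest keys the original path can no longer be recovered from the index alone, and hash collisions, while astronomically unlikely, become theoretically possible.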

fredrick commented 13 years ago

+1 for indexing by sha1(path). Speaking from experience, this "normalizes" the memory footprint for indexing wd since you have now fixed the key length (assuming that the average cost of computing sha1(path) is negligible).

voron commented 13 years ago

Here is an updated patch with sha1(path), http://vorona.com.ua/pyinotify/pyinotify.py.2.patch, plus additional indexing for directory deletion, so that is now fast too. The same test gave 12 seconds instead of 57 with the original patch:

user 0m11.753s sys 0m0.520s

~24x faster on this synthetic test.

fredrick commented 13 years ago

@voron I've updated your patch to use hashlib instead of the deprecated sha module: https://gist.github.com/1048041

voron commented 13 years ago

@wayoutmind pyinotify supports Python 2.4+, while hashlib only arrived in 2.5. Of course hashlib can be added via OS packages or easy_install, but ... BTW, why did you use hexdigest() instead of digest()? In my tests, pyinotify with hexdigest() is a little slower and consumes more RAM compared to digest().
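Editor's note: the digest() vs hexdigest() difference is easy to demonstrate. digest() returns the raw 20-byte SHA-1 value, while hexdigest() returns its 40-character hex encoding, so hex keys take twice the bytes and cost an extra encoding step per lookup.

```python
import hashlib

h = hashlib.sha1(b'/tmp/tin/dir_00001')
raw = h.digest()       # 20 raw bytes
hexed = h.hexdigest()  # 40 hex characters encoding the same 20 bytes
```

For a pure in-memory index that nobody ever prints, the raw digest is the natural choice; hexdigest() only pays off when keys must be human-readable or text-safe.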

fredrick commented 13 years ago

@voron, excellent point: if we wish to maintain 2.4+ compatibility then sha would be the de facto choice. For some reason at the time I thought sha.digest() was equivalent to hashlib.hexdigest(), which I'm used to using for non-binary applications. Anyway, don't mind my carelessness, carry on!

Shne commented 12 years ago

So, what happened? Was a proper solution found? This seems very relevant to my issue of setting up watches on lots of dirs: https://github.com/seb-m/pyinotify/issues/36

edv4rd0 commented 11 years ago

I'm quite interested to know if this was ever fixed.