wimleers / fileconveyor

File Conveyor is a daemon written in Python to detect, process and sync files. In particular, it's designed to sync files to CDNs. Amazon S3 and Rackspace Cloud Files, as well as any Origin Pull or (S)FTP Push CDN, are supported. Originally written for my bachelor thesis at Hasselt University in Belgium.
https://wimleers.com/fileconveyor
The Unlicense
340 stars 95 forks source link

FileConveyor's inotify FSMonitor breaks when it finds temporary files in it's path. #94

Open j0rd opened 12 years ago

j0rd commented 12 years ago

I'm having issues with fileconveyor getting hung up on vim temporary files. I know I could exclude these from my paths, but ideally the daemon would not crash when a file goes away between the time it finds it and attempts to do something with it.

Better error handling needs to exist, so that the daemon never dies. It this involves wrapping the entire while(1) in a try catch and spitting out errors then restarting, then so be it.

It should also provide useful notification (perhaps via email) that it encountered a problem. Because currently when the server gets hung....essentially your CDN will start to fall behind and your site could end up being totally fubar'd.

j0rd commented 12 years ago

I just ran into this again. It happens when I'm editing a with vim under the file conveyor path

Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib/python2.6/threading.py", line 532, in bootstrap_inner self.run() File "/usr/lib/pymodules/python2.6/pyinotify.py", line 1356, in run self.loop() File "/usr/lib/pymodules/python2.6/pyinotify.py", line 1342, in loop self.process_events() File "/usr/lib/pymodules/python2.6/pyinotify.py", line 1134, in processevents watch.proc_fun(revent) # user processings File "/usr/lib/pymodules/python2.6/pyinotify.py", line 811, in __call return _ProcessEvent.call(self, event) File "/usr/lib/pymodules/python2.6/pyinotify.py", line 544, in call return meth(event) File "/home/www/domain.tld/fileconveyor/wimleers-fileconveyor-d1c55b8/code/fsmonitor_inotify.py", line 233, in process_IN_CREATE self.__update_pathscanner_db(event.pathname, FSMonitor.CREATED) File "/home/www/domain.tld/fileconveyor/wimleers-fileconveyor-d1c55b8/code/fsmonitor_inotify.py", line 205, in __update_pathscanner_db st = os.stat(pathname) OSError: [Errno 2] No such file or directory: '/home/www/domain.tld/htdocs/sites/all/themes/theme_zen/js/.functions.js.swx'

wimleers commented 12 years ago

It should most definitely not crash. It should simply detect the deletion and be done with it.

j0rd commented 12 years ago

To test, open fileconveyor under a path.

Inside that path, open up a file with vim. Save and close the file.

I stopped using fileconveyor because it was much too buggy when I was playing around with it for use on a live server. Perhaps this has been fixed by now, but in my past experience this process I've mentioned was causing a problem.

wimleers commented 12 years ago

I just tested it and I cannot get it to crash:

2012-01-20 07:36:42,865 - Arbitrator                - WARNING  - Synced: '/Users/wimleers/Desktop/watch/.bar.swp' (CREATED).
2012-01-20 07:36:43,271 - Arbitrator                - WARNING  - Synced: '/Users/wimleers/Desktop/watch/bar' (CREATED).
2012-01-20 07:36:48,290 - Arbitrator                - WARNING  - Synced: '/Users/wimleers/Desktop/watch/.bar.swp' (DELETED).
2012-01-20 07:36:49,307 - Arbitrator                - WARNING  - Synced: '/Users/wimleers/Desktop/watch/bar' (DELETED).
2012-01-20 07:37:03,937 - Arbitrator                - WARNING  - Synced: '/Users/wimleers/Desktop/watch/.baz.swp' (CREATED).
2012-01-20 07:37:07,363 - Arbitrator                - WARNING  - Synced: '/Users/wimleers/Desktop/watch/.baz.swp' (DELETED).
2012-01-20 07:37:07,369 - Arbitrator                - WARNING  - Synced: '/Users/wimleers/Desktop/watch/baz' (CREATED).

Either it has been fixed or I cannot reproduce it on OS X.

If you use pip, you can now easily install and uninstall File Conveyor and its dependencies, by doing this:

pip install -e git+https://github.com/wimleers/fileconveyor@master#egg=fileconveyor

Test with something like this: https://gist.github.com/1647418. Then you should be able to either reproduce the problem easily or (hopefully) not.

wimleers commented 11 years ago

Any news, @j0rd?

j0rd commented 11 years ago

No, I stopped using fileconveyor a long time ago, sorry.

wimleers commented 11 years ago

Np :)

ericssmith commented 11 years ago

Just discovered yesterday then rsyncing into a folder 'watched' by fileconveyor causes the problem mentioned above: temporary files cause FileConveyor to crash.

wimleers commented 11 years ago

@ericssmith Is this with 0.3-dev? Temporary files should work just fine. Also see what I said a few comments higher, "I just tested it […]".

chrisivens commented 11 years ago

Just to let you know, I got this issue with vim on an RHEL box too. Was some time ago (3 months ish) but still relevant I think.

ericssmith commented 11 years ago

We are going to repeat the failing procedure and post the results here today.

ericssmith commented 11 years ago

Unfortunately, we don't have time to recreate the error, but here is what the log looked like:

2012-10-30 15:25:01,953 - Arbitrator.FSMonitor - INFO - Detected 'CREATED' event for '/var/www/vhosts/mainsite/sites/default/files/.pumpkin_bread.jpg.M2B4iO' through inotify (for monitored path '/var/www/vhosts/mainsite/sites/default'). 2012-10-30 15:25:02,151 - Arbitrator - INFO - Discover queue -> pipeline queue: '/var/www/vhosts/mainsite/sites/default/files/.pumpkin_bread.jpg.M2B4iO'. 2012-10-30 15:25:02,161 - Arbitrator - INFO - Pipeline queue -> filter queue: '/var/www/vhosts/mainsite/sites/default/files/.pumpkin_bread.jpg.M2B4iO'. 2012-10-30 15:25:02,164 - Arbitrator - INFO - Filtering: dropped '/var/www/vhosts/mainsite/sites/default/files/.pumpkin_bread.jpg.M2B4iO' because it no longer exists.

This is where it died. The error itself is not in the log. Sorry that we won't be able to reproduce it. but will report it if we run into it again. However we are going to set the temp files for rsync to not be in the receiving directory, so that seems unlikely.

(and thanks for your work on this. It goes braindead occasionally, but otherwise works as advertised)

wimleers commented 11 years ago

Hmm … that's very odd. That means it actually died just after dropping a file (i.e. dropping it from the queue of files to deal with). Very odd.

Did you try setting File Conveyor's logging level to DEBUG? That should help.


I know that it's imperfect, and not as reliable as it should be, but with every edge case that I close, we're getting closer to stability :)

j0rd commented 11 years ago

To duplicate the bug. I'd suggest making a small script which creates and deletes random files then unleash it under a fileconveyor path.

If it dies, you've found my bug.

wimleers commented 11 years ago

I actually did just that but couldn't get it to crash at all.

fringedgentian commented 11 years ago

Hello there! I have reproduced this error, but I think it happens when there is a temporary file at one time, but then that temporary file is gone when it attempts to transfer the file.

2012-11-20 14:17:46,826 - Arbitrator - WARNING - Synced: '/var/www/vhosts/xxxxx/sites/default/files/IMG_9738-1.JPG' (DELETED). Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib64/python2.6/threading.py", line 532, in bootstrap_inner self.run() File "/usr/lib/python2.6/site-packages/pyinotify.py", line 1410, in run self.loop() File "/usr/lib/python2.6/site-packages/pyinotify.py", line 1396, in loop self.process_events() File "/usr/lib/python2.6/site-packages/pyinotify.py", line 1195, in processevents watch.proc_fun(revent) # user processings File "/usr/lib/python2.6/site-packages/pyinotify.py", line 840, in __call return _ProcessEvent.call(self, event) File "/usr/lib/python2.6/site-packages/pyinotify.py", line 571, in call return meth(event) File "/usr/local/bin/fileconveyor/fsmonitor_inotify.py", line 253, in process_IN_MODIFY self.__update_pathscanner_db(event.pathname, FSMonitor.MODIFIED) File "/usr/local/bin/fileconveyor/fsmonitor_inotify.py", line 205, in __update_pathscanner_db st = os.stat(pathname) OSError: [Errno 2] No such file or directory: '/var/www/vhosts/xxxxx/sites/default/files/.2012-11-12 03.32.06.jpg.PnN0nx'

This happens very intermittently when a file is RSYNC-ed into the path that fileconveyor is monitoring, when that file is gone when it gets to transfering it or something like that.

I have RSYNCed many files into the path and it then picks them up and transfers them fine, it seems to be only this rareish condition that shuts down fileconveyor.

wimleers commented 11 years ago

Aha! Now that is interesting :) Maybe it's happening on a system with a relatively workload, meaning that the inotify events are somewhat lagging behind, in which case this could indeed happen. FSMonitorInotifyProcessEvent.__update_pathscanner_db() should be checking whether the file still exists (os.path.exists(input_file)), much like Arbitrator.__process_filter_queue() does. Could you make that simple change and check whether that fixes the problem?

fringedgentian commented 11 years ago

Hello, it has been a while but I wanted to update you. It turned out that even if I checked if the file exists with os.path.exists(pathname) right before the line os.stat(pathname), it would still happen that a temporary file could be deleted after it had gone past the check but before it got to os.stat, causing an error.

So what I ended up doing was ignoring the error, here is the code if you wanted to improve upon it. This is in fsmonitor_inotify.py:

    (path, filename) = os.path.split(pathname)
    if event_type == FSMonitor.DELETED:
        # Build tuple for deletion of row in PathScanner's DB.
        t = (path, filename)
        self.fsmonitor_ref.pathscanner_files_deleted.append(t)
    else:
        try:
            # Build tuple for PathScanner's DB of the form (path, filename,
            # mtime), with mtime = -1 when it's a directory.
            st = os.stat(pathname)
            is_dir = stat.S_ISDIR(st.st_mode)
            if not is_dir:
                mtime = st[stat.ST_MTIME]
                t = (path, filename, mtime)
            else:
                t = (path, filename, -1)
            # Update PathScanner's DB.
            if event_type == FSMonitor.CREATED:
                self.fsmonitor_ref.pathscanner_files_created.append(t)
            else:
                self.fsmonitor_ref.pathscanner_files_modified.append(t)
        except OSError, e:
            print "Caught a OSError, probably a file doesn't exist"
            print pathname
jstarcher commented 11 years ago

Pull request submitted after testing and verifying this change has solved the issue.

https://github.com/wimleers/fileconveyor/pull/144