wimleers / fileconveyor

File Conveyor is a daemon written in Python to detect, process and sync files. In particular, it's designed to sync files to CDNs. Amazon S3 and Rackspace Cloud Files, as well as any Origin Pull or (S)FTP Push CDN, are supported. Originally written for my bachelor thesis at Hasselt University in Belgium.
https://wimleers.com/fileconveyor
The Unlicense
341 stars, 95 forks

WordPress caching plugin crashes and halts fileconveyor #143

Open halburgiss opened 11 years ago

halburgiss commented 11 years ago

The common WordPress plugin WP Super Cache dynamically creates static page caches. By default these live in the WordPress /wp-content folder, in a folder structure generated from the URLs of the site being cached. Sporadically, the creation of files within this folder structure crashes fileconveyor. Once that happens, syncing halts until fileconveyor is restarted. The parent directory of the cache structure is also excluded via ignoredDirs in config.xml, so either this exclusion isn't being applied to subdirectories, or the dynamic creation of subdirectories is the problem.

2013-04-19 09:44:05,503 - Arbitrator                - WARNING  - Fully up and running now.
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/pyinotify.py", line 1511, in run
    self.loop()
  File "/usr/local/lib/python2.7/dist-packages/pyinotify.py", line 1497, in loop
    self.process_events()
  File "/usr/local/lib/python2.7/dist-packages/pyinotify.py", line 1293, in process_events
    watch_.proc_fun(revent)  # user processings
  File "/usr/local/lib/python2.7/dist-packages/pyinotify.py", line 937, in __call__
    return _ProcessEvent.__call__(self, event)
  File "/usr/local/lib/python2.7/dist-packages/pyinotify.py", line 662, in __call__
    return meth(event)
  File "/usr/local/bin/fileconveyor/fsmonitor_inotify.py", line 266, in process_IN_MODIFY
    self.__update_pathscanner_db(event.pathname, FSMonitor.MODIFIED)
  File "/usr/local/bin/fileconveyor/fsmonitor_inotify.py", line 211, in __update_pathscanner_db
    st = os.stat(pathname)
OSError: [Errno 2] No such file or directory: '/home/clients/example.com/htdocs/wp-content/cache/supercache/www.example.com/blog/2012/02/08/mail-lists-to-buy-or-not/120869193251714a2acca075.98801290.tmp
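
The traceback suggests a race: by the time `__update_pathscanner_db()` calls `os.stat()`, the `.tmp` file that triggered the `IN_MODIFY` event is already gone (presumably WP Super Cache writes to a temporary file and then renames or deletes it). Below is a hypothetical sketch of a stat helper that tolerates this, not File Conveyor's actual code; the name `stat_if_exists` is made up for illustration.

```python
import errno
import os


def stat_if_exists(pathname):
    """Return os.stat() for pathname, or None if it no longer exists.

    inotify events are delivered asynchronously, so a short-lived file
    (such as a *.tmp file renamed or deleted right after being written)
    may be gone by the time the event handler calls os.stat().
    """
    try:
        return os.stat(pathname)
    except OSError as e:
        if e.errno == errno.ENOENT:
            return None  # vanished between the event and the stat() call
        raise  # any other error (e.g. permissions) is still unexpected
```

A handler could then skip the database update whenever `stat_if_exists()` returns `None`, instead of letting the exception kill the monitoring thread.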

wimleers commented 11 years ago

File Conveyor should automatically restart after crashes. What version of File Conveyor are you using?
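
For reference, restart-after-crash behaviour can be sketched as a supervisor loop around a daemon's main function. This is a generic illustration with assumed names (`run_with_restarts`, `target`, `max_restarts`, `interval`), not how File Conveyor implements its own restart.

```python
import logging
import time


def run_with_restarts(target, max_restarts=3, interval=1.0):
    """Call target(); if it raises, log the error and call it again.

    Gives up and re-raises after max_restarts restarts.
    """
    restarts = 0
    while True:
        try:
            return target()
        except Exception:
            logging.exception("Unhandled exception, restarting")
            if restarts >= max_restarts:
                raise
            restarts += 1
            time.sleep(interval)  # back off before restarting
```

Note that in Python an exception raised inside a separate thread, like the pyinotify `Thread-1` in the traceback above, does not propagate to the code that started the thread, so a wrapper like this only helps if it runs inside that thread.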

halburgiss commented 11 years ago

The changelog lists the most recent version as 0.4-dev (I got it from GitHub on April 12).

The vim issue seems to be a similar situation.

I just tried a quick test:

WP Super Cache is less predictable at triggering the error; it does not always happen. The vim error, though, is 100% consistent (for me).

Thanks.

wimleers commented 11 years ago

Thanks. Definitely not an old version — that can't be the cause then!


On deleting original cache files: that's not a problem. As of https://github.com/wimleers/fileconveyor/commit/081fb8c9db92b52501fbeeb0fde109d68511bb0f (see the changes to fsmonitor.py), File Conveyor implements event merging: if a file a.txt is deleted and then recreated before File Conveyor has processed the deletion, it is marked as modified instead.
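
The merging rule can be sketched as follows. Only the deleted-then-recreated case comes from the comment; the other two rules, and the names `Event` and `merge_event`, are assumptions for illustration and may differ from what fsmonitor.py actually does.

```python
from enum import Enum


class Event(Enum):
    CREATED = "created"
    MODIFIED = "modified"
    DELETED = "deleted"


def merge_event(pending, path, new):
    """Fold a newly detected event for `path` into the pending events.

    `pending` maps each path to the single event still waiting to be
    processed for it.
    """
    old = pending.get(path)
    if old is Event.DELETED and new is Event.CREATED:
        # Deleted, then recreated before processing: sync as a modification.
        pending[path] = Event.MODIFIED
    elif old is Event.CREATED and new is Event.MODIFIED:
        # Assumption: a file that has not been synced yet stays "created".
        pass
    else:
        # Assumption: in all other cases the most recent event wins.
        pending[path] = new
```

Keeping one pending event per path means a burst of delete/recreate cycles, as a cache plugin produces, collapses into a single sync.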

On "no activity related to these files": I doubt that. On what grounds do you claim "no activity"? I bet it was still processing deletions of files that had not yet been recreated. File Conveyor simply obeys the order of events, so it was deleting files from the CDN that no longer existed on the origin; it would have gotten to the newly added or modified files afterwards. Did you enable debug-level logging? That shows every event, including detections that are yet to be processed.
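
To illustrate the point about ordering: if events are processed first-in, first-out, a newly created file waits behind every deletion detected before it, and at DEBUG level each detection is logged before it is processed. This is a toy model built on Python's standard `logging` and `collections.deque`; the function name, logger name and log messages are hypothetical, not File Conveyor's.

```python
import logging
from collections import deque

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s - %(message)s")
log = logging.getLogger("ordering-sketch")


def process_in_order(events):
    """Queue (path, kind) events as they are detected, then process them oldest first."""
    queue = deque()
    for path, kind in events:
        # Only visible when the log level is DEBUG.
        log.debug("Detected %s: %s", kind, path)
        queue.append((path, kind))

    processed = []
    while queue:
        path, kind = queue.popleft()  # FIFO: earliest event first
        log.info("Processed %s: %s", kind, path)
        processed.append((path, kind))
    return processed
```

For example, `process_in_order([("old-1.html", "deleted"), ("old-2.html", "deleted"), ("new.html", "created")])` handles `new.html` only after both deletions.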

halburgiss commented 11 years ago

Re "no activity": I meant there was no visible console output from fileconveyor related to these files. When I later killed and restarted it, I got the 'WARNING - Synced' (IIRC) message for both of those files. I don't think the deletion process was still ongoing when I killed and restarted it (but I can't say for sure at this point). I do remember that as soon as fileconveyor was restarted, it immediately detected and synced those two files. Would it help to try this again with a different debugging level?

Related to this, all of this is happening in directories I'd prefer not to sync in the first place. They live in /wp-content/cache/supercache/example.com/*, and a significant directory structure will be created below that path. I have ignoredDirs="wp-admin:cache:etc:etc". Does each entry need to be the full relative path? If I could get this directory path excluded, that would totally solve this use case for me. Thanks.
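
One way an `ignoredDirs` list could be applied is per directory name, so that a single entry like `cache` excludes everything below any directory with that exact name, at any depth. The sketch below assumes that semantics (only the colon-separated format is taken from the `ignoredDirs="wp-admin:cache:etc:etc"` value in this comment); `is_in_ignored_dir` is a hypothetical helper, not File Conveyor's filter code, so whether the real filter matches names or full paths should be checked in its source.

```python
from pathlib import PurePosixPath


def is_in_ignored_dir(path, ignored_dirs):
    """Return True if any directory component of `path` is listed in
    `ignored_dirs`, a colon-separated string such as "wp-admin:cache".

    Matching is per component and exact: "cache" matches .../cache/...
    at any depth, but not a directory named "supercache".
    """
    ignored = {name for name in ignored_dirs.split(":") if name}
    # .parent drops the file name so only directory names are compared.
    return any(part in ignored for part in PurePosixPath(path).parent.parts)
```

Under this assumption, the `cache` entry alone would already exclude /wp-content/cache/supercache/www.example.com/... and everything created below it.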