wimleers / fileconveyor

File Conveyor is a daemon written in Python to detect, process and sync files. In particular, it's designed to sync files to CDNs. Amazon S3 and Rackspace Cloud Files, as well as any Origin Pull or (S)FTP Push CDN, are supported. Originally written for my bachelor thesis at Hasselt University in Belgium.
https://wimleers.com/fileconveyor
The Unlicense

Don't wait for generate_missed_events() to finish before starting FSMonitor #69

Closed: wimleers closed this issue 13 years ago

wimleers commented 13 years ago

Title says it all. This will prevent a long initial wait until the system is running.

What needs to change to support this? Simply starting FSMonitor first (either FSMonitorInotify or FSMonitorFSEvents) and only then calling FSMonitor.generate_missed_events() is insufficient. Consider a file that was created while File Conveyor was not running, but modified after it started running, so that FSMonitorInotify or FSMonitorFSEvents detects the change. File Conveyor would first process that file through inotify/FSEvents (as if it were a new file, because it's not yet in the DB of synced files), and would then process and overwrite it again because of the new event generated by FSMonitor.generate_missed_events().
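One way to start FSMonitor without waiting for the scan is to run the missed-events scan in a background thread and drop missed events for paths that the live monitor has already reported since startup. This is a minimal sketch of that idea, not File Conveyor's implementation: `MissedEventFilter` and `scan_in_background` are hypothetical names, and the output of generate_missed_events() is modeled as a plain iterable of paths.

```python
import threading


class MissedEventFilter:
    """Hypothetical helper: remembers which paths the live monitor
    (inotify/FSEvents) has reported, so the startup scan can skip them."""

    def __init__(self):
        self._lock = threading.Lock()
        self._seen_live = set()

    def accept_live(self, path):
        # Called for every live event; live events are always processed.
        with self._lock:
            self._seen_live.add(path)
        return True

    def accept_missed(self, path):
        # Called by the startup scan; False means the live monitor already
        # reported this path, so syncing it again would be redundant.
        with self._lock:
            return path not in self._seen_live


def scan_in_background(missed_paths, event_filter, process):
    """Hypothetical: run the missed-events scan in a daemon thread so the
    live monitor doesn't have to wait for it. Returns the thread."""

    def run():
        for path in missed_paths:
            if event_filter.accept_missed(path):
                process("CREATED", path)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    processed = []
    event_filter = MissedEventFilter()
    # The live monitor is already running and reports a change to a file
    # that was created while File Conveyor was down.
    if event_filter.accept_live("photos/a.jpg"):
        processed.append(("MODIFIED", "photos/a.jpg"))
    t = scan_in_background(["photos/a.jpg", "photos/b.jpg"], event_filter,
                           lambda kind, p: processed.append((kind, p)))
    t.join()
    print(processed)
    # [('MODIFIED', 'photos/a.jpg'), ('CREATED', 'photos/b.jpg')]
```

This only narrows the window: a live event that arrives after `accept_missed()` has returned `True` can still cause the same path to be handled twice, so the DB writes must still tolerate a path being reported more than once.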

When implementing this issue, also take #68 into account!

wimleers commented 13 years ago

Related issue: #12.

wimleers commented 13 years ago

Detected & fixed #72 & #73 while working on this.

wimleers commented 13 years ago

Suppose this has been implemented.

Then, imagine the following situation:

This would then need to be mapped to a FSMonitor.CREATED event, and things could then proceed as in the current implementation. However, it is then possible that FSMonitor.generate_missed_events() eventually generates the FSMonitor.CREATED event:

When the event propagates to PathScanner's fsmonitor.db, it should not result in a call to PathScanner.update_files(). In its current form, PathScanner.update_files() does an UPDATE … query, and if no row exists in the DB yet for this file, that results in a SQL error. So we'd either need to call PathScanner.add_files() instead, or change PathScanner.update_files().

Hence, special care is necessary.
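The special care could look roughly like this: a live MODIFIED event for a path that has no row yet is mapped to CREATED, and the DB write updates an existing row or inserts a missing one, so a late CREATED event from FSMonitor.generate_missed_events() does no harm. This is a minimal sketch assuming a hypothetical `pathscanner (path, mtime)` table; `map_live_event()` and `record_file()` are illustrative names, not PathScanner's actual API. Note that in plain SQLite an UPDATE that matches no row changes nothing rather than failing; either way, the new file would never get recorded.

```python
import sqlite3

CREATED, MODIFIED, DELETED = "CREATED", "MODIFIED", "DELETED"


def map_live_event(conn, path, event):
    """Hypothetical: a MODIFIED event for a path with no row yet (i.e. the
    file was created while File Conveyor was down) is treated as CREATED."""
    if event == MODIFIED:
        row = conn.execute(
            "SELECT 1 FROM pathscanner WHERE path = ?", (path,)
        ).fetchone()
        if row is None:
            return CREATED
    return event


def record_file(conn, path, mtime):
    """Hypothetical: update the row if it exists, insert it otherwise, so
    the same path may safely be reported by both the live monitor and the
    startup scan."""
    cur = conn.execute(
        "UPDATE pathscanner SET mtime = ? WHERE path = ?", (mtime, path)
    )
    if cur.rowcount == 0:  # no existing row: this file is new to the DB
        conn.execute(
            "INSERT INTO pathscanner (path, mtime) VALUES (?, ?)", (path, mtime)
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pathscanner (path TEXT PRIMARY KEY, mtime INTEGER)")
    print(map_live_event(conn, "photos/a.jpg", MODIFIED))  # CREATED
    record_file(conn, "photos/a.jpg", 100)
    record_file(conn, "photos/a.jpg", 100)  # late CREATED from the scan
    print(conn.execute("SELECT COUNT(*) FROM pathscanner").fetchone()[0])  # 1
```

Checking `rowcount` after the UPDATE avoids relying on `INSERT ... ON CONFLICT DO UPDATE`, which requires SQLite 3.24 or later.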

wimleers commented 13 years ago

Oops, I accidentally closed #69 through the commit that fixes this issue: https://github.com/wimleers/fileconveyor/issues/68#commits-ref-771adab.

Commit 771adab fixes this issue!