wimleers / fileconveyor

File Conveyor is a daemon written in Python to detect, process and sync files. In particular, it's designed to sync files to CDNs. Amazon S3 and Rackspace Cloud Files, as well as any Origin Pull or (S)FTP Push CDN, are supported. Originally written for my bachelor thesis at Hasselt University in Belgium.
https://wimleers.com/fileconveyor
The Unlicense

/tmp/daemon consumes 100% of available disk space when files are updated frequently #30

Closed: tedfordgif closed this issue 13 years ago

tedfordgif commented 13 years ago

I was surprised when /tmp/daemon filled up my disk partition (causing MySQL to stop responding). This happened because my client was uploading new videos to a folder being synced by fileconveyor. I had configured fileconveyor with unique_filename.Mtime, so every time the FTP daemon wrote new data to a file, fileconveyor made another copy of it in /tmp/daemon.

A quick fix is to add a note to the README: don't upload directly to synced folders. A better approach might be to add a delay between the change notification and kicking off the processor; a rough sketch of such a debounce follows.
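For illustration, here is a minimal per-path debounce sketch in Python. The names `process_file` and `SETTLE_SECONDS` are placeholders, not fileconveyor's actual API; the idea is simply that the processor only fires once a file has been quiet for a fixed settle time:

```python
import threading

# Hypothetical sketch: debounce per-path change events so the processor
# only fires once a file has been quiet for SETTLE_SECONDS.

SETTLE_SECONDS = 5.0
_timers = {}          # path -> threading.Timer
_lock = threading.Lock()

def process_file(path):
    # Placeholder for the real work (copy to /tmp/daemon, queue for sync, ...).
    print("processing", path)

def on_change(path):
    """Call this from the file system monitor for every write event."""
    with _lock:
        timer = _timers.pop(path, None)
        if timer is not None:
            timer.cancel()          # file changed again: restart the clock
        timer = threading.Timer(SETTLE_SECONDS, _fire, args=(path,))
        _timers[path] = timer
        timer.start()

def _fire(path):
    with _lock:
        _timers.pop(path, None)
    process_file(path)              # file was quiet for SETTLE_SECONDS
```

Each new write event cancels and restarts that path's timer, so a file that is still being uploaded never reaches the processor until writes stop.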

wimleers commented 13 years ago

Do you mean that the tmp directory does not get cleared, i.e. that temporary files remain there permanently? That should not happen.

tedfordgif commented 13 years ago

No, the problem is this:

  1. fileconveyor registers with inotify.
  2. The user initiates an upload (of a large file) via FTP.
  3. The FTP daemon writes the first block(s) of /some/user/file/to/be/synced.
  4. fileconveyor gets notified, copies the file to /tmp/daemon/, and names it using its modification time (as specified by unique_filename.Mtime).
  5. The FTP daemon receives more data and writes more blocks to /some/user/file/to/be/synced.
  6. Go to step 4 unless the file has finished uploading.

End result: fileconveyor has queued many partial copies of the file for sync.
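One way to collapse steps 3 to 6 into a single event is to react to IN_CLOSE_WRITE rather than plain modification events. A minimal sketch, assuming a pyinotify-based watcher (fileconveyor monitors with inotify on Linux, though not necessarily via this exact code); the watched path is a placeholder:

```python
import pyinotify

# Sketch only: IN_CLOSE_WRITE fires once, when the writer closes the file,
# so the FTP daemon's many write() calls produce a single event for the
# finished upload instead of one event per block.

class Handler(pyinotify.ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        print("upload finished:", event.pathname)  # safe point to copy/sync

wm = pyinotify.WatchManager()
wm.add_watch("/some/user/file/", pyinotify.IN_CLOSE_WRITE, rec=True)
notifier = pyinotify.Notifier(wm, Handler())
notifier.loop()
```

The trade-off is that a writer which never closes the file, or appends across multiple sessions, would still need a settle-time check like the debounce sketched above.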

wimleers commented 13 years ago

This is clearly a critical problem. I was aware this situation could arise, but this case shows it's not just a theoretical problem but a very major one.

Fortunately, this will be addressed by #68! :)