vsivsi / meteor-file-collection

Extends Meteor Collections to handle file data using MongoDB gridFS.
http://atmospherejs.com/vsivsi/file-collection

How to properly remove incomplete uploads #77

Closed JcBernack closed 8 years ago

JcBernack commented 8 years ago

If a large file upload is interrupted, the chunks that were already uploaded remain in the database. This is necessary, of course; otherwise resumable couldn't do what it's named after.

But what if a user actually does not want to resume the upload? Does the partial upload remain in the database forever? Are there any downsides to regularly removing all incomplete files with collection.remove({ "metadata._Resumable": { $exists: true } })? Or is there a better way to handle this case?

vsivsi commented 8 years ago

I would add a clause to that query so that it only removes chunks that are older than X hours/days (or whatever policy makes sense for your service). You wouldn't want to whack chunks for files that are currently still being uploaded! You'll probably also want to delete the empty "placeholder" files that are waiting for all of the missing chunks to be uploaded. They don't take much space, but they will also junk up your file collection over time. The worst case is removing chunks for a file that is still being uploaded: without some extra checking, the client will think the upload should be complete, but the server still won't be able to complete it. So removing the empty placeholder file first, and then deleting all of the corresponding chunk temp files, seems the safest; that way there's no chance that an active client won't know that it is out of sync with what is happening on the server.

I didn't implement any policies for this at all in the package because it is pretty easy to do yourself on the server-side and it seems like every app is going to have its own unique requirements that I can't predict.
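The age-based policy described above could be sketched roughly as follows. This is only an illustration, not something the package provides: `removeStaleUploads` and `maxAgeMs` are made-up names, and the sketch assumes every document in the file collection carries the standard gridFS `uploadDate` field and that `collection.remove(selector)` works server-side the way it is used elsewhere in this thread.

```javascript
// Hypothetical helper (not part of file-collection): removes incomplete
// uploads older than maxAgeMs. Assumes the standard gridFS `uploadDate`
// field is present on every document.
function removeStaleUploads(collection, maxAgeMs, now = new Date()) {
  const cutoff = new Date(now.getTime() - maxAgeMs);

  // 1. Remove the empty placeholder files first, so an active client
  //    finds out it is out of sync with the server.
  //    Note: this also matches any legitimately empty file that is old enough.
  const placeholders = collection.remove({
    length: 0,
    uploadDate: { $lt: cutoff }
  });

  // 2. Then remove the leftover chunk temp files.
  const chunks = collection.remove({
    'metadata._Resumable': { $exists: true },
    uploadDate: { $lt: cutoff }
  });

  // Pass through whatever remove() returned for each step.
  return { placeholders, chunks };
}

// Possible server-side usage (assumes a Meteor server and a FileCollection
// instance named `collection`): hourly sweep, one-day age limit.
// Meteor.setInterval(function () {
//   removeStaleUploads(collection, 24 * 60 * 60 * 1000);
// }, 60 * 60 * 1000);
```

Chunks newer than the cutoff survive a pass even if their placeholder was removed; they would be picked up by a later pass once they age out.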

JcBernack commented 8 years ago

Alright, thanks for the response. I wanted to make sure I didn't miss something, but I'm totally ok with the way it is.

I settled on cleaning up the database on server startup, so as not to interrupt running uploads.

Meteor.startup(function () {
  // Remove placeholder files and chunks of incomplete files
  collection.remove({ $or: [{ "length": 0 }, { "metadata._Resumable": { $exists: true } }] });
});

vsivsi commented 8 years ago

That should work great, so long as you don't care if a long upload can successfully resume after being interrupted by a server restart.

janzheng commented 8 years ago

Just a note for future people stumbling across this issue. I ran into the problem of undeleted Resumable partials as well, and I ended up performing a "garbage collection" task on the complete event by doing what @JcBernack proposed earlier, with the addition of a Meteor userID check (to make sure you don't throw out someone else's partials!). I also had to wrap the function in _.debounce since the event seems to trigger multiple times before the upload is finished (or each individual file has been removed from the Resumable upload list). I also perform a garbage collection when the component is mounted, since I'm not handling accidental browser disconnects right now.

https://github.com/janzheng/Meteor-React-Alt-Base (please let me know if you have suggestions for handling this better)
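For readers unfamiliar with the debounce trick mentioned above, here is a minimal hand-rolled sketch of the idea: several events fired in quick succession collapse into a single cleanup call. The `debounce` helper below is a simplified stand-in for `_.debounce`, and `collectGarbage` is a hypothetical placeholder for whatever cleanup (including the user ID check) your app performs.

```javascript
// Minimal trailing-edge debounce, a simplified stand-in for _.debounce:
// fn runs once, waitMs after the last call in a burst.
function debounce(fn, waitMs) {
  let timer = null;
  return function (...args) {
    clearTimeout(timer); // cancel the previously scheduled run, if any
    timer = setTimeout(() => fn.apply(this, args), waitMs);
  };
}

// Hypothetical cleanup routine; here it only counts invocations.
let cleanupRuns = 0;
function collectGarbage() {
  cleanupRuns += 1;
}

const debouncedCleanup = debounce(collectGarbage, 20);

// Simulate an event firing several times before the upload settles.
debouncedCleanup();
debouncedCleanup();
debouncedCleanup();
```

In an app, `debouncedCleanup` would be the function registered as the event handler, with a wait time chosen to outlast the burst of repeated events.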