vsivsi / meteor-file-collection

Extends Meteor Collections to handle file data using MongoDB gridFS.
http://atmospherejs.com/vsivsi/file-collection
Other
159 stars 37 forks source link

remove() returns wrong results and doesn't respect collection Lock settings #59

Closed timothyarmes closed 9 years ago

timothyarmes commented 9 years ago

Hi,

The remove function appears to be broken. For me it would always return 'null', indicating a time out.

After delving in the depths of the code, down to the locks levels, my understanding of the problem is that the timeout value is always 0, thus resulting in the time-out event being fired systematically.

It seems to me that FileCollection's 'remove' function should be passing the lockOptions though to GridFs's remove function (as you do when creating read and write streams). When I do this in a local version of my code the remove function then works as expected.

vsivsi commented 9 years ago

Hi, thanks for reporting this. I've never encountered any issues with remove() and in nearly a year of this package being available, you are the first to report any difficulties. I'm not saying that the issue you are seeing isn't real, but it seems there must be something special about your setup that is leading to this behavior.

Some basic questions to begin: What platform are you on? What versions of Meteor, MongoDB and file-collection? Does file-collection pass its unit-tests* on your platform? Does the file-collection sample app work correctly when hosted on your platform? Does the issue occur when "removing" a file in server code, or only when done on the client?

Finally, can you please provide a simple app that demonstrates this issue so I can reproduce it?

As I said above, this is such a fundamental bit of functionality to the package, it's difficult to imagine it being so fundamentally broken. So my best guess is that this is a configuration or permission issue or some kind, or (less likely) a corner case specific to your setup, but it'll be hard to get to the bottom of it until I can reproduce the problem myself, so any and all addition info you can provide to help with that will be much appreciated and will speed up my ability to help you resolve this.

*To run unit tests:

git clone --recursive https://github.com/vsivsi/meteor-file-collection FileCollection
cd FileCollection
meteor test-packages ./

Then open browser to localhost:3000

timothyarmes commented 9 years ago

Hi,

I'm running the latest release of Meteor, on a Mac. I'm removing the file in server code.

Having debugged the code using node-inspector I have information that could be useful. In function timeoutReadLockQuery of the gridfs-locks project, when calling 'self.collection.findAndModify', I consistently fall to the '!doc' case (line 477):

if (!doc) {
   if (new Date().getTime() - self.timeCreated >= self.timeOut) {
      return self.emit('timed-out');
   }
   return setTimeout(timeoutReadLockQuery, self.pollingInterval, self, options);
  } else {
  ...
  }

If self.timeOut is zero (the default) then the timed-out event is bubbled back up to file-collection (via gridfs-locking-streams), and the remove fails. Here's the relevant code in griefs-locking-streams:

function lockAndRemove() {
  var lock = Lock(options._id, self._locks, options);
  lock.obtainWriteLock().once('locked',
    ...
  }).once('timed-out', function () {
    callback(null, null);  <-- Remove fails, and we just return null here
  }
  ).once('error', function (err) {
    ....
  }
  );
}

Now, as for how I get into this state in the first place (where 'doc' is null) I'm not sure, but I think it's caused when the app crashes before I've finished closing the write stream, so the lock's left open. In that case it's then impossible to delete the file.

It seems sure so me that you should be passing file-collection's lock parameters to the remove call, no? This fixes the issue for me.

vsivsi commented 9 years ago

Okay, I can reproduce this. Suprising this wasn't caught before now. I'm on it.

vsivsi commented 9 years ago

Hi, okay, remove was all kinds of screwed-up. It worked, mostly, but I've found and fixed three issues:

  1. As you noted, correct Lock settings were not being passed into the underlying remove function. This could cause removes to prematurely fail for files with outstanding read/write locks.
  2. The return values being generated by remove were incorrect. Per the Meteor docs on remove, (which this package should mimic), remove should return the number of files actually removed (except when invoked on the client without a callback).
  3. There was no effective unit-test coverage for remove which is why this slipped by for so long.

I've committed a set of changes on the remove_issue branch that should fix all of these problems. The final remaining issue is that the client side "FileCollection insert, findOne and remove with callback" test is failing because I'm still trying to figure out the proper return value for that case. On the server the correct value of "1" is being returned, but on the client, a "0" is being returned. I can't figure out why the server-side result isn't properly propagating back to the client call's callback... Anyway, I wanted to give you an early peek into the fixes so you can try them out and give feedback if you have the time.

Thanks again for reporting.

timothyarmes commented 9 years ago

Great news. It's nearly mid-night so I'll check this out on Monday now.

Thanks for your incredible reactivity!

vsivsi commented 9 years ago

I just committed changes to the remove_issue branch that resolve the final failing test. The committed solution is pretty hacktastic, and so I'm not terribly satisfied with it, but I need to think about how to better approach the underlying issue (which at least I now understand!) Have a good weekend, I'll probably have a new version posted to Atmosphere by Monday.

vsivsi commented 9 years ago

Okay, I'm happy with the new solution to the remote remove issue I encountered.

The issue was that a successful client remove would always return 0 because the underlying Mongo remote method code was attempting a second remove after file-collection had already completed the gridFS remove for the file. And that second failing remove was setting the return value back to the client.

The hacktastic workaround I did on Friday was to reinsert a dummy document with the same _id to give the default method something to do. Not very satisfying. The new method works properly by completely overriding the default remote method code for remove, so the issue doesn't occur in the first place. See here if you are interested in the details: https://github.com/vsivsi/meteor-file-collection/commit/2c7aefc32283605883f90c04f0017b80f06bf379

This was a bigger change than I had anticipated. Everything seems to be working fine, but I'm going to hold-off publishing this to Atmosphere until I hear from you that the remove_issue branch is working well and resolves all of your issues. Or at a minimum until I can use it a bit more myself to ensure that there's no subtle side-effects or corner cases that I haven't anticipated.

timothyarmes commented 9 years ago

Hi,

I've tested the branch and it appears to be working fine for me. I've also tested by reproducing the inability to remove files by causing a server crash while the lock is open, and it's now working (although we obviously have to wait for the timeout first).

Thanks for all your help.

Tim

vsivsi commented 9 years ago

Okay, thanks for the feedback on the changes. I just published v.1.1.5 to Atmosphere, so that should resolve this issue!