marcello3d / node-mongolian

[project inactive] Mongolian DeadBeef is an awesome Mongo DB driver for node.js
https://groups.google.com/group/node-mongolian
zlib License

ThreadSafe array #42

Closed: ghost closed this issue 12 years ago

ghost commented 13 years ago

I peeked at the code to see how you collect all the records before you return the array.

Is that thread safe if it is called by many 'threads'?

Closure would be traditional.

(or fibers, actors, chain of responsibility).

What do you think?

marcello3d commented 13 years ago

I'm not 100% sure I'm following the question, so let me describe the current setup:

Each cursor object should only be used once to receive data, since it would otherwise be unclear whether a second use should return cached data or re-execute the request. This is particularly important when you're using the next() or nextBatch() commands, since the cursor keeps track of its current position in the result data.

Currently if you want multiple callbacks to use the data from the same cursor, you would need to manage that yourself (call toArray once and hand it out to all the other callbacks). https://github.com/marcello3d/node-taxman is a library I wrote for that pattern, though I wouldn't necessarily recommend that specifically.
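The "call toArray once and hand it out" pattern can be sketched roughly as follows. This is only an illustration: shareToArray and fakeCursor are hypothetical names, not part of Mongolian or node-taxman, and the fake cursor stands in for a real one so the snippet runs without a database.

```javascript
// Hypothetical helper: runs cursor.toArray() exactly once and replays the
// outcome to every callback registered through the returned function.
function shareToArray(cursor) {
  var waiting = []     // callbacks registered before the result arrived
  var outcome = null   // [error, results] once toArray has called back

  cursor.toArray(function (error, results) {
    outcome = [error, results]
    waiting.forEach(function (callback) { callback(error, results) })
    waiting = []
  })

  return function onResults(callback) {
    if (outcome) callback(outcome[0], outcome[1])
    else waiting.push(callback)
  }
}

// Stand-in for a real cursor (an assumption for the demo, not Mongolian code).
var calls = 0
var fakeCursor = {
  toArray: function (callback) {
    calls++
    setTimeout(function () { callback(null, [{ _id: 1 }, { _id: 2 }]) }, 0)
  }
}

var onResults = shareToArray(fakeCursor)
onResults(function (error, results) { console.log("results1 =", results) })
onResults(function (error, results) { console.log("results2 =", results) })
```

Both callbacks receive the same array from a single toArray call, so the cursor itself is only ever used once.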

If you could give some code examples/test cases of what you're envisioning, perhaps we can come up with a better solution?

ghost commented 13 years ago

What I am thinking is to create a pool of collection(s). Then, on a state change of the pool (i.e. a release), see who else is waiting to use a collection. That would let it scale and handle a huge load.

To test it, create a pool of 2 collections, write a long-running mongo query, and start the node server. Then start many clients (e.g. browsers) making requests to the node server. Node is single threaded, but while the first few requests are in progress it tries to handle the other incoming requests, and the pool of collections is exhausted. Otherwise, your toArray will mix up different requests, returning some 'rows' multiple times. What do you think?

I can try to work on that next weekend.

marcello3d commented 13 years ago

The current code should already work with a single connection.

Each request to the server is tagged with a requestId which is included in responses from the server. Mongolian uses this to avoid the problem you describe and allow for multiple simultaneous asynchronous requests.

For example, this will work with no problems (if it doesn't, there's a bug!):

var Mongolian = require('mongolian')
var collection = new Mongolian().db('foo').collection('bar')
collection.find().toArray(function(error, results) { console.log("results1 =",results) })
collection.find().toArray(function(error, results) { console.log("results2 =",results) })

While the order of the two results is not guaranteed, they will always be independent requests, and the two results will always be identical.
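The requestId matching described above can be sketched in a few lines. This is an illustration of the general idea only, not Mongolian's actual implementation: FakeConnection, send, and receive are hypothetical names, and responseTo mirrors the field of the same name in the MongoDB wire protocol header.

```javascript
// Illustrative only: a stand-in connection that never touches the network.
function FakeConnection() {
  this.nextRequestId = 1
  this.pending = {}   // requestId -> callback awaiting that response
  this.sent = []      // requests "written to the wire", kept for inspection
}

// Tag the outgoing request and remember who is waiting for its reply.
FakeConnection.prototype.send = function (query, callback) {
  var requestId = this.nextRequestId++
  this.pending[requestId] = callback
  this.sent.push({ requestId: requestId, query: query })
  return requestId
}

// Route a reply to the callback registered under its responseTo id.
FakeConnection.prototype.receive = function (reply) {
  var callback = this.pending[reply.responseTo]
  if (!callback) return false   // unknown or already-answered request
  delete this.pending[reply.responseTo]
  callback(null, reply.documents)
  return true
}

var connection = new FakeConnection()
var first = connection.send({ a: 1 }, function (error, docs) { console.log("results1 =", docs) })
var second = connection.send({ b: 2 }, function (error, docs) { console.log("results2 =", docs) })

// Replies arrive out of order, yet each one reaches its own callback.
connection.receive({ responseTo: second, documents: [{ b: 2 }] })
connection.receive({ responseTo: first, documents: [{ a: 1 }] })
```

Because every reply carries the id of the request it answers, two simultaneous requests on one connection cannot receive each other's documents.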

This is the case I was describing earlier which will not work:

// this doesn't work, don't even think about copying it:
var cursor = new Mongolian().db('foo').collection('bar').find()
cursor.toArray(function(error, results) { console.log("results1 =",results) })
cursor.toArray(function(error, results) { console.log("results2 =",results) }) // bad! don't use a cursor twice, at the same time

That's not to say connection pooling won't improve performance (it might), but as far as correctness goes, there shouldn't be any concurrency problems.
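For reference, the pooling idea raised earlier in the thread (hand out a fixed set of resources, and on release pass the resource to whoever is waiting) might look roughly like this. Pool, acquire, and release are hypothetical names chosen for the sketch, the strings stand in for real connections, and nothing like this exists in Mongolian as far as this thread shows.

```javascript
// Hypothetical pool: hands out a fixed set of resources and queues callers
// when none are free.
function Pool(resources) {
  this.free = resources.slice()   // resources nobody is using right now
  this.waiting = []               // callbacks waiting for a resource
}

// Give the caller a free resource immediately, or queue the request.
Pool.prototype.acquire = function (callback) {
  if (this.free.length > 0) callback(this.free.pop())
  else this.waiting.push(callback)
}

// On release, pass the resource straight to the longest-waiting caller,
// or return it to the free list if nobody is waiting.
Pool.prototype.release = function (resource) {
  var next = this.waiting.shift()
  if (next) next(resource)
  else this.free.push(resource)
}

var pool = new Pool(["connection A", "connection B"])
pool.acquire(function (c) { console.log("first got", c) })
pool.acquire(function (c) { console.log("second got", c) })
pool.acquire(function (c) { console.log("third got", c) })   // queued: pool is empty
pool.release("connection B")   // the queued third caller runs now
```

A pool like this only limits how many requests are in flight per connection; it is a throughput measure, not something the correctness argument above depends on.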