nicktacular / php-mongo-session

A PHP session handler with a Mongo DB backend.
MIT License
18 stars 6 forks source link

Lock race #1

Closed anasser closed 9 years ago

anasser commented 10 years ago

I believe the lock function has a racing condition. Two different threads might check the lock of the same session id at the same time and find that it is not locked. Then, they will race to lock the session. The one that locks first will work. The other one will get a duplicate id exception and fail. I believe it should wait for the lock to be released instead of completely failing.

nicktacular commented 10 years ago

Hi @anasser, thanks for posting.

I'm not sure I fully understand which line(s) of code you're referring to. Are you talking about line 281 or 291 or something else?

anasser commented 10 years ago

Well, yes, the racing might happen between the lines 281 and 291.

Assume Thread1 and Thread2 executed the line 281 at the same time and found that the session is not locked. Then both proceeded to lock the session at line 291. One of them will surely win the race and lock the session successfully.

The other thread will try to lock the session at line 291 but will fail as the database will already have a lock with the same _id which must be unique. What do you think?

Thread1..............Thread2 |.............................| |.............................| line 281...................line 281..................//Both threads found the session id is not locked |.............................| |.............................| line 291..................|............................//Thread 1 was able to insert the lock into the lock collection |.............................| |.............................line 291..................//Thread 2 will fail because it is trying to insert a duplicate id

I hope this text diagram renders well on your side.

nicktacular commented 10 years ago

Yes, thanks that's helpful.

I absolutely agree, some sort of MongoException will be thrown which isn't caught there. I'll add a try block and continue upon a thrown exception such that it'll re-enter the loop as if it never acquired the lock in the first place.

Thanks for noticing!

anasser commented 10 years ago

Nice that was fast! :) . Actually, something just occurred to me. I'm sorry for leading you in a bit wrong direction. Since you use save(), the second write will not fail. save() will just update the existing lock and the Thread2 will continue thinking it acquired the lock. But this means that both threads, will think they locked the session which is wrong. So I think save() needs to be insert() so that it throws the duplicate key exception.

nicktacular commented 10 years ago

Hm you're right, save will never throw a duplicate key exception since it's thinking it's an upsert instead. I'll modify to use insert instead. But now that I'm on that I need to also modify the saving parameters of which safe is deprecated so I'll spend some time working on that as well. This update may take some time as I'd like to get unit tests finished for this also.

rocksfrow commented 10 years ago

@nicktacular

Hi, I see this issue is closed but I don't see a commit with any further changes from your last comment.

Is this race condition still a concern @anasser?

nicktacular commented 10 years ago

This condition hasn't yet been resolved. Unfortunately I've been strayed away on other projects but if there's concerns from new/existing users seeing this in their environments, I will escalate this issue.

From my experience, this has ben running in a 5-server load balanced environment since November 2012 and haven't had a single incident. This is to say that it's extremely unlikely that this would occur.

Having said that, it's on my list of things to resolve.

Let me know if you have any additional questions.

rocksfrow commented 10 years ago

Hey Nick, thanks for the quick reply. What is the worst-case-scenario from this potential issue, were it to occur? What type of incident would you expect to see had it happened?

On Thu, Jul 24, 2014 at 4:56 PM, Nick Ilyin notifications@github.com wrote:

This condition hasn't yet been resolved. Unfortunately I've been strayed away on other projects but if there's concerns from new/existing users seeing this in their environments, I will escalate this issue.

From my experience, this has ben running in a 5-server load balanced environment since November 2012 and haven't had a single incident. This is to say that it's extremely unlikely that this would occur.

Having said that, it's on my list of things to resolve.

Let me know if you have any additional questions.

— Reply to this email directly or view it on GitHub https://github.com/nicktacular/php-mongo-session/issues/1#issuecomment-50076912 .

nicktacular commented 10 years ago

Worst case is that one of the threads save operation gets overwritten by the next save operation if there are 2 (from the same session).

rocksfrow commented 10 years ago

@nicktacular I have changes pushed up to my fork that adds MongoClient() support as well as a few other changes like passing down connection options.

I've been stress testing a two node cluster with success. (3rd node is arbiter only).

I would love for you to test my version on your platform if you're running older versions that don't support MongoClient. On Jul 29, 2014 8:31 AM, "Nick Ilyin" notifications@github.com wrote:

Worst case is that one of the threads save operation gets overwritten by the next save operation if there are 2 (from the same session).

— Reply to this email directly or view it on GitHub https://github.com/nicktacular/php-mongo-session/issues/1#issuecomment-50469530 .

rocksfrow commented 10 years ago

@nicktacular the 'safe' replacements I've made in my fork actually already takes care of the updated options for ->insert(). I am going to replace the lock ->save with ->insert, and add proper handling. I'll push that up soon.

rocksfrow commented 10 years ago

@anasser @nicktacular what do you think of these changes?

It seems to be working good... the class now properly drops out on all exceptions on the lock other than duplicate key. It's using ->insert instead of ->save. I confirmed the duplicate key constraint triggers...

https://github.com/rocksfrow/php-mongo-session/commit/083040fb2da55f4aa6dd5c115eda9bc90c4d696d

Curious of your thoughts. @nicktacular this should resolve all open issues.

rocksfrow commented 10 years ago

So for example, now when I set my write_concern higher than the number of nodes I have, instead of the lock retrying for 30s and failing (and ignoring the write concern exception), it now drops out immediately after wTimeoutMS, which I am overriding to 5s. The only flaw is that the PANIC error message is using getConfig('locktimeout'), so the error isn't totally accurate.

Example log output in this scenario:

Trying to acquire a lock on jng5ps6rnu2dfkehonfaj5upr3
10.103.30.34:27017: waiting for replication timed out
PANIC! jng5ps6rnu2dfkehonfaj5upr3 cannot be acquired after waiting for 30s. 
rocksfrow commented 10 years ago

@nicktacular I think you can close this issue after merging my fork changes.

anasser commented 10 years ago

@rocksfrow your fork seems to resolve the race condition discussed in this issue. Thanks.

rocksfrow commented 9 years ago

@nicktacular you might want to close this out... as you can see, the bug fix was confirmed all the way back to Aug 1, you just now merged it :)

nicktacular commented 9 years ago

Great, thanks very much for @rocksfrow for addressing and thanks to @anasser for opening up this issue.