Closed rocksfrow closed 9 years ago
@nicktacular can you please get back to me about this?
do you end up with records stuck in sessions_locks on your production system?
@nicktacular still waiting on a response to this... can you confirm whether or not you are ending up with entries in sessions_locks (old entries)? I want to confirm this isn't something happening only on my end. The problem with this is when this happens, the other end is getting locked out for good until they clear their cookies.
Hi @rocksfrow - no issues as you've described in the production environment. I'd like to investigate further. Can you send me additional information about your environment?
Are you in a replicated env? How many replicas? Do you seed your conn str with multiple endpoints or just one?
@nicktacular I have one primary, one secondary, and one arbiter (one replica set). I am not seeding multiple connections.
I am going to add additional logging to my production system so I can investigate further and figure out exactly what's going on. I'll follow up with details.
@nicktacular actually, clarify what you meant by 'seeding multiple endpoints'... I don't seed multiple connections, but I do of course provide each member of my replica set in the connection string.
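For anyone following along, seeding each member of the replica set in the connection string generally looks something like this (hostnames and set name here are placeholders, not taken from this thread):

```
mongodb://mongo1.example.com:27017,mongo2.example.com:27017/?replicaSet=rs0
```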
Hey @rocksfrow let me know if you have any prod logging to investigate further.
Hey man... any tips on what information would be useful to debug? The existing logging isn't sufficient...
I'm going to add more logging now that we have log priorities.
@nicktacular it looks like what I am experiencing is locks being stuck forever after a timeout or something... i.e., if my script dies with a 500 error (PHP timeout), that session is now stuck forever.
Correct me if I am wrong, but any time my script doesn't end execution gracefully after having already opened a session, subsequent requests with that session key will hang.
Here is an example from my HTTP logs...
You can see the script dies from another cause (the script timed out), which can and will happen. But the issue is that this lock is now stuck forever, as you can see from the following requests, which all die with a lock timeout.
[Thu Jan 01 15:03:22 2015] [warn] [client xxx] mod_fcgid: read data timeout in 31 seconds, referer: https://xxx
[Thu Jan 01 15:03:22 2015] [error] [client xxx] Premature end of script headers: index.php, referer: https://xxx
[Thu Jan 01 15:04:30 2015] [warn] [client xxx] mod_fcgid: stderr: PHP Notice: Could not acquire lock for jdauj4v9766meag73iifcj7c72 in /usr/share/php-mongo-session/MongoSession.php on line 483
[Thu Jan 01 15:05:38 2015] [warn] [client xxx] mod_fcgid: stderr: PHP Notice: Could not acquire lock for jdauj4v9766meag73iifcj7c72 in /usr/share/php-mongo-session/MongoSession.php on line 483, referer: https://xxx
...
You said you don't end up with any stuck locks though? Even in this scenario?
Let me know your thoughts... I am thinking some sort of garbage collection is needed to clean up these "dead locks".
PS, Happy New Year!
Yeah, so I had a huge spike of dead locks today, and looking at my logs I can see that every time it happens is when I get a read timeout.
So it looks like I have an external source causing my scripts to hang/time out.
Once I figure this out it will hopefully be a non-issue... but this is definitely a flaw in the locking.
Won't this same behavior happen on ANY fatal error? So one script failing could/would leave that visitor unable to view the site any longer (until I clean up their stuck lock).
I am thinking maybe we should implement garbage cleanup with a "max lock age", so that after a couple of minutes these locks would be removed.
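Since MongoSession is PHP, here is just a language-agnostic sketch (in Python) of the "max lock age" idea; the constant, the function, and the in-memory list standing in for the sessions_locks collection are all made up for illustration:

```python
import time

# Illustrative threshold, not a value from the library: any lock older
# than this is presumed abandoned by a crashed/timed-out request.
MAX_LOCK_AGE = 120  # seconds

def reap_stale_locks(locks, now=None):
    """Return only the locks younger than MAX_LOCK_AGE.

    In MongoSession this would instead be a delete on sessions_locks
    documents whose creation timestamp is older than the threshold.
    """
    now = time.time() if now is None else now
    return [doc for doc in locks if now - doc["created"] < MAX_LOCK_AGE]

locks = [
    {"_id": "stuck-session-id", "created": 0},     # abandoned lock
    {"_id": "fresh-session-id", "created": 1000},  # live lock
]
print(reap_stale_locks(locks, now=1030))  # only the fresh lock survives
```

The trade-off is the same one discussed below: pick the age too low and you kill locks held by legitimately slow requests; too high and a locked-out visitor waits that long.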
Currently you can't view the site any longer until you clear your cookies.
:tada: Happy new year! :tada:
I had a couple of ideas some time ago which aren't trivial to implement, but they might be worth looking into, since the scenario you describe does mean that you end up with stuck locks.
Register shutdown function
I haven't tested this, but I've used the register shutdown function for logging implementations (specifically, profiling) and I'm wondering if it could be used in a similar manner to cleanup, also. I'd need to write a quick prototype to see if this is feasible and would work in all scenarios.
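As a rough sketch of the shutdown-hook idea (in Python, with `atexit` standing in for PHP's `register_shutdown_function`; the lock bookkeeping here is invented for illustration):

```python
import atexit

# Locks this process currently holds; in MongoSession this state lives
# in the sessions_locks collection, not in memory.
held_locks = set()

def acquire_lock(session_id):
    held_locks.add(session_id)

def release_held_locks():
    # On shutdown, drop any lock this process still holds so a crashed
    # or timed-out request cannot leave the session locked forever.
    for session_id in list(held_locks):
        held_locks.discard(session_id)

# Analogous to PHP's register_shutdown_function(): the callback fires
# when the script ends, including after many (though not all) fatal
# paths -- which is exactly why a prototype is needed to confirm it
# covers cases like the mod_fcgid read timeout in the logs above.
atexit.register(release_held_locks)

acquire_lock("example-session-id")
release_held_locks()  # normally invoked by the runtime at exit
print(held_locks)     # set() -- nothing left behind
```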
Process monitor
This would require a few components. First, MongoSession would add its process id and machine identifier (i.e. hostname) to the lock information.
The second component would be a cron task (or something similar) that would look at the lock documents, find any lock document without an active owning process on the current machine, and delete that document. This would require 2 additions to the lock document: the process id and the machine identifier.
The issue here is that the cron task either needs to run like a daemon to be as responsive to dead locks as possible, or, if you do a per-minute lock check, then in some cases a user could be locked out for up to 60 seconds.
This solution is complex, but would probably be the most effective in clearing out session locks. There is a variation to this that could probably be implemented using advisory file locking but would still need a cron/daemon.
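The liveness check at the heart of the cron task could be sketched like this (Python for illustration; the lock-document fields `pid` and `host` are the two proposed additions, and the function name is made up):

```python
import os
import socket

def lock_is_orphaned(lock_doc):
    """Cron-style check: a lock taken on this machine whose owning
    process no longer exists can be deleted safely."""
    if lock_doc["host"] != socket.gethostname():
        # Only the machine that took the lock can judge its process ids.
        return False
    try:
        os.kill(lock_doc["pid"], 0)  # signal 0 = existence check only
    except ProcessLookupError:
        return True   # owner died without releasing -> orphan
    except PermissionError:
        return False  # process exists but belongs to another user
    return False      # process is alive; leave the lock alone

doc = {"_id": "example-session-id",
       "host": socket.gethostname(), "pid": os.getpid()}
print(lock_is_orphaned(doc))  # False: this process is alive
```

Note the same caveat as above: a machine that goes down entirely never reclaims its own locks, which is one reason a max-lock-age fallback is still useful alongside this.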
@nicktacular I have confirmed the stuck locks were indeed all coming from the timeouts.
I've made some server migrations recently where the sites using mongo sessions were no longer connecting to a remote database.
All of the requests have sped up drastically, and I actually haven't seen a stuck lock since I migrated over the DB server... (in 3-4 days now).
@nicktacular I am going to close this issue out so it doesn't turn into a huge thread.
This is NOT a bug, but rather a feature request.
@nicktacular do you have the issue of old locks getting stuck in sessions_locks? I had logging enabled when one of these occurred and I don't see any exceptions or anything.
I am debugging further into the raw http logs, but I am wondering if you guys have this occur?
Maybe adding some cleanup to gc() for old records in sessions_locks?