Closed rocksfrow closed 9 years ago
@nicktacular can you please get back to me about this?
do you end up with records stuck in sessions_locks on your production system?
@nicktacular still waiting on a response to this... can you confirm whether or not you are ending up with entries in sessions_locks (old entries)? I want to confirm this isn't something happening only on my end. The problem with this is when this happens, the other end is getting locked out for good until they clear their cookies.
Hi @rocksfrow - no issues as you've described in the production environment. I'd like to investigate further. Can you send me additional information about your environment?
Are you in a replicated env? How many replicas? Do you seed your conn str with multiple endpoints or just one?
@nicktacular I have one primary, one secondary, and one arbiter (one replica set). I am not seeding multiple connections.
I am going to add additional logging to my production system so I can investigate further and figure out exactly what's going on. I'll follow up with details.
@nicktacular actually, clarify what you meant by 'seeding multiple endpoints'... I don't seed multiple connections, but I do of course provide each member of my replica set in the connection string.
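For anyone following along, seeding each member of the replica set in the connection string generally looks something like this (hostnames and set name here are placeholders, not taken from this thread):

```
mongodb://mongo1.example.com:27017,mongo2.example.com:27017/?replicaSet=rs0
```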
Hey @rocksfrow let me know if you have any prod logging to investigate further.
Hey man... any tips on what information would be useful to debug? The existing logging isn't sufficient...
I'm going to add more logging now that we have log priorities.
@nicktacular it looks like what I am experiencing is locks being stuck forever after a timeout or something... i.e., if my script dies with a 500 error (PHP timeout), that session is now stuck forever.
Correct me if I am wrong, but any time my script doesn't end execution gracefully after having already opened a session, subsequent requests with that session key will hang.
Here is an example from my HTTP logs...
You can see the script dies from another cause (the script timed out), which can and will happen. But the issue is that this lock is now stuck forever, as you can see from the following requests, which all die with a lock timeout.
[Thu Jan 01 15:03:22 2015] [warn] [client xxx] mod_fcgid: read data timeout in 31 seconds, referer: https://xxx
[Thu Jan 01 15:03:22 2015] [error] [client xxx] Premature end of script headers: index.php, referer: https://xxx
[Thu Jan 01 15:04:30 2015] [warn] [client xxx] mod_fcgid: stderr: PHP Notice: Could not acquire lock for jdauj4v9766meag73iifcj7c72 in /usr/share/php-mongo-session/MongoSession.php on line 483
[Thu Jan 01 15:05:38 2015] [warn] [client xxx] mod_fcgid: stderr: PHP Notice: Could not acquire lock for jdauj4v9766meag73iifcj7c72 in /usr/share/php-mongo-session/MongoSession.php on line 483, referer: https://xxx
...
You said you don't end up with any stuck locks though? Even in this scenario?
Let me know your thoughts... I am thinking some sort of garbage collection is needed to clean up these "dead locks".
PS, Happy New Year!
Yeah, so I had a huge spike of dead locks today, and looking at my logs I can see that every time it happens is when I get a read timeout.
So it looks like I have an external source causing my scripts to hang/time out.
Once I figure this out it will hopefully be a non-issue... but this is definitely a flaw in the locking.
Won't this same behavior happen on ANY fatal error? So one script failing could/would leave that visitor unable to view the site any longer (until I clean up their stuck lock).
I am thinking maybe we should implement garbage cleanup with a "max lock age", so that after a couple of minutes these locks would be removed.
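Since MongoSession is PHP, here is just a language-agnostic sketch (in Python) of the "max lock age" idea; the constant, the function, and the in-memory list standing in for the sessions_locks collection are all made up for illustration:

```python
import time

# Illustrative threshold, not a value from the library: any lock older
# than this is presumed abandoned by a crashed/timed-out request.
MAX_LOCK_AGE = 120  # seconds

def reap_stale_locks(locks, now=None):
    """Return only the locks younger than MAX_LOCK_AGE.

    In MongoSession this would instead be a delete on sessions_locks
    documents whose creation timestamp is older than the threshold.
    """
    now = time.time() if now is None else now
    return [doc for doc in locks if now - doc["created"] < MAX_LOCK_AGE]

locks = [
    {"_id": "stuck-session-id", "created": 0},     # abandoned lock
    {"_id": "fresh-session-id", "created": 1000},  # live lock
]
print(reap_stale_locks(locks, now=1030))  # only the fresh lock survives
```

The trade-off is the same one discussed below: pick the age too low and you kill locks held by legitimately slow requests; too high and a locked-out visitor waits that long.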
Currently you can't view the site any longer until you clear your cookies.
:tada: Happy new year! :tada:
I had a couple of ideas some time ago which aren't trivial to implement, but they might be worth looking into, since the scenario you describe does mean that you end up with stuck locks.
Register shutdown function
I haven't tested this, but I've used the register shutdown function for logging implementations (specifically, profiling) and I'm wondering if it could be used in a similar manner to cleanup, also. I'd need to write a quick prototype to see if this is feasible and would work in all scenarios.
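As a rough sketch of the shutdown-hook idea (in Python, with `atexit` standing in for PHP's `register_shutdown_function`; the lock bookkeeping here is invented for illustration):

```python
import atexit

# Locks this process currently holds; in MongoSession this state lives
# in the sessions_locks collection, not in memory.
held_locks = set()

def acquire_lock(session_id):
    held_locks.add(session_id)

def release_held_locks():
    # On shutdown, drop any lock this process still holds so a crashed
    # or timed-out request cannot leave the session locked forever.
    for session_id in list(held_locks):
        held_locks.discard(session_id)

# Analogous to PHP's register_shutdown_function(): the callback fires
# when the script ends, including after many (though not all) fatal
# paths -- which is exactly why a prototype is needed to confirm it
# covers cases like the mod_fcgid read timeout in the logs above.
atexit.register(release_held_locks)

acquire_lock("example-session-id")
release_held_locks()  # normally invoked by the runtime at exit
print(held_locks)     # set() -- nothing left behind
```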
Process monitor
This would require a few components. First, MongoSession would add its process id and machine identifier (i.e. hostname) to the lock information.
The second component would be a cron task (or something similar) that would look at the lock documents, find any lock document without an active owning process on the current machine, and delete that document. This would require 2 additions to the lock document: the process id and the machine identifier.
The issue here is that the cron task either needs to run like a daemon to be as responsive to dead locks as possible, or, if you do a per-minute lock check, then in some cases a user could be locked out for up to 60 seconds.
This solution is complex, but would probably be the most effective in clearing out session locks. There is a variation to this that could probably be implemented using advisory file locking but would still need a cron/daemon.
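The liveness check at the heart of the cron task could be sketched like this (Python for illustration; the lock-document fields `pid` and `host` are the two proposed additions, and the function name is made up):

```python
import os
import socket

def lock_is_orphaned(lock_doc):
    """Cron-style check: a lock taken on this machine whose owning
    process no longer exists can be deleted safely."""
    if lock_doc["host"] != socket.gethostname():
        # Only the machine that took the lock can judge its process ids.
        return False
    try:
        os.kill(lock_doc["pid"], 0)  # signal 0 = existence check only
    except ProcessLookupError:
        return True   # owner died without releasing -> orphan
    except PermissionError:
        return False  # process exists but belongs to another user
    return False      # process is alive; leave the lock alone

doc = {"_id": "example-session-id",
       "host": socket.gethostname(), "pid": os.getpid()}
print(lock_is_orphaned(doc))  # False: this process is alive
```

Note the same caveat as above: a machine that goes down entirely never reclaims its own locks, which is one reason a max-lock-age fallback is still useful alongside this.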
@nicktacular I have confirmed the stuck locks were indeed all coming from the timeouts.
I've made some server migrations recently where the sites using mongo sessions were no longer connecting to a remote database.
All of the requests have sped up drastically, and I actually haven't seen a stuck lock since I migrated over the DB server... (in 3-4 days now).
@nicktacular I am going to close this issue out so it doesn't turn into a huge thread.
This is NOT a bug, but rather a feature request.
@nicktacular do you have the issue of old locks getting stuck in sessions_locks? I had logging enabled when one of these occurred and I don't see any exceptions or anything.
I am debugging further into the raw http logs, but I am wondering if you guys have this occur?
Maybe adding some cleanup to gc() for old records in sessions_locks?