Closed neonstalwart closed 11 years ago
the problem is also compounded by the call to binding.simpleBind
blocking until it fails. maybe https://github.com/joewalnes/node-ldapauth/blob/master/ldapauth.cc can shed some light on how to do this without blocking?
ldapauth.cc handles this using a second thread, which is do-able, for sure.
Yeah, um, well the backoffmax * 1000 is a pretty massive bug. Easily fixed, thanks.
There likely is another bug regarding reconnect and close() - can you explain what you're seeing a bit better?
I'm very interested in fixing the problems you are seeing, but I'd like to point out that you can findandbind on an already opened (and bound) connection - connect anonymously on startup, and then just call findandbind as your requests come in - it's way more efficient than opening a new TCP session with each request.
ldapauth.cc handles this using a second thread, which is do-able, for sure.
that would be nice. currently, since the call to binding.simpleBind
is sync, i'm being blocked waiting for the underlying connection attempt to timeout/fail. this seems like about 30 seconds (maybe more) and i'd rather not have to wait so long to find out that the connection is going to fail. if the bind was done in a separate thread, i would think that the JS could have it's own timeout to invoke the callback with an error sooner than when the connection attempt failed.
Yeah, um, well the backoffmax * 1000 is a pretty massive bug. Easily fixed, thanks.
thanks - i see the fix and i'll try it out.
There likely is another bug regarding reconnect and close() - can you explain what you're seeing a bit better?
i'll try to explain what i'm seeing. following the code in LDAP.js for when the connection is down here's what i think i'm seeing...
ldap.open
, it will call binding.open
(this seems to have no problems) and then it tries to do an anonymous bind
.bind
the following line is what blocks for ~30 seconds msgid = binding.simpleBind();
bind
code continues) is that the disconnected callback (added via the call to binding.setcallbacks
) is invoked (this is as a result of EMITDISCONNECT
in the C++ code) and we are not currently reconnecting
, so there is a call to reconnect
.reconnect
closes the binding and then adds a call to setTimeout
that involves the backoff
logic (this first timeout is for 0
ms using the default values).reconnect
finishes, then the binding.simpleBind
call mentioned above is now complete and msgid
is -1
.bind
function then continues to update the stats and registers a callback via setCallback(msgid, bind, arguments, fn)
(keeping in mind that msgid
is -1).setCallback
then sees that the msgid
is not >=1
and then invokes my callback with an error.setTimeout
(added during reconnect
) is invokedvar res = open(function(err) {
if (err) {
reconnect();
} else {
stats.reconnects++;
opts.backoff = -1;
replayCallbacks();
reconnecting = false;
if (syncopts) {
sync(syncopts);
}
}
});
open
again which results in blocking the thread again for ~30 seconds.open
eventually finds that there is no connection to the server and so the callback is called with an error.reconnect
is called again and now we are in a loop... reconnect
-> open
-> fail -> reconnect
I'm very interested in fixing the problems you are seeing, but I'd like to point out that you can findandbind on an already opened (and bound) connection - connect anonymously on startup, and then just call findandbind as your requests come in - it's way more efficient than opening a new TCP session with each request.
thank you... i may have a misunderstanding - is ldap.open
a sync call? if not, how do you handle knowing when the connection is ready as requests come in? i can see that i could use process.on('exit', ... )
to close the connection without needing to know when it was ready but i'm not sure how i can handle knowing when the connection is ready to use without using the callback? is it safe to call ldap.open
multiple times without calling ldap.close
?
Excellent analysis! I've been looking for that bug for a really long time (have not been able to reproduce in captivity), but that sounds exactly what I've been looking for. I'll look at it forst thing tomorrow and let you know what I find.
I'll also write tomorrow regarding the proper use of open() et al.
thank you. I also just realized (as I was falling asleep), the calls to reconnect will grow exponentially because during the course of calling open (from inside reconnect), the disconnect callback will call reconnect and so will the callback provided to open.
I also just realized (as I was falling asleep), the calls to reconnect will grow exponentially...
the reconnecting
flag actually guards against this happening. the disconnect callback won't call reconnect
if it is already reconnecting
...
I found an async connect option that I totally missed somehow - I'm working on getting open() to be fully async, which should solve a ton of problems. Give me a few hours.
woo!
Haven't forgotten about this - just got really busy this week. Stay tuned.
@jeremycx any progress on this? is there anything i could do to help?
i understand that open source projects are usually developed in people's spare time so i hate to apply pressure but i need to decide if i should be moving to another library in order to keep my project moving forward.
hey @jeremycx is there any chance you might look at this again? i was hoping you might get around to taking a look at this and i'm actually using your library (and just hoping that the connection to the ldap server doesn't fail and cause my server to block waiting for a timeout). it sounded like you had found something that would fix this.
Sorry for not keeping you up to date.
I did look at it - for 3-4 evenings. What a mess (on the libldap side). Callbacks that never get called back. Weirdness abundant!
I'm currently looking at the reconnect logic again, which is, unsuprisingly, horrible broken if you have a syncrepl cannel activated. Sigh.
So, unless I get really weird and connect in a different thread (which has it's own share of considerations), I think we may have to live with block-on-connect for now. How long a block could you actually handle without too much trouble? I think I could tighten up the connect timeout to a small number of seconds which should keep you out of trouble.
thanks for the response - i think currently it's blocking for somewhere in the order of 30 seconds before it has a timeout. i'm using this as part of the login process for users of a web application. so if the LDAP server ever happened to be down, users will be waiting about 30 seconds before the webserver provides a response to the browser. given that in my case the LDAP server is currently on the same infrastructure as the webserver i figure it's not too big of a problem - if one's down they're probably both down. however, if for some reason only the ldap server was down (or when we move to making them more independent of each other) then i think somewhere in the range of 2 - 5 seconds might be a better experience for the user.
The connect timeout has been added 1.1.2.
Also, lots of reconnect fixes are complete.
i'd like to have a way to limit the backoff logic.
by default,
opts.backoff
is-1
. whenbackoff
is called, it will return a number for thesetTimeout
for reconnecting - first time it will return0
, next time1000
then2000
untilbackoff
reachesbackoffmax
(which by default is30000
) and at that point, the value returned tosetTimeout
will be30000000
and then it will return that number until a connection is successful.my use case is simply to use
findandbind
to authenticate users. iopen
a connection, then in the callback i callfindandbind
and then in that callback iclose
the connection. if the LDAP server is not available toopen
, my callback is called with an error and i close the connection - at that point i don't want to keep trying reconnects.using node-inspector (or by adding some debugging to node-LDAP) i can see that the backoff logic still applies even after i close the connection. is there a way to stop it? am i doing something wrong? i've tried to limit the backoff by using small numbers but it still keeps going.
i see this when developing locally because i don't have access to the LDAP server. you could probably see this issue by trying to open a nonsense url. it doesn't look like you have any tests that cover this kind of reconnecting.