Closed cout closed 1 year ago
Thanks @cout for the excellent writeup, apologies for not getting back to you sooner, I must have missed the email notification. I will try your steps to reproduce. I'm a little reluctant to blindly apply your patch just yet as I don't currently understand why this is still an issue. I'll be in-touch soon. As to your question
I am also questioning the call to uv_poll_stop in on_connect -- other than the first connection, is there ever a case where on_connect would be called without a call to on_disconnect?
This is only there because it was present in jeremy childs original node-ldap code. I don't recall any instance where on_connect was being called twice without disconnect being called in between, so you're probably correct you could probably just remove the else statement. I suspect at the time I likely was reluctant to change something I assumed worked.
We have been running with the above patch in our test environment for a few days now, and we are still seeing crashes shortly after a disconnect/reconnect. We have seen one crash in the garbage collector and one crash in uv_async_init
, which hints at possible memory corruption. So this patch still needs some work before it can be applied.
The following script reproduces the problem with only one socket:
const child_process = require('child_process');
const fs = require('fs');
const LDAP = require('napi-ldap'); // or require('./main')
const port = 12345;
const server_cfg = { port: port, userLoginAttribute: 'cn', searchBase: 'dc=test' };
const users = [ { 'dn': 'cn=user,dc=test', 'cn': 'user-login' } ];
function start_ldap_server() {
return new Promise((resolve, reject) => {
fs.writeFileSync('conf.json', JSON.stringify(server_cfg));
fs.writeFileSync('users.json', JSON.stringify(users));
const server_args = [ 'node_modules/ldap-server-mock/server.js', '--conf=conf.json', '--database=users.json', ];
const server = child_process.spawn('node', server_args);
var resolved = false;
server.stdout.on('data', (data) => { console.log(`ldap server stdout: ${data}`); resolve(server); });
server.stderr.on('data', (data) => { console.log(`ldap server stderr: ${data}`); });
server.on('exit', function(code) { if (!resolved) { reject(new Error(`ldap server exited with code ${code}`)); }
});
});
}
function stop_ldap_server(server, client) {
return new Promise((resolve, reject) => {
client.options.disconnect = function() { resolve(); };
server.kill();
});
}
function bind(ldap, options) {
return new Promise((resolve, reject) => {
ldap.bind(options, (err) => {
if (err) { reject(new Error(err)); } else { resolve(); }
});
});
}
async function main() {
let server = await start_ldap_server();
const uri = `ldap://127.0.0.1:${port}`;
const ldap = new LDAP({ uri: uri, base: 'dc=test' });
await bind(ldap, { binddn: 'cn=user,dc=test', password: 't3st' });
await stop_ldap_server(server, ldap);
server = await start_ldap_server();
await bind(ldap, { binddn: 'cn=user,dc=test', password: 't3st' });
await stop_ldap_server(server, ldap);
}
main().catch((err) => {
console.error(err);
});
Using node v12.9.1:
$ ./test.js
ldap server stdout: LDAP server listening on port 12345
ldap server stdout: LDAP server listening on port 12345
Error: LDAPError: Timeout
at /home/cout/git/napi-ldap/test.js:39:25
at LDAP.searchTimeout (/home/cout/git/napi-ldap/main.js:342:7)
at listOnTimeout (internal/timers.js:531:17)
at processTimers (internal/timers.js:475:7)
node: ../deps/uv/src/unix/core.c:929: uv__io_stop: Assertion `loop->watchers[w->fd] == w' failed.
Using node v14.17.0 I see a timeout, but no assertion:
ldap server stdout: LDAP server listening on port 12345
ldap server stdout: LDAP server listening on port 12345
Error: LDAPError: Timeout
at /home/cout/git/napi-ldap/test.js:37:25
at LDAP.searchTimeout (/home/cout/git/napi-ldap/main.js:342:7)
at listOnTimeout (internal/timers.js:555:17)
at processTimers (internal/timers.js:498:7)
Adding some logging to cnx.c
I see that I am indeed getting a different file descriptor on the second connection attempt:
ldap server stdout: LDAP server listening on port 12345
on_connect; fd=22
on_connect calling uv_poll_init
on_connect calling uv_poll_start
on_disconnect
on_disconnect calling uv_poll_stop
ldap server stdout: LDAP server listening on port 12345
on_connect; fd=19
on_connect calling uv_poll_stop
on_connect calling uv_poll_start
Error: LDAPError: Timeout
at /home/cout/git/napi-ldap/test.js:37:25
at LDAP.searchTimeout (/home/cout/git/napi-ldap/main.js:342:7)
at listOnTimeout (internal/timers.js:555:17)
at processTimers (internal/timers.js:498:7)
Reading https://stackoverflow.com/a/47270400/244083 it sounds like the correct thing to do is to call free
from the callback passed to uv_close
.
I applied this patch:
diff --git a/cnx.c b/cnx.c
index e513487..60da1c8 100644
--- a/cnx.c
+++ b/cnx.c
@@ -1231,6 +1231,11 @@ on_connect (LDAP * ld, Sockbuf * sb,
}
static void
+on_close (uv_handle_t * handle) {
+ free(handle);
+}
+
+static void
on_disconnect (LDAP * ld, Sockbuf * sb, struct ldap_conncb *ctx)
{
struct ldap_cnx *ldap_cnx = (struct ldap_cnx *) ctx->lc_arg;
@@ -1247,7 +1252,11 @@ on_disconnect (LDAP * ld, Sockbuf * sb, struct ldap_conncb *ctx)
return;
if (ldap_cnx->handle)
- uv_poll_stop (ldap_cnx->handle);
+ {
+ uv_poll_stop (ldap_cnx->handle);
+ uv_close (ldap_cnx->handle, on_close);
+ ldap_cnx->handle = NULL;
+ }
status = napi_open_handle_scope (env, &scope);
assert (status == napi_ok);
and the above test exits with no assertion and no timeout (tested both node v12.9.1 and v14.17.0):
ldap server stdout: LDAP server listening on port 12345
ldap server stdout: LDAP server listening on port 12345
Hi, We face similar issues after disconnecting from the server. Do you have any update (or ETA) on the fix or any workaround we could apply meanwhile?
As noted in 457c44cf, a crash can result if
uv_poll_stop
is called more than once on the same handle. That commit added a workaround for the problem, but I do not believe it is sufficient. We tested napi-ldap 1.0.2 (which afaict should have the workaround) with nodejs 14.17.0.Steps to reproduce:
bind
, which results in the socket getting reconnected.As my colleague @pfree pointed out, setting the handle to
NULL
avoids the crash in node-LDAP, though we realize now the original patch leaked the handle. The following patch (which is not extensively tested) does avoid the crash in napi-ldap:I do not know why checking the result of
uv_loop_alive
was not enough in itself to prevent the crash; it seems like it would be. In any event, sincehandle
is notNULL
whenon_connect
is called,uv_poll_init
is never called to bind the handle to the new file descriptor. In the instances where the above test did not cause node to crash, I believe it explains why napi-ldap stopped handling incoming messages.One potential issue with the above patch is I suspect setting the handle to
NULL
would interfere with the previous check that checks for disconnect getting called twice (I do not observe that behavior; in my test,on_disconnect
gets called exactly once).Perhaps disconnect is getting called twice due to this same bug, if libuv is being put into an inconsistent state? If so that check may be unnecessary.
I am also questioning the call to
uv_poll_stop
inon_connect
-- other than the first connection, is there ever a case whereon_connect
would be called without a call toon_disconnect
?