michal42 closed this 6 years ago
There's a fairly big backtrace in the valgrind test, as if things were not cleaned up properly at exit. Could you please have a look?
Yes, I will have a look. It seems to be a race that I'm unable to trigger locally.
If it happened before, it doesn't need to block this PR; it only matters if it's new.
It didn't happen before I rebased with the zproject update. But since I'm removing a hack that was supposed to paper over race conditions, the bug is quite possibly linked to my change.
My guess is that the leak is caused by mlm_msg object pointers in flight, either from the server engine to the stream engine or back, when the server is shut down. But I still haven't reproduced it.
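To make that guess concrete, here is a minimal stand-alone sketch (plain CZMQ, not malamute code; the endpoint name and payload are made up) of how an object handed over by pointer between two engines leaks when both ends are torn down before the frame is read:

#include <czmq.h>

int main (void)
{
    //  Two ends of an inproc pipe, standing in for the server engine
    //  and a stream engine
    zsock_t *writer = zsock_new_pair ("@inproc://engine-pipe");
    zsock_t *reader = zsock_new_pair (">inproc://engine-pipe");
    assert (writer && reader);

    //  A message "in flight": ownership travels with the pointer frame
    char *payload = strdup ("pending stream traffic");
    zsock_send (writer, "p", payload);

    //  Destroy both ends without ever receiving the frame - nobody frees
    //  the payload, so valgrind reports it as definitely lost
    zsock_destroy (&reader);
    zsock_destroy (&writer);
    return 0;
}

In malamute the object in flight would be a refcounted mlm_msg rather than a strdup'd string, but the shutdown-ordering problem is the same.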
I finally had time to look into it. I can reliably trigger the leak with this test:
diff --git a/src/mlm_server.c b/src/mlm_server.c
index 7a5f323a3045..22273e58f542 100644
--- a/src/mlm_server.c
+++ b/src/mlm_server.c
@@ -1212,6 +1212,34 @@ mlm_server_test (bool verbose)
         zactor_destroy (&server);
     }
+    {
+        const char *endpoint = "inproc://mlm_server_memleak_with_pending_stream_traffic";
+        zactor_t *server = zactor_new (mlm_server, "mlm_server_memleak_with_pending_stream_traffic");
+        if (verbose) {
+            zstr_send (server, "VERBOSE");
+            printf("Regression test for a memleak with pending stream traffic after server destroy\n");
+        }
+        zstr_sendx (server, "BIND", endpoint, NULL);
+
+        mlm_client_t *producer = mlm_client_new ();
+        assert (mlm_client_connect (producer, endpoint, 1000, "producer") >= 0);
+        assert (mlm_client_set_producer (producer, "STREAM_TEST") >= 0);
+
+        zstr_sendx (server, "SLOW_TEST_MODE", NULL);
+
+        mlm_client_t *consumer = mlm_client_new ();
+        assert (mlm_client_connect (consumer, endpoint, 1000, "consumer") >= 0);
+        assert (mlm_client_set_consumer (consumer, "STREAM_TEST", ".*") >= 0);
+
+        zmsg_t *msg = zmsg_new ();
+        zmsg_addstr (msg, "world");
+        assert (mlm_client_send (producer, "hello", &msg) >= 0);
+
+        mlm_client_destroy (&consumer);
+        mlm_client_destroy (&producer);
+        zactor_destroy (&server);
+    }
+
     // @end
     printf ("OK\n");
 }
And I can trigger it both with and without this PR. The fix for the leak would be to synchronize with the stream engine instances during server shutdown and free any messages that have not yet been dispatched. I can look into that, but could you please trigger a new Travis run for this PR and, if it is green, merge it?
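Roughly, the synchronization I have in mind would drain each engine's inbound pipe before its actor exits and drop any message references still queued. A hedged sketch (the helper name and the use of the internal mlm_msg class here are illustrative assumptions, not the actual malamute code):

#include <czmq.h>
#include "mlm_msg.h"    //  internal refcounted message class (assumed)

//  Sketch only: consume whatever is already queued on the pipe at
//  shutdown and release our reference so each message can be freed
static void
s_drain_pending (zsock_t *pipe)
{
    zpoller_t *poller = zpoller_new (pipe, NULL);
    //  Zero timeout: take only what is already queued, then stop
    while (zpoller_wait (poller, 0) == (void *) pipe) {
        mlm_msg_t *msg = NULL;
        zsock_recv (pipe, "p", &msg);
        if (msg)
            mlm_msg_unlink (&msg);
    }
    zpoller_destroy (&poller);
}

The server side would call something like this (or wait for a confirmation from each stream engine) before destroying the actors, so nothing is left undispatched.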
Hrm, it's failing in zmq shutdown this time? I'm lost. Does master pass Travis?
It does not - so I'll merge this
You can ignore the shutdown error - I'll have a look at it, probably a regression in libzmq
Thank you! I will look into the race condition during stream shutdown when I have time again.
@michal42 the regression has now been fixed in CZMQ
We have proper memory management since commit ee93e46f4bed ("Properly handle pending stream traffic after client disconnect"), so there is no need for the settle period.