Possible memoryleak in ResponseSocket

mvburgh commented 5 years ago

My code is already minimal: only a RequestSocket and a ResponseSocket. I create a new RequestSocket every few seconds, driven by requests from a web controller.

Originally posted by @mvburgh in https://github.com/zeromq/netmq/issues/737#issuecomment-471332420

KamranShahid commented 5 years ago

Is anyone working on it?

mvburgh commented 5 years ago

Not me personally. I had a quick glance at the code but have not spotted a possible cause so far.

Svisstack commented 5 years ago

@mvburgh Have you made any progress on this since then?

somdoron commented 5 years ago

I will take a look next week, on vacation this week.

Code that reproduces this would help.

somdoron commented 5 years ago

Also @svisstack, can you check with socket.Options.Linger set to zero? I suspect it might be the issue.

Svisstack commented 5 years ago

The code that causes this is very simple, just a PublisherSocket with ~10 connected subscribers.

I'm not sure it's the same bug, as the initial report was about the ResponseSocket.

On the Subscriber side, this bug does not exist.

From the memory dumps, we can see that there are probably too many Pub+PubSession objects, along with Pipe, YPipe and YQueue objects, but all the memory is allocated in YQueue+Chunk.

Svisstack commented 5 years ago

@somdoron I confirm that Linger equals {00:00:00} at the end of the Start() function in the snippet provided above (at the "return port;" line of Start()).

publisher.Options {NetMQ.SocketOptions}
    Affinity: 0
    Backlog: 100
    DelayAttachOnConnect: false
    DisableTimeWait: false
    Endian: Big
    IPv4Only: true
    Identity: null
    LastEndpoint: "tcp://0.0.0.0:61584"
    LastPeerRoutingId: null
    Linger: {00:00:00}
    MaxMsgSize: -1
    MulticastHops: 1
    MulticastRate: 100
    MulticastRecoveryInterval: {00:00:10}
    PgmMaxTransportServiceDataUnitLength: 'publisher.Options.PgmMaxTransportServiceDataUnitLength' threw an exception of type 'NetMQ.InvalidException'
    ReceiveBuffer: 0
    ReceiveHighWatermark: 1000
    ReceiveLowWatermark: 0
    ReceiveMore: false
    ReconnectInterval: {00:00:00.1000000}
    ReconnectIntervalMax: {00:00:00}
    SendBuffer: 0
    SendHighWatermark: 0
    SendLowWatermark: 0
    TcpKeepalive: false
    TcpKeepaliveIdle: {-00:00:00.0010000}
    TcpKeepaliveInterval: {-00:00:00.0010000}

somdoron commented 5 years ago

Do the subscribers come and go frequently? It seems that setting linger to zero or a few seconds would solve it.

somdoron commented 5 years ago

Thanks. Do the subscribers come and go? Can you check who is referencing the PubSession?

Svisstack commented 5 years ago

@somdoron Take a look at the incoming reference chart.

In my use case, the subscribers should not come and go frequently, but there could be a bug on my side causing them to come and go, and I am analyzing that at the moment.

Svisstack commented 5 years ago

I'm using the 4.0.0.239-pre version.

somdoron commented 5 years ago

Can you send me the report? Which application are you using?

somdoron AT gmail DOT com

I'm not in front of a computer this week, but I will take a look at the beginning of next week.

Svisstack commented 5 years ago

@somdoron No problem. I actually found an interesting fact: the leak is visible only on nodes where there is no communication activity between Publisher and Subscriber (silence). The silence itself is fine from the application's perspective.

somdoron commented 5 years ago

Can you extend this list:

https://user-images.githubusercontent.com/864295/63584247-d162e480-c59c-11e9-8480-9f4bd1532964.png

I want to see the root object causing the memory leak

somdoron commented 5 years ago

Also, can you show the incoming reference to the pipe class?

Svisstack commented 5 years ago

It looks like the Pipe is also referenced to the Pub+Sub; however, I don't know whether it's the same instance.

Svisstack commented 5 years ago

@somdoron paths to the root.

somdoron commented 5 years ago

Funny, I just figured it out myself.

At least in this case it is not a bug.

Once a single message is sent, everything will be freed.

In the memory snapshot I saw a pending command holding the reference, which is what causes the issue.

To avoid the issue, you can call socket.Poll with a zero timespan once in a while. This also processes pending commands.

Anyway, I think you have a case where subscribers come and go frequently.
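
A toy model of the mechanism described above, written in Python purely for illustration. It is not NetMQ code: ToySocket, peer_connected, peer_disconnected and simulate_churn are hypothetical names. The assumption, taken from the explanation above, is that cleanup work is queued as pending commands and only processed when the application calls into the socket, for example to send or poll.

```python
from collections import deque


class ToySocket:
    """Hypothetical stand-in for a socket that drains internal commands lazily."""

    def __init__(self):
        self._mailbox = deque()   # pending internal commands
        self.live_pipes = set()   # pipes the socket still holds references to

    def peer_connected(self, pipe_id):
        self.live_pipes.add(pipe_id)

    def peer_disconnected(self, pipe_id):
        # Only a command is queued here; the pipe is not released yet.
        self._mailbox.append(("terminate_pipe", pipe_id))

    def _process_commands(self):
        # Drain every pending command and drop the pipes they refer to.
        while self._mailbox:
            _, pipe_id = self._mailbox.popleft()
            self.live_pipes.discard(pipe_id)

    def send(self, message):
        # Sending processes pending commands first.
        self._process_commands()
        return len(self.live_pipes)  # number of peers the message would reach

    def poll(self, timeout_seconds=0.0):
        # Polling also processes pending commands, even with a zero timeout.
        self._process_commands()
        return False  # the toy model never has an incoming message


def simulate_churn(sock, cycles):
    """Connect and disconnect `cycles` peers without any send or poll."""
    for pipe_id in range(cycles):
        sock.peer_connected(pipe_id)
        sock.peer_disconnected(pipe_id)
    return len(sock.live_pipes)


idle = ToySocket()
retained_while_silent = simulate_churn(idle, 1000)
idle.poll(0.0)
retained_after_poll = len(idle.live_pipes)
print(retained_while_silent, retained_after_poll)
```

In this sketch the silent socket keeps every disconnected pipe until the first poll or send, which mirrors the "leak only during silence" observation. In NetMQ itself, the periodic call would be socket.Poll with a zero timespan, as suggested above.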

Svisstack commented 5 years ago

Thanks. @somdoron, I appreciate the effort and in-depth knowledge of this project.

Enjoy your vacation.

Yes, judging by netstat, I may well have the come-and-go issue.

mvburgh commented 5 years ago

> Do the subscribers come and go frequently? It seems that setting linger to zero or a few seconds would solve it.

In my case they come and go every few seconds as they are web api requests.

somdoron commented 5 years ago

@mvburgh, I will try to reproduce it next week. Only request/response sockets? Are you using a proxy? Do you happen to have a memory profiler report?

mvburgh commented 5 years ago

No proxy here; for me it runs between a Windows service and a website. I don't have a profiler report at hand.

KamranShahid commented 5 years ago

I have the Majordomo pattern implemented with the broker in one Windows service (.NET Core 2.1) and the worker app in another Windows service (.NET Core 2.1): https://github.com/NetMQ/Samples/tree/master/src/Majordomo
The worker service hosts 16 or 17 different types of workers, and each type can have multiple instances. What I am seeing is that when I assign 10 workers to each type, the broker application's memory increases over time.

It is probably due to the default heartbeat time. I am now trying a heartbeat time of 10 seconds on the worker side and 15 seconds on the broker.

Memory profiling is a bit difficult in my case, as I have set up the workers and the broker in different applications for future scalability.

somdoron commented 5 years ago

@ReneOlsthoorn While the memory increases to 3 GB, are you still sending messages? Could it be that it happens only during silent periods?

somdoron commented 5 years ago

Can you share the test program?

On Sun, Sep 15, 2019, 17:48 ReneOlsthoorn wrote:

> @somdoron Yes, the server keeps running and messages are sent. During silent periods the memory does not increase. The memory increase is gradual, about one GB every 4 hours. It depends on how many consumers are connecting and disconnecting. I've made a test program which connects and disconnects. The memory leak is visible there as well. I've cloned the git sources, so maybe I can see where the problem is.


somdoron commented 5 years ago

Can you share a memory profiler snapshot? That would help a lot.

ReneOlsthoorn commented 5 years ago

Doron and others, the memory-leak I was investigating was in our own product. My apologies for posting when it was not clear where the problem came from. I've deleted my comments, so new users don't get a wrong impression about NetMQ. Keep up the good work!

mvburgh commented 4 years ago

@somdoron I spent some more time on this last week, but neither setting linger to 0 nor calling socket.Poll() every now and then improves the result. The increase stays in YQueue+Chunk and does not get freed over time.

manu-st commented 4 years ago

We are also experiencing something similar in our app. We do not see a leak when we have one server and one client communicating via Request/Response sockets. However, if another client tries to connect to the server while it is already serving another client, the server will leak memory. The way we have it work is that the server can only serve one client, so when a new client connects, it sends a message to tell the client that it cannot communicate and that's pretty much it. Once the client receives that message it disconnects.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity for 365 days. It will be closed if no further activity occurs within 56 days. Thank you for your contributions.

Possible memoryleak in ResponseSocket #788