Open mvburgh opened 5 years ago
Anyone working on it?
Not me personally. I had a quick glance at the code but could not see any possible cause in the meantime.
@mvburgh Did you make any progress on that since then?
I will take a look next week, on vacation this week.
Code that reproduces this would help.
Also @svisstack, can you check with socket.Options.Linger set to zero? I suspect it might be the issue.
The code causing this is very simple: just a PublisherSocket with ~10 connected subscribers:
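The original snippet is not reproduced in this thread; as a hypothetical sketch (not the reporter's actual code) of the kind of setup described, with the Linger option set to zero as suggested above:

```csharp
// Hypothetical sketch, not the original reporter's code.
// A PublisherSocket bound on TCP, with Linger set to zero as asked
// about above; ~10 SubscriberSockets then connect to the bound port.
using System;
using NetMQ;
using NetMQ.Sockets;

class PublisherSketch
{
    static int Start()
    {
        var publisher = new PublisherSocket();
        publisher.Options.Linger = TimeSpan.Zero;  // the setting @somdoron asked to verify
        publisher.Options.SendHighWatermark = 0;   // unbounded, as in the options dump below
        int port = publisher.BindRandomPort("tcp://0.0.0.0");
        return port;                               // matches "Start(): return port;" mentioned below
    }
}
```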
I'm not sure it's the same bug as the initial bug was related to the ResponseSocket.
On the Subscriber side, this bug does not exist.
From the memory dumps, we can see that there are probably too many Pub+PubSession objects, along with their Pipe, YPipe, and YQueue objects, but all the memory is allocated in YQueue+Chunk.
@somdoron I confirm that Linger equals {00:00:00} at the end of the Start() function in the snippet provided above (Start(): return port;).
```
publisher.Options {NetMQ.SocketOptions}
  Affinity: 0
  Backlog: 100
  DelayAttachOnConnect: false
  DisableTimeWait: false
  Endian: Big
  IPv4Only: true
  Identity: null
  LastEndpoint: "tcp://0.0.0.0:61584"
  LastPeerRoutingId: null
  Linger: {00:00:00}
  MaxMsgSize: -1
  MulticastHops: 1
  MulticastRate: 100
  MulticastRecoveryInterval: {00:00:10}
  PgmMaxTransportServiceDataUnitLength: 'publisher.Options.PgmMaxTransportServiceDataUnitLength' threw an exception of type 'NetMQ.InvalidException'
  ReceiveBuffer: 0
  ReceiveHighWatermark: 1000
  ReceiveLowWatermark: 0
  ReceiveMore: false
  ReconnectInterval: {00:00:00.1000000}
  ReconnectIntervalMax: {00:00:00}
  SendBuffer: 0
  SendHighWatermark: 0
  SendLowWatermark: 0
  TcpKeepalive: false
  TcpKeepaliveIdle: {-00:00:00.0010000}
  TcpKeepaliveInterval: {-00:00:00.0010000}
```
Do the subscribers come and go frequently? It seems like setting linger to zero or a few seconds will solve it.
Thanks. Do the subscribers come and go? Can you check what is referencing the PubSession?
@somdoron Take a look at the incoming reference chart.
In my use-case, the subscribers should not come and go frequently, but there could be a bug on my side causing the come-and-go, and I am analyzing that at the moment.
I'm using the 4.0.0.239-pre version.
Can you send me the report? Which application are you using?
somdoron AT gmail DOT com
I'm not in front of a computer this week, but I will take a look beginning of next week.
@somdoron No problem. I actually found an interesting fact: the leak is visible only on nodes where there is no communication activity between Publisher and Subscriber (silence); from the application perspective that is otherwise fine.
Can you extend this list:
https://user-images.githubusercontent.com/864295/63584247-d162e480-c59c-11e9-8480-9f4bd1532964.png
I want to see the root object causing the memory leak.
Also, can you show the incoming references to the Pipe class?
It looks like the Pipe is also referenced by Pub+Sub; however, I don't know whether it's the same instance.
@somdoron paths to the root.
Funny, I just figured it out myself.
At least in this case it is not a bug.
Once a message is sent, everything will be freed.
From the memory picture I saw a pending command holding the reference and causing the issue.
To avoid the issue, you can call socket.Poll with a zero timespan once in a while; this will also process pending commands.
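As a sketch of that workaround (the periodic timer is my assumption about how an application would schedule it; the call itself is just NetMQ's `Poll` with a zero timeout):

```csharp
// Sketch: periodically pump the socket's command queue during silent periods.
// Poll with a zero timeout returns immediately but still processes pending
// internal commands (e.g. pipe termination for departed subscribers),
// which releases the YQueue chunks those pipes were holding.
using System;
using NetMQ;
using NetMQ.Sockets;

var publisher = new PublisherSocket();
publisher.Bind("tcp://0.0.0.0:61584");

// NOTE: NetMQ sockets are not thread-safe; in real code this call must
// happen on the thread that owns the socket (e.g. inside its send loop),
// not from an arbitrary timer thread as shown in this simplified sketch.
var pump = new System.Threading.Timer(
    _ => publisher.Poll(TimeSpan.Zero),  // non-blocking command processing
    state: null,
    dueTime: TimeSpan.FromSeconds(1),
    period: TimeSpan.FromSeconds(1));
```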
Anyway, I think you have a case where subscribers come and go frequently.
Thanks. @somdoron, I appreciate the effort and in-depth knowledge of this project.
Have a nice vacation.
Yes, looking at netstat, it could be that I have the come-and-go issue.
> Do the subscribers come and go frequently? It seems like setting linger to zero or a few seconds will solve it.

In my case they come and go every few seconds, as they are web API requests.
@mvburgh, I will try to reproduce next week. Only request-response sockets? Are you using a proxy? Do you happen to have a memory profiler report?
No proxy here; it runs between a Windows service and a website for me. I don't have a profiler report at hand.
I have the Majordomo pattern implemented, with the broker in one Windows service (.NET Core 2.1) and the worker app residing in another Windows service (.NET Core 2.1).
https://github.com/NetMQ/Samples/tree/master/src/Majordomo
In the worker Windows service there are 16/17 different types of workers; each type of worker can have multiple instances. What I was seeing is that when I assign 10 workers per type, my broker application's memory increases over time.
It is probably due to the default heartbeat time. I am now trying to set the heartbeat time on the worker side to 10 seconds, and on the broker to 15 seconds.
Memory profiling is a bit difficult in my case, as I have set up the workers and broker in different applications for future scalability.
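The Majordomo sample's actual heartbeat API is not shown in this thread; as a generic sketch of the timing logic being described (all names hypothetical), the point is that the worker heartbeats faster than the broker's expiry window, so a healthy worker is never purged:

```csharp
// Generic sketch of heartbeat bookkeeping (hypothetical names, not the
// NetMQ Samples Majordomo API). The worker sends a heartbeat every 10 s;
// the broker only considers a worker dead after 15 s of silence, so a
// live worker always refreshes its liveness before the deadline.
using System;

class WorkerLiveness
{
    static readonly TimeSpan WorkerHeartbeatInterval = TimeSpan.FromSeconds(10);
    static readonly TimeSpan BrokerExpiry = TimeSpan.FromSeconds(15);

    DateTime _lastSeen = DateTime.UtcNow;

    // Called by the broker whenever a heartbeat or message arrives.
    public void OnActivity() => _lastSeen = DateTime.UtcNow;

    // Called periodically by the broker's purge loop; expired workers
    // are removed so their queues (and pipes) can be freed.
    public bool IsExpired() => DateTime.UtcNow - _lastSeen > BrokerExpiry;
}
```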
@ReneOlsthoorn during the time the memory increases to 3 GB, are you still sending messages? Can it be that it happens only during silent periods?
Can you share the test program?
On Sun, Sep 15, 2019, 17:48 ReneOlsthoorn notifications@github.com wrote:
@somdoron https://github.com/somdoron Yes, the server keeps running and messages are sent. During silent periods no memory is gained. The memory increases gradually, about one GB every 4 hours; it depends on how many consumers are connecting and disconnecting. I've made a test program which connects and disconnects, and the memory leak is visible there as well. I've cloned the git sources, so maybe I can see where the problem is.
Can you share a memory profiler snapshot? That would help a lot.
Doron and others, the memory leak I was investigating was in our own product. My apologies for posting when it was not clear where the problem came from. I've deleted my comments, so new users don't get a wrong impression about NetMQ. Keep up the good work!
@somdoron I have spent some more time with this last week, but both setting the linger to 0 and calling socket.Poll() every now and then give no better result. The increase stays in YQueue+Chunk and does not get freed over time.
We are also experiencing something similar in our app. We do not see a leak when we have one server and one client communicating via Request/Response sockets. However, if another client tries to connect to the server while it is already serving a client, the server leaks memory. The way we have it working, the server can only serve one client; when a new client connects, the server sends it a message saying it cannot communicate, and that's pretty much it. Once the client receives that message, it disconnects.
This issue has been automatically marked as stale because it has not had activity for 365 days. It will be closed if no further activity occurs within 56 days. Thank you for your contributions.
My code is already minimalistic; only a RequestSocket and a ResponseSocket. I'm creating a RequestSocket every few seconds, based on the requests from a web controller.
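A common shape for that scenario (a sketch under my own assumptions about the endpoint and timeout, not the poster's actual code) is to scope each short-lived RequestSocket in a `using` block so its pipes are torn down promptly rather than accumulating on the ResponseSocket side:

```csharp
// Sketch: one short-lived RequestSocket per web request.
// Disposing the socket, together with a short Linger, lets NetMQ tear
// down its pipes instead of leaving them queued on the server side.
using System;
using NetMQ;
using NetMQ.Sockets;

static class RequestHelper
{
    public static string Ask(string endpoint, string question)
    {
        using (var client = new RequestSocket())
        {
            client.Options.Linger = TimeSpan.FromSeconds(1);
            client.Connect(endpoint);
            client.SendFrame(question);

            // Bounded receive so a dead server can't leak a blocked socket.
            if (client.TryReceiveFrameString(TimeSpan.FromSeconds(5), out var reply))
                return reply;

            return null;  // timed out; the using block still disposes the socket
        }
    }
}
```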
Originally posted by @mvburgh in https://github.com/zeromq/netmq/issues/737#issuecomment-471332420