grijjy / GrijjyCloudLogger

GrijjyCloudLogger, remote logging for Windows, iOS, Android, macOS and Linux
Other
108 stars 29 forks source link

Application hangs after sending 10.000 messages when LogBroker is not running #12

Open luebbe opened 3 years ago

luebbe commented 3 years ago

I'm trying to track down a problem when our application hangs when using GrijjyLog and the log broker is not running. After about 10.000 log messages (10-15 minutes), the application suddenly becomes unresponsive (Windows 10, Delphi 10.3.3 or 10.4.1). When I start the log broker, the application becomes responsive immediately. It is, as if the suddenly available connection to the log broker is un(b)locking something. When the broker runs all the time, the application runs for hours without problems. When I log to a memo instead of using grijjylog.send, the application also runs for hours.

I put a breakpoint on FMessageQueueLock.Leave in

procedure TZMQProtocol.Send(var AMsg: PZMessage);
...
  FMessageQueueLock.Enter;
  try
    FMessageQueue.Enqueue(AMsg);
  finally
    FMessageQueueLock.Leave;
  end;
  { the zmsg_send will automatically destroy the message, so we need to
   prevent the destruction of the object by our own method }
  AMsg := nil;
end;

with FMessageQueue.count > 10 as condition, but it only triggered a few times during startup of the application and never later. So it's probably not FMessageQueue running full.

Basically the question is "What happens if nobody listens for a long time"?

Any help is greatly appreciated.

luebbe commented 3 years ago

This bug is 100% reproducible with the provided sample apps. Steps to reproduce:

  1. start the log viewer
  2. start the FMX sample client
    1. click on "connect"
    2. go to "other"
    3. send 10.000 messages
    4. send 10.000 messages -> sample client hangs
  3. start the log broker -> sample client is responsive again -> exactly 10.000 messages are immediately displayed in log viewer. Where are the other 10.000? So either the protocol discards 10.000 messages, but still blocks the sending application or the log viewer is limited to 10.000 messages? The timestamp of the messages is the timestamp when they were received by the client, not the timestamp when they were sent.

If you want you can:

  1. stop the log broker
  2. repeat 2.iii until the client hangs again
allendrennan commented 3 years ago

This is not really a bug, it's a feature. ZeroMQ is designed so when messages are streamed between nodes and the target node is not available, the sender can queue messages until the node comes back online. We set the upper limit to 10K for sending and receive queues so memory is not consumed indefinitely (see SND_HWM constant in the following), https://github.com/grijjy/DelphiZeroMQ/blob/master/PascalZMQ.pas

ZeroMQ lets you override this behavior and set the value to 0, which means unlimited see, http://api.zeromq.org/3-0:zmq-getsockopt Specially the parameters, ZMQ_SNDHWM and ZMQ_RCVHWM.

GrijjyLogBroker was never intended to operate as a forever running remote logger, but you could certainly create a custom version of the concept using the ExampleWorker, ExampleBroker and ExampleClient projects that we provide in https://github.com/grijjy/DelphiZeroMQ that more precisely matches your needs.

luebbe commented 3 years ago

Thanks for your reply. Is there any notification that a sender can not get rid of the messages (and the queue is filling up)? The current implementation blocks the sending application hard until the receiver comes online. I'm not interested in raising any limits. That would just make the application hang later and not change anything. A queue size of 10 is as good as 10.000.

If I knew that there's no broker listening, I wouldn't send any log messages.

Is there some option to tell ZMQ to discard messages that it can't send?

allendrennan commented 3 years ago

ZeroMQ has the ability to queue to disk. This article covers the subject, https://dzone.com/articles/zeromq-flow-control-and-other

luebbe commented 3 years ago

Queuing to disk doesn't solve the problem, it only delays it. IIUC, there's not option not to send a message if delivery is not possible, because ZMQ wants to deliver messages as reliably as possible and rather blocks the sender than telling it "sorry, I can't handle that now, try again later" and allowing it to continue. This doesn't even allow the sender to handle the problem.

allendrennan commented 3 years ago

I am not an expert in ZeroMQ but they have a concept of non-blocking, ZMQ_NOBLOCK. Like I mentioned before, the Grijjy logger isn't designed to work in the way you want, nor does it expose every possible pattern. You can build these types of solutions with various patterns using our ZeroMQ header translations for Delphi.

luebbe commented 3 years ago

Thanks, I'll see what I get done. For the time being I disabled logging by default. It was nice to be able to fire up the log broker and -viewer at any time while the application is running and see the log messages coming in.

luebbe commented 3 years ago

BTW: I'd suggest to leave this issue open and tag it as "question", because other people might run into the same issue too.