Closed mcollina closed 8 years ago
Memory is definitely a concern. The app sending logs shouldn't suffer because of the log transport. As someone who uses TCP for logging, which would you rather have:
In either case the buffer should be adjustable, both in size and if it's even enabled.
I imagine this scenario really only shows up if you're sending to a remote destination. One solution, that doesn't require re-inventing any wheels here, might be to forward to a local queueing server that will de-queue to the remote destination, e.g. some sort of RabbitMQ setup.
Dropping this here for reference -- https://gist.github.com/skeggse/6802812
@jsumners I would rather drop things at the beginning of a queue, a "circular buffer" type of implementation.
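A minimal sketch of that "circular buffer" idea (hypothetical, not code from this library): a fixed-capacity queue that drops the oldest entry when full, so memory stays bounded while the newest log lines are kept.

```javascript
// Hypothetical sketch of a bounded queue that drops the oldest entry
// when full, keeping memory capped while retaining the newest lines.
class BoundedQueue {
  constructor (maxSize) {
    this.maxSize = maxSize
    this.items = []
    this.dropped = 0 // how many old entries were discarded
  }

  push (item) {
    if (this.items.length >= this.maxSize) {
      this.items.shift() // discard the oldest entry to make room
      this.dropped += 1
    }
    this.items.push(item)
  }

  shift () {
    return this.items.shift()
  }
}

const queue = new BoundedQueue(3)
for (const line of ['a', 'b', 'c', 'd', 'e']) queue.push(line)
console.log(queue.items)   // → [ 'c', 'd', 'e' ]
console.log(queue.dropped) // → 2
```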
A solution for the "local queue", is to use something like level, but that is definitely not part of this library.
Try out the code currently in the `develop` branch. It may be pushing stuff into the queue faster than it can dequeue it, but I'm not sure. I haven't come up with a test that confirms it.
Avoid using `bind(this)` on those transports; `bind` is incredibly slow.
You probably want to do the reverse for https://github.com/jsumners/pino-socket/blob/develop/lib/TcpConnection.js#L62-L68, using `tcp-ping` first and then connecting.
I don't think that's a correct implementation; IMHO you should have that only inside TCP. Also, the code does not exhaust the queue when a disconnection happens.
The `bind(this)` only happens during initial startup. It shouldn't even be noticeable.
The `connect` method is called to start even the first connection, so doing the probe first would slow down startup. The probe is in place to make sure the connection succeeded. The operation order, as I understand `net.Socket` events, is:

- `end` event that signals the remote system died. Repeat (1)
- `close` event that signals the local system hung up. Repeat (1)

I thought about passing the queue into the transport and letting the transport handle the queue. That would allow queuing to be a TCP-only feature. But I figured that would be overly complex when the queue won't even affect UDP in a significant manner. Am I wrong?
If the queue needs to be wiped when a disconnect occurs, then the queue would definitely need to be managed by the individual transports. I'm not sold on this, though. In my opinion, the queue should be accepting messages to send once the connection comes back online. Although, it's this part that I don't think is actually happening right now.
I'm very open to suggestions, or code, on how to handle this. This is the scenario I had hoped to avoid by initially designing it to be a simple pass-through.
Uh, yeah, I realized what you're saying about `tcp-ping`. I decided it's completely pointless and have removed it altogether.
Regarding `bind(this)`: it is not the actual `bind` call that is slow, it is the bound function.
We might have drifted a bit from the goal, which is not to provide buffering, but rather to implement backpressure / flow control by discarding events that happen in the middle, and then work around the "circular buffer" issue, handling small downtimes in the connection.
1. `pipe` the origin stream into the connection stream with `{ end: false }`
2. `resume()` on the origin stream
3. `pause()` on the origin stream and go back to 2.

Step 3 can be delayed by 5-10 seconds to avoid losing too many lines.
I'll look into that. I can see some other improvements given that methodology.
Okay, pino-socket now directly pipes the incoming messages to the destination socket. Silly buffering and CEE formatting have been removed (turns out more was needed for my end goal with that anyway). Currently, it's pausing `stdin` while a reconnect is being attempted and resuming it on success. A good way to delay pausing isn't jumping out at me right now.
The benchmark script should be ignored; it may be scrapped altogether.
This is all still only on the `develop` branch.
Some suggestions:

- `stdin` in https://github.com/jsumners/pino-socket/blob/develop/lib/TcpConnection.js#L78
- `Writable`, by calling `input.pipe(dest, { end: false })`, and when `dest` is disconnected, you can `input.unpipe(dest)`, and then call `input.resume()` to avoid blocking. When the connection comes back, you can call `input.pause()`, and then `input.pipe(dest, { end: false })` to restart the cycle
- `tcpConnection(dest)` or `udpConnection(dest)`.

That clears up a suspicion I had about your last post -- `dest` is literally the connection. There is no way to pause it on disconnection because at that point it no longer exists. Which means the only stream available for pausing/resuming is `stdin` (and I'm not a fan of pausing it either). So I'm not sure what the solution is to this problem. It'd be very nice to rely on the streams API internal buffering, but I don't see how it can be accomplished.
@jsumners build a small prototype just with streams; there is some complexity here that makes this harder to try out than it could be. You want to build the Writable side of https://github.com/mcollina/never-ending-stream (discarding data that is being pushed through when the destination is offline).
Would you have time to submit a PR with the solution you have in mind?
Not for a couple of weeks :(. I'm over-committed right now.
Understood. I'll try to get to it before then, but I'm a bit swamped right now too.
@mcollina give commit 296a9e1 a try.
@mcollina regarding your line note, here's what is happening with the tcp connection:
```
process.stdin
  ||
  ||
  \ ===> inputStream (a PassThrough)
           ||  ||
           ||  \ ===> outputStream (a Writable)
           ||
           \ ===> process.stdout (or whatever else it was piped to)
```
The `inputStream` is paused and resumed with disconnects and reconnects. The `outputStream` writes whatever comes into it to the tcp socket. The `socket` points to whatever the current connection is, and does not get written to while `inputStream` is paused.
That's what I intend to be happening, anyway.
The correct model for all of this would be to not have an intermediate stream, but rather to `pipe` / `unpipe` the original one. You will avoid a lot of overhead.
Sorry for the late reply, but I am traveling these days.
The original stream is `process.stdin`. We both agree that pausing that stream is a bad idea. And pausing a stream is the only way to induce buffering so that data is less likely to be lost while the connection is re-established.
At this point the code is written to the best of my ability without becoming overly complex. If it doesn't solve the problem, or can be done better, I will have to wait on a PR. Otherwise I am willing to brand this as 1.0.0 and push it out.
Do it :).
1.0.0 is published. Thank you for the feedback and patience on this issue.
Currently pino-socket is not handling backpressure over TCP. Not a big deal, but memory might grow quite a bit.
I'm not sure what we should do in that case; it means at some point the data will be buffered up, and we probably do not want that to happen.