Closed mmarchini closed 3 years ago
We could also make flushSync
emit this error on async mode so that the user can backoff from attempting to write to an unwritable location (which causes any application in this situation to hang).
Does the fd
ever recover? Will it ever be possible to write to that destination?
I'm ok for your two main points anyway. As for the "error" behavior, I would emit an error for flush()
or throw in case of flushSync()
if the "wait" for EAGAIN
goes over a maxEAGAIN
option (or pick any name really).
(This is one more time where I thanks this library is out of Node.js core).
It could recover or not, depending on the issue. For example, if fd
is a named pipe being read by a pino transport that sends data to a log aggregating service, and that service stops responding/returns only 500s for a while and the pino transport keeps retrying, fd
will recover if/when the service recovers. In this case if the service doesn't recover, fd
doesn't recover either. A situation where fd
wouldn't recover is when the pino transport mistakenly stops reading from the named pipe (because of a coding mistake, for example).
As for the "error" behavior, I would emit an error for
flush()
or throw in case offlushSync()
if the "wait" for EAGAIN goes over amaxEAGAIN
option (or pick any name really).
I like the idea of setting a retry limit and emitting an error if it exceeds that limit. I'm not sure about flushSync()
throwing though: if the user still wants to recover, alert or drop data when flushSync
fails, that would need to be done on a uncaughtException
handler, which seems less than ideal since it's a catch-all errors in Node.js. It's worth noting that in some cases the code might not be aware that flushSync
is being called (finalLogger
instance comes to mind), so it's not as simple as adding try/catches wherever possible. Maybe for sonic-boom
running with sync: false
, flushSync
also emits an error instead of throwing? Since that would be a breaking change anyway I'll start by sending a PR where flushSync
throws if retries are exceeded.
As for naming the option, what about maxFlushRetries
?
That works for me. We are prepping pino@7, so it's a good time for a major bump.
We're running into a crash that we strongly suspect is due to this issue.
Is there any workaround available before this major release?
We do not have a fix for this even in a minor. Would you like to send a PR @That3Percent?
We do not have a fix for this even in a minor. Would you like to send a PR @That3Percent?
Probably not. (Sorry)
@mmarchini Would you still consider working on this?
Still on my list but didn't have the bandwidth to work on it yet
Platform: Linux Version: 1.4.0 Node.js Version: v12.20.1
If
sonic-boom
is running on async mode and flush fails constantly withEAGAIN
(for example, if writing to a named pipe where the process supposed to read from the pipe stops reading), the buffer will keep growing indefinitely without a way for the user to identify the issue (becauseEAGAIN
is not propagated to the user). While not being able to flush for a long period of time is an indication that something elsewhere went terribly wrong, sinceEAGAIN
is not propagated there's no way for the application to identify that the buffer is not getting flushed and is growing indefinitely.I have two suggested features that would help users deal with this kind of istuation:
EAGAIN
, either as it's own event, or as part oferror
(would need to be optional and disabled by default, or that would be a breaking change)_buf
, but that's not a supported API and would involve many checks)An alternative would be to privde a
maxLength
option and then letsonic-boom
drop the oldest X bytes from the buffer. But emittingEAGAIN
gives flexibility so that users can not only drop data, but also send alerts in perform other actions in cases where the issue is detected.If the two suggestions above are acceptable I'm willing to send a pull request for them.