trentm / node-bunyan

a simple and fast JSON logging module for node.js services

Ensuring writable stream is flushed on exception #37

Open timbowhite opened 12 years ago

timbowhite commented 12 years ago

Hi, this isn't really an issue w/ bunyan, but more of an FYI hack I'm using to ensure writable streams are flushed/closed when unhandled exceptions occur in a process.

For instance, the following will result in an empty foo.log file:

var Logger = require('bunyan'),
    util = require('util'),
    foo = new Logger({
        name: "foo",
        streams: [{
            path: "./foo.log"
        }]
    });

foo.info("hello");

throw new Error('suicide');

That stinks as the log file won't contain the "hello" entry, nor offer any clue as to what happened. A hack to rectify this:

var Logger = require('bunyan'),
    util = require('util'),
    foo = new Logger({
        name: "foo",
        streams: [{
            path: "./foo.log"
        }]
    });

foo.info("hello");

// using uncaughtException is bad m'kay
process.on('uncaughtException', function (err) {
    // prevent infinite recursion
    process.removeListener('uncaughtException', arguments.callee);

    // bonus: log the exception
    foo.fatal(err);

    if (typeof(foo.streams[0]) !== 'object') return;

    // throw the original exception once stream is closed
    foo.streams[0].stream.on('close', function(streamErr, stream) {
        throw err;
    });

    // close stream, flush buffer to disk
    foo.streams[0].stream.end();
});

throw new Error('suicide');

This probably makes node.js purists grind their teeth, but it's effective at logging both the "hello" entry and the exception, and rethrows the original exception.

Doing this w/ multiple streams would be trickier, as you'd have to detect that they had all closed before rethrowing the exception. I'm not sure that bunyan emits a "close" event when all writable streams are closed, i.e.: foo.on('close', function(streamErr, stream) { throw err; });.

One last note: calling process.nextTick(function () { process.exit(); }); instead of plain old process.exit(); usually gives the writable stream(s) a chance to flush.

trentm commented 12 years ago

Thanks for this. Yes, a little bit gross. :) Might be nice for Bunyan to have a convenience method for something like this. I'm curious what winston and others do, if anything.

trentm commented 12 years ago

Had some discussion on this. Basically I should finish the implementation of Logger.close() (it is started and commented out). This would walk through and call stream.end() for all the streams that should be closed (per the existing stream.closeOnExit attribute). This is similar in spirit to Python's http://docs.python.org/library/logging.html#logging.shutdown.

It should probably be called Logger.end() I guess.

Two remaining questions.

Q1: Should Bunyan set this up automatically? That would mean registering this on logger creation:

process.on('exit', function () {
   self.end();
});

The non-default alternative is to just document the issue (that process.exit() can result in not all log messages being flushed to their files) and document this recommended usage:

process.on('exit', function () { log.end(); });

Q2: Deal with the outstanding questions of ownership of streams w.r.t log.child() usage.

trentm commented 11 years ago

Discussed an issue with @rgulewich on logger output not always getting flushed. At least with node 0.8 on SmartOS, even after calling stream.end() on all the Bunyan streams, you still don't always get all the log lines flushed. E.g.: https://gist.github.com/8c2cb46911a1202471b9 That's with two loggers in play, but Rob says the same thing happens with just one logger.

So, for Bunyan to make this easier to get right, it really needs a Logger.end(), but that needs to be async with a callback so it can wait for a 'drain' event on streams, if necessary. Because that would need to be async, you can't just toss this Logger.end() into a process.on('exit', function () { ... }). Yuck.

@isaacs: I don't suppose you know if streams2 in node 0.10 will help with flushing write streams on process exit?

isaacs commented 11 years ago

@trentm Well, process.stdout and process.stderr should be sync in most cases, so if that's where you're logging to, then it should not be an issue to flush. How is this program being run?

trentm commented 11 years ago

@isaacs This is writing to a file. I imagine @rgulewich was just running "/usr/node/bin/node script.js" where script.js is the gist above.

matthiasg commented 11 years ago

btw .. process.stdout is not sync on windows .. at least not now. this causes issues right now, e.g. when capturing output of a node process (e.g. mocha) when that process exits.

also see: https://github.com/joyent/node/issues/3737

trollixx commented 10 years ago

Any progress on this? An output file is empty if an application exits immediately after logging.

emmenko commented 10 years ago

@trentm any update on this? Did someone find a solution to flush the logs before doing a process.exit?

svenmueller commented 10 years ago

Seems like there is no progress regarding this issue :/

alessioalex commented 10 years ago

These two links might help you guys:

http://stackoverflow.com/questions/18771707/how-to-flush-winston-logs
https://github.com/flatiron/winston/blob/master/lib/winston/logger.js#L594-L637

It seems that winston has a callback that fires after a log has been written. In the case of uncaughtException it will log the error to all transports with a timeout of 300 milliseconds (after that it will exit the process).

programmerdave commented 9 years ago

Any updates on this? I'm sending my logs to loggly using a Bunyan writable. How can I make sure that all writes finish before I call process.exit?

5ay3h commented 9 years ago

+1. This is a major bug. Any workarounds?

tnunes commented 9 years ago

+1

Still no developments on this 3 year old issue?

matthiasg commented 9 years ago

this looks like a way to do it even if node doesn't do this automatically: https://github.com/eslint/eslint/commit/bca4ae791015cf0f418e405de8902a5a572c9f7c

matthiasg commented 9 years ago

also see here: https://github.com/joyent/node/issues/3737#issuecomment-18765224

5ay3h commented 9 years ago

This only fixes STDOUT flushing, not file streams :(

th0r commented 9 years ago

+1

sommestad commented 9 years ago

+1

Vegard- commented 9 years ago

+1

mojojoseph commented 8 years ago

+1 Wasted an hour wondering why stream was empty only to come across this.

5ay3h commented 8 years ago

@trentm, good bye bunyan...

qubyte commented 8 years ago

I was putting off adding more noise to this thread, but it's been some time and there doesn't appear to be movement on bunyan at all. What is the status of this issue and bunyan in general?

tnunes commented 8 years ago

Is it safe to assume development on bunyan effectively stopped and we're all better off either forking it or moving to an alternative logging solution?

brianjquinn commented 8 years ago

I would also like to know the answer to the last comment (bunyan development has stopped)... as I have recently incorporated bunyan into a project and came upon this situation where a rotating-log stream does not get logged when calling process.exit()

klyngbaek commented 8 years ago

+1

supertong commented 8 years ago

+1

Srokap commented 8 years ago

+1

nnashok commented 8 years ago

+1

cwalltx commented 8 years ago

+1

cchamberlain commented 8 years ago

To anyone coming across this now, the following line:

    // prevent infinite recursion
    process.removeListener('uncaughtException', arguments.callee);

is no longer necessary per the current uncaughtException documentation. I'm guessing that back in 2012 that may not have been the case.

jdthorpe commented 7 years ago

npm install winston-log-and-exit

then,

var winston = require('winston');
require('winston-log-and-exit'); // patches winston with log_and_exit

process.on('uncaughtException', function (err) {
    // use `winston` or your own Logger instance as appropriate
    winston.error("Uncaught Exception", err);
    winston.log_and_exit("info", "bye", 1);
});

rationull commented 7 years ago

+1. This is kind of a nasty bug.

rationull commented 7 years ago

I'm still a Javascript/Node newbie, but it seems to me that the suggested workarounds share a problem: waiting for an async notification that the stream has closed, or using process.nextTick() to delay the exit (or to rethrow the exception, as in the original example), allows the event loop to continue processing whatever was queued up next (chunks of async functions/promise chains, requests, etc.). Since the app is in an unknown state at this point due to the uncaught exception, that could lead to additional uncaught exceptions or other unpredictable behavior, which is counterproductive in cases like this.

My takeaway here is that we either need a true synchronous write or flush for the log streams, or need to be relying only on stdout and piping to persistent storage from the host OS rather than the Node process. From previous comments it sounds like even stdout may be a problem on Windows.

I've been looking at the fs APIs and source code, and it looks at least theoretically possible that bunyan could force a synchronous write (on demand, NOT for every write), but I'm not sure that this is compatible with the streams API that it currently uses (the sync methods are part of the fs APIs, not the streams APIs).

Am I missing something?

I have not traced all the way from bunyan's stream output through to the underlying C fs implementation in enough detail to fully understand whether the problem here is due to JS-style asynchronicity or just to more traditional buffering issues, but so far it looks like the former to me.

rationull commented 7 years ago

I just hacked on bunyan a little and confirmed that writes straight through to the underlying file descriptor using fs.writeSync() are indeed written synchronously and will make it to the log file, at least in my test case. This will only work if the file descriptor has already been opened by the time the exception is thrown of course.

So another class of workarounds for this might be to provide a way to write a log message synchronously, which could be used in cases where it's known the process is about to terminate.

This is kind of an edge case, but @trentm do you have any input on how this would be best represented in bunyan's API if one were to submit a PR? Separate *sync() versions of the output functions? A sync mode flag on the logger? Something else?

jdthorpe commented 7 years ago

@rationull , If you're worried about flushing the logs before node exits on an uncaught exception, shutdown signal, or normal exit, why not use the winston-log-and-exit patch like so:

var winston = require('winston');
require('winston-log-and-exit'); // patches winston with log_and_exit

process.on('uncaughtException',  // or 'SIGINT' or 'exit'
    function (err) {
        winston.error("Uncaught Exception", err);

        // this message and all before it will be flushed before exiting:
        winston.log_and_exit("info", "bye", 1);
    });

Srokap commented 7 years ago

@jdthorpe This module doesn't really address the concern that we probably should not allow the event loop to run any further. If we contained the logic to the next-tick queue, it would be a reasonable solution IMO (still not perfect), but I doubt that's possible, since we likely need to process something via the IO events queue when flushing.

fs.writeSync might work, but remember that targets can be just a custom stream. That will be a bigger issue.

IIRC the core winston lib did something as dumb as a 3s setTimeout before killing the process. For lack of a better solution, it's better than nothing, I guess.
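That timeout fallback can be sketched as follows. logAndExitSoon is a made-up name, and the 3000ms default just mirrors the winston behavior described above:

```javascript
// Log the fatal error, give output streams a fixed grace period to
// flush, then exit regardless. Crude, but simple and predictable.
function logAndExitSoon(logger, err, gracePeriodMs, exitFn) {
    // `logger` would be your Bunyan logger; console.error is a fallback.
    (logger && logger.fatal) ? logger.fatal(err) : console.error(err);
    setTimeout(exitFn || function () {
        process.exit(1);
    }, gracePeriodMs || 3000);
}

// Wiring it up (sketch):
// process.on('uncaughtException', function (err) {
//     logAndExitSoon(log, err, 3000);
// });
```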

rationull commented 7 years ago

@Srokap I was about to respond to the same effect based on my reading yesterday, but after running a quick example it appears the finish event is indeed emitted synchronously with the end() call. It's not clear that the synchronicity is guaranteed through the whole stack, but it looks at least possible that it is under the current implementation of the streams API.

rationull commented 7 years ago

@jdthorpe OK, actually it does appear that the 'finish' handlers will not be invoked until after the calling function terminates (i.e. after the uncaughtException handler terminates), so @Srokap is correct that this solution does not address the asynchronicity concern.

Actually I believe the implementation of winston-log-and-exit relies on this behavior -- if the 'finish' event were emitted synchronously then (numFlushes === numFlushed) would be true in the first 'finish' event handler called, and the callback would be called prematurely. I was about to log a bug on that, but it's not really a bug since it does work correctly IF you're OK with letting the event loop continue for a few ticks.

It's not difficult to port the winston-log-and-exit solution to bunyan. I'm not planning to use it due to this flaw, but if anyone would find it useful let me know and I can set aside some time to clean it up and make a PR...

Or I would love to be set straight if the concern I share with @Srokap is not actually an important issue. A writeSync-based solution would not fit into the framework nearly as cleanly.

jdthorpe commented 7 years ago

You're correct that winston-log-and-exit makes no attempts to change the asynchronous nature of winston -- just adds the guarantee that all the prior logs will be flushed before it calls process.exit (or calls the exit handler).

For what it's worth, the reason I'm personally fine with asynchronous file writes is that it lets Node write out to log files in the background. If you switch to using fs.writeFileSync, then your code will stop running until each log message is written to disk, due to Node's everything-runs-in-parallel-except-your-code model (i.e. IO is handled in parallel, except when using the sync IO functions), and performance for things like servers will be slooooooow.


rationull commented 7 years ago

@jdthorpe agreed of course making all log file writes synchronous would be a poor choice. The idea I'm toying with is allowing opt-in synchronous write at least to be used in an uncaughtException handler to ensure that the exception causing the "crash" is written out. This is still not ideal because it leaves open the possibility there are other important log messages waiting in the queue that won't be flushed though.

Another option would be to synchronously buffer log output and asynchronously flush the buffer. Then we could force a synchronous buffer flush before calling process.exit(). Based on my current simplistic understanding that would require rewriting the bunyan file output streams to maintain their own buffers and use fs.writeSync() periodically. That would still impose a periodic slow write delay on the event loop though.

Srokap commented 7 years ago

This whole discussion makes me think there should be 2 options.

One is to exit immediately, not even looking at logs, because you're so worried that some code already in the event queue is so wrong it will corrupt your data.

The other is that you don't care that much: the problematic logic chain just died, and you assume the others will either fail for the same reason or be just fine. You can, for example, disconnect the worker from the cluster and then call some function like logger.flushAllLogsAndDie() that takes its time to make sure all output streams get flushed (or dies trying).

We can't really assume much that would be true for everyone, so better to give lib users an option. I personally use the dumb setTimeout approach I saw in winston. Either way it's a precaution for me, because I don't even remember when I last had an uncaught error that crashed my worker outside of a dev machine. It would be nice to have an optional function that gives me some guarantees about things getting flushed.

rationull commented 7 years ago

That makes sense to me too in general. Maybe it's my non-JS background but I can't help but think there could still be a middle ground option where file system log targets are synchronously flushed (i.e. don't wait for remote log targets) but perhaps that is an edge case and it doesn't seem to fit in well with Node's file streams setup.

I am suspicious that I'm worrying too much and the overall best solution is the second one, with a safeguard in place to prevent additional uncaught exceptions from conflicting with the first one.

trentm commented 7 years ago

@rationull and @Srokap Thanks for the recent discussion so far. I'm hoping to get back to this soon with a better answer. But for now, my experience with flushing output (at least for CLIs) resulted in this module: https://github.com/joyent/node-exeunt#readme

One suggestion from node-exeunt is that flushing output on exit is the responsibility of the app, not the library. That relates to @Srokap's comments that there are really two options: does the app author want to hard-exit in an unstable state, or limp along long enough to flush output before exiting?

What's missing here, then, is deciding whether and how Bunyan should provide some API to help with all this. I wonder if telling app authors to choose the appropriate "Solution" of the four discussed at https://github.com/joyent/node-exeunt#readme would be sufficient.
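For reference, the gentlest of the node-exeunt approaches is a "soft exit": set process.exitCode and let the event loop drain naturally instead of calling process.exit() while writes may still be in flight. A sketch (softExit is a hypothetical helper, not an exeunt or Bunyan API):

```javascript
// Set the exit code and return, rather than calling process.exit().
// Node then exits on its own once pending I/O -- including buffered
// log writes -- has completed.
function softExit(code) {
    process.exitCode = code;
}
```

The trade-off is that a process stuck on a never-completing handle will never exit, which is why exeunt discusses several alternatives.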

rationull commented 7 years ago

@trentm thanks for the node-exeunt link -- great coverage of an interesting topic! I 100% agree that the decision of whether to flush output or hard exit (or even just continue running) should be up to the app itself and not bunyan or any other library.

Based on a quick read of the exeunt discussion (and also on my improved understanding of Node's concurrency model after the last couple months.. my background is in threaded concurrency environments), it seems to me that your suggestion of just telling app developers to choose one of the defined solutions probably is enough. The actual implementation of any given solution seems like it would be out of scope for a logging framework. For my purposes the exeunt() strategy seems most appealing.

You mentioned that your services at work simply hard-crash on unhandled exceptions; does that imply that, for diagnosing the crash, you rely on capturing whatever Node itself writes to the console, and that missing log messages leading up to the exception just haven't been a problem?

If I've got questions on the exeunt() discussion where should I post those? Issue or PR on that repo or here?

trentm commented 7 years ago

does that imply that for diagnosing the crash you rely on capturing whatever Node itself writes

Actually we rely mostly on using what is in the OS-generated crash dump on abort(). Here is one of the earlier articles on that: http://dtrace.org/blogs/dap/2011/10/31/nodejs-v8-postmortem-debugging/

At least a while ago there was effort (and success I think) in being able to do postmortem debugging on Linux crash dumps as well... using tooling on SmartOS. I'm not familiar with the Linux side of things myself though as our environment (at least for server-side) is mostly all SmartOS (an illumos/SunOS derivative).

When you have this going, one interesting technique is to keep a ring buffer of log records which are just kept in memory. Upon a crash, that memory is inspectable. Though, in fairness, not a lot of our services currently do this. Mostly the crash dump and mdb_v8 tooling allow for pretty complete ability to inspect program state at crash time, which is invaluable in many cases.
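The ring-buffer technique can be sketched in a few lines. Bunyan ships something similar as a raw stream (bunyan.RingBuffer); this standalone version just illustrates the idea:

```javascript
// Keep only the last `limit` log records in memory. Nothing is written
// to disk; on a crash, the buffer contents are inspectable in the
// process's core dump via postmortem tooling such as mdb_v8.
function RingBuffer(limit) {
    this.limit = limit;
    this.records = [];
}

RingBuffer.prototype.write = function (record) {
    this.records.push(record);
    if (this.records.length > this.limit) {
        this.records.shift(); // drop the oldest record
    }
    return true;
};
```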

trentm commented 7 years ago

If I've got questions on the exeunt() discussion where should I post those? Issue or PR on that repo or here?

@rationull Perhaps issues on the exeunt repo would be best for that.

rationull commented 7 years ago

@trentm nice -- I'll be sure to read up on debugging with crash dumps. I've got some experience with that for native Windows applications but haven't done it with Node yet. Thanks!

0xcaff commented 6 years ago

Not being able to explicitly close the logger and flush all logs results in logs being lost in environments like AWS Lambda and Google Cloud Functions.

rationull commented 6 years ago

@0xcaff are you losing logs in AWS lambda even if the program terminates normally? If so, maybe there's something more going on there? If there are outgoing IO events I don't believe Node should terminate before they complete, on normal program exit.

0xcaff commented 6 years ago

I haven't experienced it in Lambda, but I have in Google Cloud Functions. This guy says he's had the problem in Lambda: https://github.com/trentm/node-bunyan/issues/95#issuecomment-148799329