microsoft / ApplicationInsights-node.js

Microsoft Application Insights SDK for Node.js
MIT License
322 stars 141 forks source link

Data loss at process exit #871

Open apotterri opened 2 years ago

apotterri commented 2 years ago

Expected behavior: if there is unsent data when the process is exiting, TelemetryClient.flush({callback: cb}) should send it, and call cb once it's sent.

Actual behavior: cb only called when there's no unsent data.

PR #746 made Sender.send async: https://github.com/microsoft/ApplicationInsights-node.js/pull/746/files#diff-8f8fde719f491d55f9740db69f902f6c192818ea548949e483d6193051832e46L108 .

Channel.triggerSend (called by TelemetryClient.flush) ignores the Promise that send returns: https://github.com/microsoft/ApplicationInsights-node.js/blob/bbb2c05059c5d4893c50c2bd6b2239eccea7bd54/Library/Channel.ts#L86 .

So, for example, this contrived code

TelemetryClient.flush({callback: () => {
  process.exit(1);
});
process.exit(0);

should always exit with code 1. Instead, if there's data in the TelemetryClient's channel, it will exit with code 0.

hectorhdzg commented 2 years ago

@apotterri you can use isAppCrashing flag to force flush to run synchronously, we have that flag to be able to capture exception telemetry when there is an unhandled error in the app

appInsights.defaultClient.flush({ isAppCrashing: true, callback: () => { console.log("Callback done"); } }); console.log("Process Done");

apotterri commented 2 years ago

Hi @hectorhdzg. Thanks for this. I looked at isAppCrashing, but it doesn't really match my use case.

Setting it to true causes isNodeCrashing to be true here: https://github.com/microsoft/ApplicationInsights-node.js/blob/c13a78d900be17f000992c2a2fbfb162062c5271/Library/Channel.ts#L89

So, by my reading, flush doesn't send the HTTP request synchronously. Instead, the data gets written to disk, and will be sent the next time a successful HTTP request is made. Is that right?

We don't have any guarantee that our users will run our client a second time. So, even if we enabled saving to disk, we still might miss that final event.

hectorhdzg commented 2 years ago

@apotterri that is correct I just realized it will not be actually send to ensure telemetry is not lost, flush method is asynchronous but we only support callbacks to handle that, not promises, the behavior your are seeing is by design, we are working on moving to use promises instead and we already started in internal APIs but updating public APIs like flush method will generate breaking changes, we are planning to have a new major version release of the SDK with this and other kind of changes next semester

apotterri commented 2 years ago

Ok, thanks very much. I've got a workaround in place that's good enough for now.

We'll look at upgrading when the new release comes out.

dornfeder commented 2 years ago

Ok, thanks very much. I've got a workaround in place that's good enough for now.

We'll look at upgrading when the new release comes out.

Hi There, can i ask how you did your workaround? Having a similar problem ATM...

apotterri commented 2 years ago

@dornfeder It's not great. I'm basically doing:

TelemetryClient.flush();
setTimeout{() => {
  process.exit(code);
}, 1000);

which will get me the events most of the time.

JacksonWeber commented 4 months ago

@apotterri @dornfeder This should work now as we've upgraded to OpenTelemetry. Please give this scenario a try again. Thank you!