snowplow / snowplow-nodejs-tracker

Snowplow event tracker for Node.js. Add analytics to your JavaScript apps, node-webkit projects and Node.js servers
http://snowplowanalytics.com
Apache License 2.0

Add support for results of flushing of events #42

Open clark-pan opened 5 years ago

clark-pan commented 5 years ago

I'm unsure how to approach this so I thought I might put it up here for some discussion.

Some background:

I'm using the snowplow-nodejs-tracker in an AWS Lambda environment to send events. I've detected an issue where, every once in a while (rarely), some requests are not sent.

I think I've narrowed it down to an interplay between how AWS Lambda's operating environment works and the async nature of the emitter in snowplow-nodejs-tracker.

A quick primer on how AWS Lambda runs your code. When a request to run your function comes in, Lambda spins up an instance of your function and runs your code. When your function finishes execution, Lambda puts the instance into a kind of sleep mode where your code is no longer running, but all objects in memory are kept intact, waiting for the next execution run. Future executions then get a speed boost because Lambda does not have to re-initialize an instance of the function. This idea is called a cold/hot start in Lambda terminology.

The issue comes in how Lambda decides what 'finishes execution' means. There are two interfaces through which your code can execute within Lambda: a NodeJS-style callback function (i.e. module.exports = (event, context, callback) => { callback(null, 'success') }) or an 'async/await' style, available for Node 8.10, where you return a promise from your handler (i.e. module.exports = async (event, context) => { return 'ok' }).
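Written out in full, the two handler shapes look roughly like this. It is only a sketch: the names callbackHandler and asyncHandler are made up for illustration, and in a real function one of them would be assigned to module.exports.

```javascript
// Callback style: Lambda passes a Node-style callback as the third argument.
const callbackHandler = (event, context, callback) => {
  callback(null, 'success');
};

// Async style (Node 8.10 runtime and later): the handler returns a Promise
// and Lambda uses the resolved value as the invocation result.
const asyncHandler = async (event, context) => {
  return 'ok';
};
```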

In the callback style, when callback is invoked, Lambda will not immediately end the function execution, but will instead wait for Node.js' event loop to drain before ending your function. This behaviour is controllable via a property (callbackWaitsForEmptyEventLoop) on the context object given to you by AWS, where a value of false immediately ends execution of the function. The default is true.
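As a minimal sketch of using that property (the handler itself is hypothetical; only callbackWaitsForEmptyEventLoop comes from the Lambda context object):

```javascript
// Hypothetical callback-style handler, shown only to illustrate the flag.
const handler = (event, context, callback) => {
  // Ask Lambda to end the invocation as soon as callback() is invoked,
  // instead of waiting for timers and sockets still on the event loop.
  context.callbackWaitsForEmptyEventLoop = false;
  callback(null, 'success');
};
```

With the flag set to false, any work still pending on the event loop when callback fires may be frozen along with the instance.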

In the promise style, the behaviour is to terminate as soon as the returned Promise resolves/rejects. It basically acts as if callbackWaitsForEmptyEventLoop is false.

The issue:

How this interplays with snowplow-nodejs-tracker's emitter is that, because the events are flushed asynchronously, there's a chance that the request has not been sent out before the function execution is terminated, even if you were to call flush.
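The race can be reproduced without Lambda or the tracker. The sketch below uses a stand-in emitter (createFakeEmitter is invented for this example and is not the tracker's emitter), with setTimeout simulating the latency of the outgoing request.

```javascript
// Stand-in for an emitter that sends buffered events asynchronously.
// This is NOT the tracker's real emitter; setTimeout simulates network latency.
function createFakeEmitter(log) {
  const buffer = [];
  return {
    input(event) {
      buffer.push(event);
    },
    flush() {
      const batch = buffer.splice(0);
      // Fire and forget: flush() returns before the "request" completes.
      setTimeout(() => log.push(`sent ${batch.length} event(s)`), 50);
    },
  };
}

const log = [];
const emitter = createFakeEmitter(log);

// Async-style handler: its Promise resolves right after flush() returns,
// while the simulated request is still pending.
const handler = async (event) => {
  emitter.input(event);
  emitter.flush();
  log.push('handler returned');
  return 'ok';
};
```

Here the handler's Promise resolves while the simulated send is still pending, which is the point at which Lambda may freeze the instance.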

Because of how some other libraries work (sequelize, an ORM, for example, assumes a long-running Node.js environment and creates never-ending setIntervals to manage its connection pool), it's not always possible to leave callbackWaitsForEmptyEventLoop set to true.

It would be a rather big gotcha to anybody who wants to use NodeJS Lambda to process snowplow events, which, given how well Lambda otherwise fits with the snowplow architecture, would be a shame.

I've not really seen a very good solution to this problem as there aren't a lot of libraries that especially cater for the Lambda operating environment. My experience so far with various libraries is to write hacks and work-arounds to make sure they properly clean themselves up when I want the function to end.

My proposal:

I was going to approach this by making the flush method return a Promise that signals whether the attempt to flush the events succeeded or failed. Would this be something that would be accepted? The return type of flush is currently void, so it's technically a breaking change, but I'm thinking people aren't actually depending on that(?). Alternatively I could add another method called flushAsPromised (I'm not good at naming methods) that has the correct behaviour.
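A rough sketch of what a Promise-returning flush could look like from the caller's side. Everything here is hypothetical: createPromiseEmitter and fakeSend are invented for the example, and the transport is simulated with setTimeout rather than a real HTTP request. It is not the tracker's current API.

```javascript
// Hypothetical emitter whose flush() returns a Promise. `send` is an injected
// function that delivers a batch and returns a Promise (e.g. an HTTP request).
function createPromiseEmitter(send) {
  let buffer = [];
  return {
    input(event) {
      buffer.push(event);
    },
    flush() {
      if (buffer.length === 0) return Promise.resolve();
      const batch = buffer;
      buffer = [];
      // Resolves when the batch is delivered, rejects if delivery fails.
      return send(batch);
    },
  };
}

// Simulated transport: resolves after 50 ms and records what was sent.
const sent = [];
const fakeSend = (batch) =>
  new Promise((resolve) => {
    setTimeout(() => {
      sent.push(...batch);
      resolve();
    }, 50);
  });

const emitter = createPromiseEmitter(fakeSend);

const handler = async (event) => {
  emitter.input(event);
  await emitter.flush(); // the invocation does not finish until delivery settles
  return 'ok';
};
```

Because the handler awaits flush, its own Promise cannot resolve until the send has settled, regardless of what else is left on the event loop.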

Alternatively, if this is not a good idea, could I get some guidance on whether or not the Emitter interface is considered public? I would then implement my own Emitter that's more amenable to the Lambda execution environment.

The only other tracker I've used is the snowplow-ruby-tracker which differentiates between AsyncEmitter and Emitter. It looks like the Emitter class in this project is more akin to the AsyncEmitter in snowplow-ruby-tracker.

Thanks for reading. What do the maintainers think of this? Happy to chat more about this as well.

BenFradet commented 5 years ago

I think adding a flush returning a promise sounds like a good compromise wrt breaking changes and the issue at hand.

pinging @mhadam for his pov.

paulboocock commented 4 years ago

This sounds like a nice improvement to me and I like the idea of improving the Lambda support of this library. I'm planning on doing some tidy up / modernisation of this tracker for 0.4.0, but I can see this being a nice feature to drop in for 0.5.0.

af-tomwilkins commented 3 years ago

+1 for this!

ggreco commented 3 years ago

+1 for this, my electron application cannot send the "Application/Close" event, since I have no way to check if flush completes before app.quit()....