Open clark-pan opened 5 years ago
I think adding a flush
returning a promise sounds like a good compromise wrt breaking changes and the issue at hand.
pinging @mhadam for his pov.
This sounds like a nice improvement to me and I like the idea of improving the Lambda support of this library. I'm planning on doing some tidy up / modernisation of this tracker for 0.4.0, but I can see this being a nice feature to drop in for 0.5.0.
+1 for this!
+1 for this, my electron application cannot send the "Application/Close" event, since I have no way to check if flush completes before app.quit()....
I'm unsure how to approach this so I thought I might put it up here for some discussion.
Some background:
I'm using the
snowplow-nodejs-tracker
in an AWS Lambda environment to send events. I've detected an issue where some requests are not being sent out every once in a while (rarely).I think I've narrowed it down to an interplay on how AWS Lambda's operating environment works, and the async nature of the emitter in
snowplow-nodejs-tracker
.A quick primer on how AWS Lambda runs your code. When a request to run your function comes in, it spins up an instance of your function, and runs your code. When your function finishes execution, it puts the instance into a kind of
sleep mode
where your code is no longer running, but all objects in memory are kept intact waiting for the next execution run. Future executions can then take advantage of a speed boost because it does not have to re-initialize an instance of the function. This idea is called a cold/hot start in Lambda terminology.The issue comes in how Lambda decides what 'finishes execution' means. There is 2 interface for your code to execute within Lambda, a NodeJS style callback function (i.e.
module.exports = (event, context, callback) => { callback(null, 'success') }
) or an 'async/await' style, available for Node 8.10, where you return a promise in your handler (i.e.module.exports = async (event, context) => { return 'ok' }
In the
callback
style, whencallback
is invoked, Lambda will not immediately end the function execution, but rather wait for NodeJs' Event Loop to drain before ending your method. This behaviour is controllable via a property (callbackWaitsForEmptyEventLoop
) on thecontext
object give to you by AWS, where a value offalse
would immediately end execution on the function. The default istrue
.In the
promise
style, the behaviour is to terminate as soon as the returnPromise
resolves/rejects. Basically acts as ifcallbackWaitsForEmptyEventLoop
isfalse
.The issue:
How this interplays with
snowplow-nodejs-tracker
's emitter is that, because the events are flushed asynchronously, there's a chance that the request has not been sent out before the function execution is terminated, even if you were to callflush
.Due to some interplay with how some other libraries work (namely some libraries like
sequelize
(an ORM) assume a long running nodejs environment and create some infinitely runningsetIntervals
that manage its connection pool), its not always possible to set thecallbackWaitsForEmptyEventLoop
totrue
.It would be a rather big gotcha to anybody who wants to use NodeJS Lambda to process snowplow events, which, given how well Lambda otherwise fits with the snowplow architecture, would be a shame.
I've not really seen a very good solution to this problem as there aren't a lot of libraries that especially cater for the Lambda operating environment. My experience so far with various libraries is to write hacks and work-arounds to make sure they properly clean themselves up when I want the function to end.
My proposal:
I was going to approach this by making the
flush
method return aPromise
that signifies that a success/unsuccessful attempt to flush the events. Would this be something that would be accepted? The return type offlush
is currentlyvoid
so its technically a breaking change, but I'm thinking people aren't actually depending on that(?). Alternatively I could add another method calledflushAsPromised
(I'm not good at naming methods) that has the correct behaviour.Alternatively, if this is not a good idea, could I get some guidance on whether or not the the
Emitter
interface is considered public? I would then implement my ownEmitter
thats more amenable to the Lambda execution environment.The only other tracker I've used is the
snowplow-ruby-tracker
which differentiates betweenAsyncEmitter
andEmitter
. It looks like theEmitter
class in this project is more akin to theAsyncEmitter
insnowplow-ruby-tracker
.Thanks for reading. What do the maintainers think of this? Happy to chat more about this as well.