segmentio / analytics-next

Segment Analytics.js 2.0
https://segment.com/docs/connections/sources/catalog/libraries/website/javascript
MIT License
384 stars 128 forks source link

Setting a low `flushInterval` effectively breaks `closeAndFlush()` #1092

Open daguej opened 1 month ago

daguej commented 1 month ago

We're using @segment/analytics-node in a serverless environment that is very strict about ending execution after the main entry point's promise resolves. The hosting node process is killed immediately after the promise resolves, so if there was anything still happening in the background, it is lost.

We ~followed~ misread the guidance found in the readme and accidentally set flushInterval to 1. We found some events were being dropped, particularly if fired shortly before the serverless function finished and resolved.

So we then added await analytics.closeAndFlush() to the end of our function to ensure events were being flushed before exiting.

To our surprise, this did not help; events were still being lost.

I believe analytics-node.ts#L47 is to blame. The library sets the default timeout for closeAndFlush() to flushInterval * 1.25, meaning our call to closeAndFlush() was quietly resolving after 1.25ms, which is definitely not enough time to establish a TLS connection to Segment's servers and send any events.

Thus, our function resolved its promise before analytics events could hit the wire, and the process was killed before the events could be sent.

This timeout behavior was not documented anywhere, so it was very surprising to find out that the flush call wasn't working; we had no idea we'd need to set a sane timeout ourselves (await analytics.closeAndFlush({ timeout: 10000 })).

silesky commented 1 month ago

We followed the guidance found in the readme that suggests setting flushInterval to 1, but found some events were being dropped, particularly if fired shortly before the serverless function finished and resolved.

Hey @daguej a lot to unpack here, but the README suggests that you set flushAt to 1 (which is the number of events that will be batched), not flushInterval.

flushInterval shouldn't come into play if flushAt is 1, since the event queue will always be flushed as soon as it receives an event (and I agree, that it definitely shouldn't be set to something super low)

daguej commented 1 month ago

Indeed it does! Looks like we misread the property name in the readme, and accidentally set flushInterval instead of flushAt. (I double-checked our codebase; it's definitely setting the former.). Oops, sorry for the confusion.

Mistaken property aside, the core of this issue still stands -- setting a low flushInterval causes flushAndClose to effectively not work.

(FWIW, my opinion about the readme's recommendation remains the same, since either way you're blocking on analytics calls unnecessarily.)