snowplow / snowplow-javascript-tracker

Snowplow event tracker for client-side and server-side JavaScript. Add analytics to your websites, web apps and servers.
http://snowplowanalytics.com
BSD 3-Clause "New" or "Revised" License

Events sent to different collectors have different event_ids #293

Open fblundun opened 10 years ago

fblundun commented 10 years ago

If you have multiple collectors active and tell all of them to send an event, that event is actually generated separately for each tracker instance, so the resulting events will have different eids.

This is because the substance of the events might be different (they will have different tnas and may differ in other ways, for example in whether they are base 64 encoded).
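To make the behaviour concrete, here is a minimal sketch of what happens today. It does not use the real tracker API: `makeTracker`, its options, and the payload shape are hypothetical stand-ins, assuming only that each instance builds its own payload and mints its own `eid`.

```javascript
const { randomUUID } = require("crypto");

// Hypothetical stand-in for a tracker instance (not the real Snowplow API).
function makeTracker(namespace, { encodeBase64 }) {
  return {
    track(eventType) {
      // Each instance builds its own payload, so each one mints its own eid.
      return { eid: randomUUID(), tna: namespace, e: eventType, base64: encodeBase64 };
    },
  };
}

const trackers = [
  makeTracker("cf", { encodeBase64: false }),
  makeTracker("clj", { encodeBase64: true }),
];

// "Tell all of them to send an event": one logical page view, two payloads.
const payloads = trackers.map((tracker) => tracker.track("pv"));
console.log(payloads.map((p) => `${p.tna}: ${p.eid}`));
```

The two payloads describe the same page view but carry different `eid` and `tna` values, which is the situation described above.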

But will it cause problems later on?

alexanderdean commented 10 years ago

It's a good question!

yalisassoon commented 10 years ago

It would make it harder to reconcile data between different trackers, which is an interesting exercise.

alexanderdean commented 9 years ago

@fblundun - how hard would it be to make it so that the same event to different trackers had the same UUID, or to make it configurable (because you would want different event IDs if you are planning on unifying - vs reconciling - the event streams later)...

fblundun commented 9 years ago

This would be quite a big architectural change. Currently, multiple tracker instances are configured separately and track events separately.

This change would make it possible for two events with different contents to have the same event_id, for example in the following situation:

Suppose I have one tracker instance sending GETs to the Cloudfront Collector and another sending POSTs to the Clojure Collector. I have enabled the performanceTiming and gaCookies contexts only for the latter because I am worried about the length of the querystring.

So each tracker instance sends a page view, but only one has these contexts attached.
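A rough sketch of that situation, assuming a shared ID were introduced. The object shapes, the `buildPayload` helper, and the namespaces `cf` and `clj` are made up for illustration and are not the tracker's real internals.

```javascript
const { randomUUID } = require("crypto");

// Hypothetical per-tracker configuration; only the POST tracker has the extra contexts.
const getTracker = { namespace: "cf", method: "GET", contexts: [] };
const postTracker = { namespace: "clj", method: "POST", contexts: ["performanceTiming", "gaCookies"] };

// Build one tracker's version of a page view, reusing an event ID supplied by the caller.
function buildPayload(sharedEventId, tracker) {
  return {
    eid: sharedEventId,
    tna: tracker.namespace,
    e: "pv",
    method: tracker.method,
    contexts: [...tracker.contexts],
  };
}

const sharedEventId = randomUUID();
const viaGet = buildPayload(sharedEventId, getTracker);
const viaPost = buildPayload(sharedEventId, postTracker);

// Same eid, different contents.
console.log(viaGet.eid === viaPost.eid, viaGet.contexts.length, viaPost.contexts.length);
```

Both payloads share an `eid`, yet only the POST version carries the two contexts, so anything downstream that treats `eid` as identifying one exact payload would see a mismatch.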

Is this acceptable?

alexanderdean commented 9 years ago

Hmm, interesting point...

fblundun commented 9 years ago

We could add a new field called e.g. api_call_id, which would be the same when different tracker instances send their own versions of the same event.
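Something like the following sketch, where `api_call_id` is only the proposed field name and `trackOnAll` is a hypothetical helper, not an existing tracker function. Each instance keeps its own `eid`, and the shared ID ties the versions together.

```javascript
const { randomUUID } = require("crypto");

// Hypothetical fan-out helper: one API call, one api_call_id, one payload per tracker.
function trackOnAll(namespaces, eventType) {
  const apiCallId = randomUUID(); // minted once per API call
  return namespaces.map((namespace) => ({
    eid: randomUUID(),            // still unique per tracker instance
    api_call_id: apiCallId,       // proposed field, identical across instances
    tna: namespace,
    e: eventType,
  }));
}

const versions = trackOnAll(["cf", "clj"], "pv");
console.log(versions);
```

Reconciling the two streams later would then be a join on `api_call_id` rather than on `eid`.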

alexanderdean commented 9 years ago

Hmm. I'm coming round to the idea that there is one event, which has one event_id, and the tna and any contexts such as cookies are just "early enrichments" which are attached to the event before it even leaves the tracker. From this, it's Two Collectors, One EventID.

Think of it this way: if there is an ecommerce transaction where Joe buys a pair of Nikes, that's an event. The fact that somebody wants to send it to two separate unified logs for audit purposes doesn't suddenly make it two events - it's just an early routing fork of the same event. So there should be one event_id. If both collectors somehow end up merging into the same unified log downstream, then I think the "shared" event_id is a feature not a bug, as our deduplication engine (which doesn't exist yet) should be able to kick in and grandfather one of the two copies.
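As a very rough illustration of how a shared event_id could help downstream, here is a sketch of a first-copy-wins deduplication step. This is an assumption about what such an engine might do, not a description of anything that exists; `dedupeByEventId` and the sample records are invented for the example.

```javascript
// Keep the first copy seen for each eid and drop later copies.
function dedupeByEventId(events) {
  const firstSeen = new Map();
  for (const event of events) {
    if (!firstSeen.has(event.eid)) {
      firstSeen.set(event.eid, event);
    }
  }
  return [...firstSeen.values()];
}

// Invented sample: the same transaction arrived through two collectors.
const mergedLog = [
  { eid: "a1", tna: "cf", e: "tr" },
  { eid: "a1", tna: "clj", e: "tr" },
  { eid: "b2", tna: "cf", e: "pv" },
];

const deduped = dedupeByEventId(mergedLog);
console.log(deduped.length); // 2
```

Which copy to keep is a policy choice. First-seen is used here only because it is the simplest rule; a real engine might prefer the copy with more contexts attached.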

paulboocock commented 3 years ago

This is still a significant change. Trackers are currently largely independent of one another. We could perhaps improve the "SharedState" between them to solve this, but v3 hasn't made this change any easier. I think multiple trackers/emitters are quite rare, so demand for this is low. It needs more consideration and might be a candidate for v4.

max-tgam commented 3 years ago

Actually, in my experience, multiple trackers using Snowplow on the same site from different vendors can be quite common.

For a long while I had to modify our tracker and rename Snowplow's global namespace in our tracker's init.js to something unique, as we had issues with the Keywee pixel, which was using v2.7.3 of Snowplow and was causing race conditions on several different sites.

paulboocock commented 3 years ago

I think some use cases certainly call for it more than others, Ad Tech in particular being one that stands out. This issue is more about the same event going to multiple emitters, which I think is rare - multiple trackers on a page is certainly less rare. I'm hoping to make this easier with https://github.com/snowplow/snowplow-javascript-tracker/issues/161