o19s / ubi

User Behavior Insights standard schema
Apache License 2.0

RFC/PROPOSAL: add session_id to event.schema #17

Closed: AnkitSiva closed this 2 weeks ago

AnkitSiva commented 1 month ago

What/Why

What are you proposing?

We propose that the event schema contain a dedicated session ID field so that consumers of UBI data can track what users (authenticated and anonymous) do during individual visits. client_id does not uniquely identify a user (two users with the same browser version would map to the same client_id), and the user_id proposed in this PR will not always be available.

What users have asked for this feature?

We have spoken to data analysts who analyze user behavior.

What problems are you trying to solve?

The analysts we spoke to said that, in addition to client_id and user_id, a session_id is an important field for understanding shifts in user behavior over time. They also mentioned that they often need to correlate the search interactions within a session against the other interactions in that same session.
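To make the correlation use case concrete, here is a minimal Python sketch that groups events by session_id and counts search versus non-search interactions per session. It assumes events are plain dictionaries and uses a hypothetical `action_name` field in which the value `"search"` marks a search interaction; the real UBI schema may name and encode this differently.

```python
from collections import defaultdict
from typing import Any, Iterable, Mapping


def interactions_by_session(
    events: Iterable[Mapping[str, Any]],
) -> dict[str, dict[str, int]]:
    """Count search vs. other interactions for each session_id.

    Assumption: every event has a session_id, and a hypothetical
    'action_name' field equal to "search" marks a search interaction.
    """
    counts: defaultdict[str, dict[str, int]] = defaultdict(
        lambda: {"search": 0, "other": 0}
    )
    for event in events:
        kind = "search" if event.get("action_name") == "search" else "other"
        counts[event["session_id"]][kind] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        {"session_id": "s1", "action_name": "search"},
        {"session_id": "s1", "action_name": "click"},
        {"session_id": "s2", "action_name": "add_to_cart"},
        {"session_id": "s1", "action_name": "search"},
    ]
    print(interactions_by_session(sample))
```

Because session_id is carried on every event, this grouping needs no join against client_id or user_id, which is the separation the analysts asked for.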

Are there any security considerations?

No additional security impact: the existing recommendation was already to track the user ID in the client_id field, and session_id has a similar impact.

Are there any breaking changes to the API?

Yes, the session_id is proposed as a required field.
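As a rough illustration of what a required session_id means for event producers, here is a minimal Python presence check. It assumes events arrive as plain dictionaries; the `REQUIRED_FIELDS` tuple and the `action_name` field are hypothetical, and treating client_id as required is an assumption, not something this proposal states. user_id is left optional because it is not always available.

```python
from typing import Any, Mapping

# Hypothetical required-field list. session_id is required per this proposal;
# client_id being required is an assumption. user_id is deliberately optional
# because anonymous users have none.
REQUIRED_FIELDS = ("client_id", "session_id")


def missing_required_fields(event: Mapping[str, Any]) -> list[str]:
    """Return the required fields that are absent or empty in an event."""
    return [field for field in REQUIRED_FIELDS if not event.get(field)]


def is_valid_event(event: Mapping[str, Any]) -> bool:
    """True when every required field is present and non-empty."""
    return not missing_required_fields(event)


if __name__ == "__main__":
    anonymous = {"client_id": "c-1", "session_id": "s-42", "action_name": "search"}
    pre_1_1 = {"client_id": "c-1", "action_name": "search"}
    print(is_valid_event(anonymous))
    print(missing_required_fields(pre_1_1))
```

An event produced before this change (no session_id) fails the check, which is why the proposal counts as a breaking API change.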

What is the user experience going to be?

Customers can configure and analyze user behavior along an additional axis that is clearly separated from client_id and user_id.

Are there breaking changes to the User Experience?

No

Why should it be built? Any reason not to?

This will let the separate customer personas (front-end developer and behavior analyst) perform their tasks better with less coordination, because it reduces ambiguity about which attributes are tracked in which fields.

What will it take to execute?

  1. Merging this pull request
  2. Documentation and sample updates.

Any remaining open questions?

No.

miike commented 1 month ago

Looks good. I don't know whether you want to consider adding a session index (the number of sessions a user has had) as well, or whether that's overloading it; customisation for new users (index=1) is often quite different from that for returning users.

epugh commented 4 weeks ago

I suspect the richness of having multiple sessions is worth its own PR, and I think we need to see what the appetite is for that. A related thought: do we need to think, in the future, about extensions to the spec for those who are doing more advanced things in certain areas...

AnkitSiva commented 4 weeks ago

For now, the presence of multiple sessions can be computed post hoc with a query, given userID, sessionID, and timestamps. I also think session_index is probably a server-side metric rather than a client-side event metric.
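The post-hoc computation described above might look like the following minimal Python sketch, which derives a per-user session index (1 for a user's first session) from user_id, session_id, and timestamp. It assumes timestamps are comparable numbers such as epoch seconds, orders sessions by their earliest event, and skips events without a user_id (anonymous traffic); all of these are illustrative assumptions, and `session_index` is a hypothetical helper, not part of the UBI spec.

```python
from typing import Any, Iterable, Mapping


def session_index(
    events: Iterable[Mapping[str, Any]],
) -> dict[tuple[str, str], int]:
    """Map each (user_id, session_id) to a 1-based index per user.

    Sessions are ordered by their earliest timestamp. Events without a
    user_id are skipped, since their sessions cannot be attributed to a user.
    """
    # Earliest timestamp seen for each (user, session) pair.
    first_seen: dict[tuple[str, str], float] = {}
    for event in events:
        user = event.get("user_id")
        if not user:
            continue
        key = (user, event["session_id"])
        ts = event["timestamp"]
        if key not in first_seen or ts < first_seen[key]:
            first_seen[key] = ts

    # Group sessions by user, then number them in time order.
    per_user: dict[str, list[tuple[float, str]]] = {}
    for (user, session), ts in first_seen.items():
        per_user.setdefault(user, []).append((ts, session))

    index: dict[tuple[str, str], int] = {}
    for user, sessions in per_user.items():
        for i, (_, session) in enumerate(sorted(sessions), start=1):
            index[(user, session)] = i
    return index
```

A user's session count is then the highest index assigned to them, and a "new user" session in the sense of the earlier comment is one with index 1. Computing this offline keeps it out of the client-side event payload.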

ydrozd commented 3 weeks ago

In our practical implementation of sessionization, we found that the definition of a session is not necessarily consistent across applications. On the other hand, maintaining certain types of sessions in a stream of events is problematic, because delayed events can cause sessions to be resolved incorrectly. One solution is to defer session identification to downstream processing stages.
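One common way to defer session identification downstream is inactivity-gap sessionization over a batch of events that has already been collected. The minimal Python sketch below sorts events by client_id and timestamp before splitting, so a delayed event is placed where it belongs in time rather than where it arrived in the stream. The 30-minute timeout, the use of client_id as the grouping key, and the `derived_session_id` output field are all illustrative assumptions; as noted above, session definitions vary by application.

```python
from typing import Any, Iterable, Mapping

# Hypothetical inactivity timeout; the right value depends on the application.
SESSION_TIMEOUT_SECONDS = 30 * 60


def sessionize(
    events: Iterable[Mapping[str, Any]],
    timeout: float = SESSION_TIMEOUT_SECONDS,
) -> list[dict[str, Any]]:
    """Assign a derived session id per client using an inactivity gap.

    Runs after events have been collected, so late arrivals are sorted into
    their correct position before sessions are split.
    """
    ordered = sorted(events, key=lambda e: (e["client_id"], e["timestamp"]))
    sessionized: list[dict[str, Any]] = []
    prev_client, prev_ts, n = None, 0.0, 0
    for event in ordered:
        client, ts = event["client_id"], event["timestamp"]
        if client != prev_client:
            n = 1  # first session for this client
        elif ts - prev_ts > timeout:
            n += 1  # gap exceeded the timeout: start a new session
        sessionized.append({**event, "derived_session_id": f"{client}-{n}"})
        prev_client, prev_ts = client, ts
    return sessionized


if __name__ == "__main__":
    # Arrival order: the event at t=1000 arrives after the one at t=4000.
    arrived = [
        {"client_id": "c1", "timestamp": 0},
        {"client_id": "c1", "timestamp": 4000},
        {"client_id": "c1", "timestamp": 1000},
    ]
    for e in sessionize(arrived):
        print(e["timestamp"], e["derived_session_id"])
```

Splitting in arrival order would open a new session at t=4000 and then wrongly attach the late t=1000 event to it; sorting first keeps t=0 and t=1000 together and starts the second session at t=4000.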

epugh commented 2 weeks ago

Session has been a common request... It's a widely used concept in many tracking solutions.

I think that for "simple" use cases it's a powerful tool, and then the richer and more complex you get, the more session becomes a knotty problem to deal with, as @ydrozd suggested in his comment!

I'm going to merge it for the 1.1 release; however, I look forward to digging into this topic more. It may end up being one of those things where you caveat the heck out of its use?