o19s / ubi

User Behavior Insights standard schema
Apache License 2.0

RFC/PROPOSAL: add user_id to event.schema #15

Closed: AnkitSiva closed this 2 weeks ago

AnkitSiva commented 1 month ago

What/Why

What are you proposing?

We propose that the event schema contain a dedicated user ID field, so that consumers of UBI data can distinguish client_id from user_id, and so that we can better standardize what we recommend integrators track.

What users have asked for this feature?

We have spoken to data analysts who analyze user behavior, and they told us that the terms client_id, session_id and user_id have distinct meanings that cannot be merged.

What problems are you trying to solve?

In analytics parlance, client_id usually refers to a hash of the browser and its version. This means that if multiple unauthenticated users use the same browser version, their activity falls under the same ID. The current approach has a second caveat: if a user ID is logged in the client_id field, then the behavior of unauthenticated users won't be logged. This proposal removes that confusion: for an unauthenticated user, user_id can simply remain empty.
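To make the distinction concrete, here is a minimal sketch in Python, assuming events are plain dicts carrying the three identifiers discussed above. The `action_name` field, the `group_by` helper, and all sample values are hypothetical illustrations, not part of the UBI schema. It shows events from an authenticated user and from anonymous visitors collapsing into one `client_id` bucket, while anonymous events are still logged with an empty `user_id`.

```python
from collections import defaultdict

# Hypothetical sample events; only client_id, session_id and user_id come from the proposal.
events = [
    {"client_id": "chrome-126", "session_id": "s1", "user_id": "alice", "action_name": "search"},
    {"client_id": "chrome-126", "session_id": "s2", "user_id": None, "action_name": "search"},
    {"client_id": "chrome-126", "session_id": "s3", "user_id": None, "action_name": "click"},
    {"client_id": "firefox-127", "session_id": "s4", "user_id": "bob", "action_name": "search"},
]


def group_by(events, field):
    """Group events by a field, dropping events where the field is empty (None or missing)."""
    groups = defaultdict(list)
    for event in events:
        key = event.get(field)
        if key is not None:
            groups[key].append(event)
    return dict(groups)


by_client = group_by(events, "client_id")
by_user = group_by(events, "user_id")

# Events from alice and from anonymous visitors collapse into one client_id bucket...
print(len(by_client["chrome-126"]))  # 3
# ...while user_id only groups authenticated activity.
print(sorted(by_user))  # ['alice', 'bob']
# Anonymous events are still logged; they just carry no user_id.
print(sum(1 for e in events if e["user_id"] is None))  # 2
```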

Are there any security considerations?

No additional security impact, as the existing recommendation was already to track the user ID in the client_id field.

Are there any breaking changes to the API?

No

What is the user experience going to be?

Customers can configure and analyze user behavior along an additional axis that is cleanly separated from client_id.

Are there breaking changes to the User Experience?

No

Why should it be built? Any reason not to?

This will allow the separate customer personas (front-end developer and behavior analyst) to perform their tasks better with less coordination, because it reduces ambiguity about which attributes are tracked in which fields.

What will it take to execute?

  1. Merging this pull request.
  2. Updating the documentation and samples.

Any remaining open questions?

No.

miike commented 1 month ago

Makes sense. I'd consider making this explicitly nullable, i.e.,

"type": ["string", "null"]

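A minimal sketch of what this suggestion means for a single property, assuming the schema is loaded as a Python dict. The `check_type` helper and `JSON_TYPES` table are hypothetical and only understand the `"string"` and `"null"` type names; a real integration would use a full JSON Schema validator rather than this hand-rolled check.

```python
# Property definition with an explicit type union, as suggested above.
USER_ID_PROPERTY = {"type": ["string", "null"]}

# Hypothetical mapping from JSON Schema type names to Python types (only the two needed here).
JSON_TYPES = {"string": str, "null": type(None)}


def check_type(value, prop):
    """Return True if value matches one of the type names listed in prop["type"]."""
    allowed = prop["type"]
    if isinstance(allowed, str):  # "type" may be a single name or a list of names
        allowed = [allowed]
    return any(isinstance(value, JSON_TYPES[name]) for name in allowed)


print(check_type("alice", USER_ID_PROPERTY))  # True
print(check_type(None, USER_ID_PROPERTY))     # True: explicit null is accepted
print(check_type(None, {"type": "string"}))   # False: a plain string type rejects null
print(check_type(42, USER_ID_PROPERTY))       # False
```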
dtaivpp commented 1 month ago

I think this makes sense as well. Users would probably outlive several sessions, and I imagine correlating data across disparate sessions would be troublesome otherwise.
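The point about users outliving sessions can be sketched as an index from user_id to the sessions it spans. This is a hypothetical illustration with made-up session data, assuming each event carries both session_id and user_id; `sessions_per_user` is an illustrative helper, and events with no user_id are skipped because they cannot be correlated this way.

```python
from collections import defaultdict

# Hypothetical events: alice appears in three separate sessions.
events = [
    {"session_id": "mon-1", "user_id": "alice"},
    {"session_id": "tue-7", "user_id": "alice"},
    {"session_id": "tue-8", "user_id": None},
    {"session_id": "wed-3", "user_id": "alice"},
    {"session_id": "wed-4", "user_id": "bob"},
]


def sessions_per_user(events):
    """Map each non-empty user_id to the sorted list of distinct sessions it appears in."""
    index = defaultdict(set)
    for event in events:
        user = event.get("user_id")
        if user is not None:
            index[user].add(event["session_id"])
    return {user: sorted(sessions) for user, sessions in index.items()}


print(sessions_per_user(events))
# {'alice': ['mon-1', 'tue-7', 'wed-3'], 'bob': ['wed-4']}
```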

epugh commented 4 weeks ago

@AnkitSiva would you be willing to update the PR for the null concept? I did a bit of googling and yeah, you need to explicitly include the data type "null": https://turbo360.com/blog/specifying-json-schema-elements-null-in-logic-apps

epugh commented 4 weeks ago

For a moment I thought, hey, if this isn't in the required list, then that would mean it is null. However, I can imagine that you might have user_id explicitly set to null, versus an absence of the attribute meaning null. So yeah, let's add the null data type, and not add it to the required list.
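The difference between an absent attribute and an explicit null is visible once events are parsed. A minimal sketch using Python's standard `json` module, which maps JSON `null` to `None`; the `classify_user_id` helper and the sample events are hypothetical. Note that `dict.get` alone would conflate the two cases, so the check tests membership with `in` first.

```python
import json

# Hypothetical raw events: one with a value, one with explicit null, one without the attribute.
raw_events = [
    '{"client_id": "c1", "user_id": "alice"}',
    '{"client_id": "c2", "user_id": null}',
    '{"client_id": "c3"}',
]


def classify_user_id(event):
    """Distinguish a missing user_id attribute from an explicit null or a real value."""
    if "user_id" not in event:
        return "absent"
    if event["user_id"] is None:
        return "explicit null"
    return "set"


for raw in raw_events:
    event = json.loads(raw)
    print(event["client_id"], classify_user_id(event))
# c1 set
# c2 explicit null
# c3 absent
```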

AnkitSiva commented 4 weeks ago

@epugh I'm not sure I understand the rationale for making user_id nullable if it's not required. Is it common for the value to be explicitly null?

epugh commented 2 weeks ago

@AnkitSiva I'm thinking that the common pattern for folks using this would be to say "I am using user_id", and then, for all the places where user_id is null, to set it explicitly to null rather than skipping the attribute...
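If integrators follow this pattern, it could be enforced as a normalization step before events are stored. A minimal sketch assuming events are Python dicts; `normalize_event` is a hypothetical helper, not part of UBI, and it returns a copy so the caller's event is left untouched.

```python
import json


def normalize_event(event):
    """Return a copy of the event in which user_id is always present (None when unknown)."""
    normalized = dict(event)
    normalized.setdefault("user_id", None)
    return normalized


print(json.dumps(normalize_event({"client_id": "c3"})))
# {"client_id": "c3", "user_id": null}
print(json.dumps(normalize_event({"client_id": "c1", "user_id": "alice"})))
# {"client_id": "c1", "user_id": "alice"}
```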

How about, for now, we just make the change as you have it, and look to dig in more on @miike's suggestion later...

epugh commented 2 weeks ago

@dtaivpp @miike if the nullable thing is something we want to pursue, let's create a fresh PR for that. I am somewhat under the gun to get the 1.1 release out the door in August... and it's the 31st of August...