Closed yunapotamus closed 2 years ago
super-nit: can we prefix these strings so we know it's not just some python special thing? :)
__client_null__
__client_empty__
__client_unset__
i need to think through a bit if these would be acceptable to define an inference scope or not for inf. ref.
Updated proposal is to create an enum to indicate the state of the ID when logged on the client.
// Indicates the value of the ID when logged from metrics client library.
enum ClientProvenance {
UNKNOWN = 0;
// Set to null by client code.
NULL;
// Set to the empty string by client code.
EMPTY;
// Not set by client code, and also not autogenerated by metrics library.
UNSET;
// Automatically generated by metrics library.
AUTOGENERATED;
// Set to a specific value by client code.
CLIENT_SPECIFIED;
}
message Impression {
string impression_id = a;
ClientProvenance impression_id_provenance = x;
string content_id = b;
ClientProvenance content_id_provenance = y;
string insertion_id = c;
ClientProvenance insertion_id_provenance = z;
}
This looks great thank you
null/empty seems not like the others. it's client specified to these special values. maybe CLIENT_NULL
and CLIENT_EMPTY
?
wrt unset, when do ids get autogenerated vs remain unset in sdks? and does autogenerated cover just sdks, or does it also include event api id population as well?
Do you think this enum will be useful for later steps in the pipeline? It's intended for client library only (hence NULL
instead of CLIENT_NULL
since Client
is in the enum name :-) ) but if it would be useful to others we can extend it.
Autogenerated gets propagated to later values, I'll add an example.
ie. If logView
autogenerates view_id = abcd-1234
, and logImpression
uses that autogenerated value view_id = abcd-1234
, then Impression.view_id_provenance = AUTOGENERATED
.
Overall, looks great.
Are you planning on using ClientProvenance
for non-ID cases? Otherwise, can we call this IdProvenance? I'm okay with the word Provenance for this. It'd be great to have a shorter name but I couldn't think of a better one.
I think we'll use this mostly for debugging. I don't think we'll use these provenance in most parts of metrics and the consumers of metrics. I wouldn't be shocked if we eventually strip this out of our flat records.
I wasn't planning on using ClientProvenance
for anything else. I'm also fine with IdProvenance
. I'll send out a schema PR soon.
Yeah, this seems like a debugging tool for me. I'll hook it up to the diagnostics
settings for client config.
We want to figure out where we are dropping IDs. Proposal is to create an enum to indicate the state of the ID when logged on the client.
Then in Event protos:
For example:
This behavior would be controlled by a client config field, say
diagnosticsIDsIncludeProvenance
. The default behavior when this field is false is the previous behavior, which does not include provenance.