Open fabriziomoscon opened 9 years ago
Thanks for this request!
We may want to consider accepting integer (and maybe all scalar?) distinct_ids (or at least coerce them to strings) but it's true that an object or list distinct id is probably always an error.
Also, cc @tdumitrescu
AFAIK we do allow object and list distinct IDs in other server-side and client-side tracking libs, and they end up getting serialized in a deterministic format by the backend. mixpanel-node
should probably allow the same behavior. Since metrics.send_request()
simply JSONifies all the data, that seems to be the intent here as well. Will have to investigate why this would end up creating multiple users. @fabriziomoscon if you can pass along a simple code snippet to reproduce the issue that would be helpful.
Further researches shown me that the users created are distinct but have a meaningless distinct_id:
// after url decoded
#user?distinct_id={u'_commission': true, u'_locale': u'en_US', u'_promocode_enabled': True, u'_subcategories': [], u'_travel_mode': u'TRANSIT', u'areas': [], u'avatar': u'https://graph.facebook.com/<HIDDEN>/picture?type=large&width=400', u'braintreeCustomerId': u'5
It doesn't seem useful - unless I am missing something - to use a type distinct_id
other than string, I can partially see an argument to allow numbers, but definitely not objects. Event if an object is serialized by JSON.stringify()
that is clearly a developer mistake which the library should help prevent with an Error
.
You can easily reproduce this using something similar to the following script:
var mixpanel = require('mixpanel');
// initialize mixpanel library
// ..
var user = new User('fabrizio', 'moscon');
mixpanel.people.increment(user, 'arbitrary_count', 1);
this creates an anonymous user (since that distinct_id has no other property) and the distinct_id is the stringified object.
I will not make this mistake in the future ( hopefully =) ) as I have learn this trick, but for new user this might cause unexpected behavior and possibly wasting useful time debugging.
The main obstacle to implementing this suggestion is that we do have customers who legitimately send objects and other non-string/int types as distinct IDs. Although it's not the standard use case, it's one that we do support and it simply doesn't make sense for the Node lib to exhibit different behavior than our other SDKs in that regard.
If, however, you're seeing behavior where sending the exact same JSON is resulting in different distinct_id
s being registered, that's an issue which we'll need to fix (probably on the backend). I'm guessing that your 'user' object is changing internally between your different tracking calls, which is what's leading to multiple distinct_id
s.
Javascript has a "particular" way of handling types, so some basic type checking must be put in place.
If you could provide a case for using an object as distinct_id
that would definitely help the discussion. If that is legacy behavior and it should not be encouraged anymore, enforcing string/int by code and a major version update number will suffice, otherwise at least some basic object validation rules should be applied
In case using an object is still considering a legitimate case the library should at least still check for:
JSON.stringify()
result for example)for example
// use a node.js module to validate
// https://github.com/hapijs/joi or
// https://github.com/philbooth/check-types.js or
// some basic type check
if (
typeof distinct_id == "undefined" ||
distinct_id === null ||
typeof distinct_id == "function" ||
typeof distinct_id == "boolean" ||
Array.isArray(distinct_id) ||
) {
return callback(new TypeError('Invalid distinct_id type'));
}
if (typeof distinct_id == 'Object') {
// check other constrains
}
Validation in mixpanel.people
calls to eliminate falsey values and pure functions (if that's even possible) as distinct_id
sounds great! We'd welcome a pull request with it. Arrays and arbitrary objects need to be allowed. This isn't about 'legacy' behavior, it's behavior which some customers currently use (e.g., distinct_id = {name: 'Fabrizio', email: 'fab@example.com'}
, much like a multi-column primary key in SQL), and we just can't deviate from it for this one lib, or tell them that the Node lib won't work for their projects because it artificially restricts distinct_ids.
the type of
distinct_id
at this point: https://github.com/mixpanel/mixpanel-node/blob/master/lib/mixpanel-node.js#L296 should be checked like so:I just discovered I created 1000 users because I was passing an object as
distinct_id
rather than string I would say that the same issue is also present in any function that expects a distinct_id as parameter.I can write a pull request when I can see the other pull requests I sent are merged.
cc @joeatwork cc @carlsverre