mixpanel / mixpanel-node

A node.js API for mixpanel
http://www.mixpanel.com
MIT License
476 stars 159 forks source link

increment is creating anonymous users when distinct_id is not a string #50

Open fabriziomoscon opened 9 years ago

fabriziomoscon commented 9 years ago

the type of distinct_id at this point: https://github.com/mixpanel/mixpanel-node/blob/master/lib/mixpanel-node.js#L296 should be checked like so:

if (typeof distinct_id != 'string') {
  return callback(new TypeError('distinct_id must be a string'));
}

I just discovered I created 1000 users because I was passing an object as distinct_id rather than string I would say that the same issue is also present in any function that expects a distinct_id as parameter.

I can write a pull request when I can see the other pull requests I sent are merged.

cc @joeatwork cc @carlsverre

joeatwork commented 9 years ago

Thanks for this request!

We may want to consider accepting integer (and maybe all scalar?) distinct_ids (or at least coerce them to strings) but it's true that an object or list distinct id is probably always an error.

Also, cc @tdumitrescu

tdumitrescu commented 9 years ago

AFAIK we do allow object and list distinct IDs in other server-side and client-side tracking libs, and they end up getting serialized in a deterministic format by the backend. mixpanel-node should probably allow the same behavior. Since metrics.send_request() simply JSONifies all the data, that seems to be the intent here as well. Will have to investigate why this would end up creating multiple users. @fabriziomoscon if you can pass along a simple code snippet to reproduce the issue that would be helpful.

fabriziomoscon commented 9 years ago

Further researches shown me that the users created are distinct but have a meaningless distinct_id:

// after url decoded
#user?distinct_id={u'_commission': true, u'_locale': u'en_US', u'_promocode_enabled': True, u'_subcategories': [], u'_travel_mode': u'TRANSIT', u'areas': [], u'avatar': u'https://graph.facebook.com/<HIDDEN>/picture?type=large&width=400', u'braintreeCustomerId': u'5

It doesn't seem useful - unless I am missing something - to use a type distinct_id other than string, I can partially see an argument to allow numbers, but definitely not objects. Event if an object is serialized by JSON.stringify() that is clearly a developer mistake which the library should help prevent with an Error.

You can easily reproduce this using something similar to the following script:

var mixpanel = require('mixpanel');
// initialize mixpanel library
// ..
var user = new User('fabrizio', 'moscon');
mixpanel.people.increment(user, 'arbitrary_count', 1);

this creates an anonymous user (since that distinct_id has no other property) and the distinct_id is the stringified object.

I will not make this mistake in the future ( hopefully =) ) as I have learn this trick, but for new user this might cause unexpected behavior and possibly wasting useful time debugging.

tdumitrescu commented 9 years ago

The main obstacle to implementing this suggestion is that we do have customers who legitimately send objects and other non-string/int types as distinct IDs. Although it's not the standard use case, it's one that we do support and it simply doesn't make sense for the Node lib to exhibit different behavior than our other SDKs in that regard.

If, however, you're seeing behavior where sending the exact same JSON is resulting in different distinct_ids being registered, that's an issue which we'll need to fix (probably on the backend). I'm guessing that your 'user' object is changing internally between your different tracking calls, which is what's leading to multiple distinct_ids.

fabriziomoscon commented 9 years ago

Javascript has a "particular" way of handling types, so some basic type checking must be put in place.

If you could provide a case for using an object as distinct_id that would definitely help the discussion. If that is legacy behavior and it should not be encouraged anymore, enforcing string/int by code and a major version update number will suffice, otherwise at least some basic object validation rules should be applied

In case using an object is still considering a legitimate case the library should at least still check for:

for example


// use a node.js module to validate
// https://github.com/hapijs/joi or
// https://github.com/philbooth/check-types.js or
// some basic type check

if (
  typeof distinct_id == "undefined" ||
  distinct_id === null ||
  typeof distinct_id == "function" ||
  typeof distinct_id == "boolean" ||
  Array.isArray(distinct_id) ||
  ) {
  return callback(new TypeError('Invalid distinct_id type'));
}

if (typeof distinct_id == 'Object') {
  // check other constrains
}
tdumitrescu commented 9 years ago

Validation in mixpanel.people calls to eliminate falsey values and pure functions (if that's even possible) as distinct_id sounds great! We'd welcome a pull request with it. Arrays and arbitrary objects need to be allowed. This isn't about 'legacy' behavior, it's behavior which some customers currently use (e.g., distinct_id = {name: 'Fabrizio', email: 'fab@example.com'}, much like a multi-column primary key in SQL), and we just can't deviate from it for this one lib, or tell them that the Node lib won't work for their projects because it artificially restricts distinct_ids.