Internally support $insert_id

mixpanel / mixpanel-node

A node.js API for mixpanel

http://www.mixpanel.com

MIT License


Internally support $insert_id #163

Open · Suhail opened 4 years ago

Suhail commented 4 years ago

It'd be helpful if the API handled setting or generating an $insert_id for you automatically, like so:

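```typescript
// Proposed helper: returns a random 16-character alphanumeric string to send as $insert_id.
private generate_insert_id(): string {
  const length = 16;
  const characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  const char_len = characters.length;
  let result = '';
  for (let i = 0; i < length; i += 1) {
    // Append one character picked at random (note: Math.random is not a cryptographically secure source).
    result += characters.charAt(Math.floor(Math.random() * char_len));
  }
  return result;
}
```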

This is useful because users who make mistakes or notice accidental data inaccuracies could get them fixed easily; otherwise there is no recourse until you realize you should have been setting it yourself.

If Mixpanel did this, it would also reduce integration mistakes caused by customers generating the ID incorrectly or without enough entropy.
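To make the entropy point concrete, here is a minimal sketch of what "set it automatically unless the caller already did" could look like. It draws from Node's CSPRNG (`randomBytes` from `node:crypto`) instead of `Math.random`, giving 128 bits of entropy as 32 hex characters. The names `generateInsertId`, `withInsertId`, and `EventProperties` are hypothetical and not part of mixpanel-node; the assumption that 32 hex characters is acceptable rests on Mixpanel's documented $insert_id format (up to 36 characters of letters, digits, and `-`), which is worth re-checking against the current docs.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical shape for event properties: a plain object of arbitrary values.
type EventProperties = { [key: string]: unknown };

// 16 bytes from a cryptographically secure source, hex-encoded:
// 32 characters of [0-9a-f], i.e. 128 bits of entropy.
function generateInsertId(): string {
  return randomBytes(16).toString("hex");
}

// Returns a copy of `properties` with $insert_id filled in. An ID the caller
// already supplied is kept as-is, so explicit IDs are never overwritten.
function withInsertId(properties: EventProperties): EventProperties {
  const existing = properties["$insert_id"];
  if (typeof existing === "string" && existing !== "") {
    return { ...properties };
  }
  return { ...properties, $insert_id: generateInsertId() };
}
```

For example, `withInsertId({ distinct_id: "user-1" })` returns a new object carrying a fresh 32-character `$insert_id`, while `withInsertId({ distinct_id: "user-1", $insert_id: "abc" })` leaves `"abc"` untouched.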

tdumitrescu commented 4 years ago

If you don't include an $insert_id explicitly, the API server is supposed to generate one internally ("Mixpanel generates the $insert_id at the API for payloads that do not contain $insert_id."), though I don't think that value is exposed to users. Since this library doesn't implement any retry logic internally, it's not clear what the benefit of setting it here would be. Are you talking about using the event deduplication feature as a way to correct historical data?
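The retry case is where a client-side $insert_id matters: if an attempt reaches the server but its response is lost, a retry with a *new* ID looks like a distinct event, whereas a retry with the *same* ID can be deduplicated. A minimal sketch, assuming a hypothetical `Send` transport and a hypothetical `trackWithRetry` wrapper (neither is mixpanel-node API, and the payload shape is illustrative only):

```typescript
import { randomBytes } from "node:crypto";

type Payload = Record<string, unknown>;

// Hypothetical transport: resolves on success, rejects on a network error.
type Send = (payload: Payload) => Promise<void>;

// Sends one event, retrying up to `maxAttempts` times. The $insert_id is fixed
// once, before the first attempt, so every retry carries the same ID.
async function trackWithRetry(
  send: Send,
  event: string,
  properties: Payload,
  maxAttempts = 3,
): Promise<Payload> {
  const payload: Payload = {
    event,
    properties: {
      $insert_id: randomBytes(16).toString("hex"),
      ...properties, // a caller-supplied $insert_id takes precedence
    },
  };
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      await send(payload); // identical payload (and ID) on every attempt
      return payload;
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

Generating the ID inside the loop instead would defeat the purpose, since each attempt would then look like a new event.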

Suhail commented 4 years ago

I don't see any $insert_id if I go to the product and break down by Insert ID. If the API generates one automatically, that's great; I just don't see the benefit for some reason.

The best solution seems to be that Mixpanel handles it on the backend, exposes it in the UI, and libraries don't need to implement it. I had to implement it myself because I couldn't segment by Insert ID for whatever reason.

See here:

tdumitrescu commented 4 years ago

I guess the QBQ is what you're trying to do with $insert_id in your analyses. It's meant to let Mixpanel's storage services deduplicate data that was accidentally sent multiple times.
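As an illustration of what deduplication by $insert_id means, here is a deliberately simplified model: duplicates are detected by $insert_id alone and the first copy wins. Both choices are assumptions for illustration; Mixpanel's actual rules may also take other fields (such as event name, distinct_id, or time) into account, so this should not be read as how its storage layer works. `TrackedEvent` and `dedupe` are hypothetical names.

```typescript
// Hypothetical event shape: a name plus properties that may carry $insert_id.
interface TrackedEvent {
  event: string;
  properties: { $insert_id?: string; [key: string]: unknown };
}

// Simplified model: keep the first event seen for each $insert_id and drop later
// copies. Events without an $insert_id cannot be matched, so they are all kept.
function dedupe(events: TrackedEvent[]): TrackedEvent[] {
  const seen = new Set<string>();
  const kept: TrackedEvent[] = [];
  for (const e of events) {
    const id = e.properties.$insert_id;
    if (id === undefined) {
      kept.push(e);
      continue;
    }
    if (seen.has(id)) continue; // duplicate delivery of an already-stored event
    seen.add(id);
    kept.push(e);
  }
  return kept;
}
```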

Suhail commented 4 years ago

We had an accident where we ended a session and it created significant data inaccuracy. My understanding is that one way to remove that inaccuracy is to filter the affected events out of the data by their $insert_id, or to repair them by somehow rewriting them via their $insert_id.
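If the affected events' IDs are known, the "filter it out" option can be sketched on the analysis side. This assumes the events have already been exported into memory in a hypothetical `ExportedEvent` shape; `excludeByInsertId` is a hypothetical helper and does not delete anything from Mixpanel itself. It only works for events that actually carried an $insert_id you can see.

```typescript
// Hypothetical shape of an exported event.
interface ExportedEvent {
  event: string;
  properties: { $insert_id?: string; [key: string]: unknown };
}

// Drops events whose $insert_id is in `badIds`. Events without an $insert_id
// cannot be matched against the list, so they are kept.
function excludeByInsertId(events: ExportedEvent[], badIds: readonly string[]): ExportedEvent[] {
  const bad = new Set(badIds);
  return events.filter((e) => {
    const id = e.properties.$insert_id;
    return id === undefined || !bad.has(id);
  });
}
```

Passing the IDs recorded for the faulty session-end events as `badIds` would leave every other event in place.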

Does the UI ever expose the $insert_id, even now that I am sending it explicitly?