jaredpalmer / tsdx

Zero-config CLI for TypeScript package development
https://tsdx.io
MIT License

Opt-in tsdx analytics / telemetry #392

Closed jaredpalmer closed 4 years ago

jaredpalmer commented 4 years ago

I want to understand usage in more depth and want to add opt-in analytics / telemetry to the CLI, offered when folks bootstrap a new project.

Questions I want answers to:

No, I don’t want to read anyone’s code; I just want some anonymous metadata that can inform us and everyone about directional usage trends. I will open source all aspects of the tracking code and backend.

Proposed Infrastructure

Since it’s probably too controversial to use Google Analytics, we will roll our own. Luckily, this isn’t as bad as it sounds. Inside of the codebase, we just make a lil analytics client that sends simple GET requests with data as a query string to an AWS CloudFront distro. We can then use S3 and Athena to transform the logs into usable metrics. We can then embed stats on the future tsdx website I’m never going to finish 😉. We can also potentially use cube.js to make generating the React charts even easier.

Some possible architectures:

Or a simpler one:

(If we wanted to be cool AF, v2 could give folks the ability to see their own stats (wouldn’t be anonymous though))

Alternatives

Discussion

jaredpalmer commented 4 years ago

Blog posts

jaredpalmer commented 4 years ago

Got Athena working without a single line of code. Kinda cool

agilgur5 commented 4 years ago

Some comments:

  1. To better inform decisions, it's certainly a good idea. And I think opt-in is the right way to go and open-sourcing would be a must. There are some caveats with this though:
    1. Opt-in inherently biases the data to reflect power users / supporters / contributors. That's likely to not be representative. It would still be a data point, but taken with a grain of salt.
    2. Data will also be skewed by the largest users. This can be controlled for in analysis, but may require pseudo-anonymity (e.g. hashed IDs or GUIDs). Pseudo-anonymity could also be used to let folks find themselves (have the CLI be able to display your hashed IDs)
  2. To make the data public, you can make the bucket read permissions open to everyone (but restrict write to the events pipeline). You would be charged for public read/download events however. AWS also has a Public Dataset Program and GCP has Public Datasets as well (I'm sure Azure does too, but I haven't used Azure). A notable example is the PyPI download stats dataset on GCP/BigQuery
  3. There is a project that does something similar, albeit for a different purpose (OSS financing): scarf
  4. Would definitely recommend pulumi for Infrastructure-as-Code, particularly as it has first-class TypeScript support (and more generally is an IaC provider that uses actual programming languages instead of config files). I could volunteer some time to create the Pulumi configuration for a set-up etc (I'm also an infrastructure engineer).
  5. Can also look at various existing OSS solutions for different pieces of this. E.g. Metabase, Superset, Franchise are some popular tools for analysis + visualization that all have different complexity / features / integrations.
  6. As with all self-hosted set-ups, the real costs are in maintenance (including paging rotations), not so much in initial config. The simpler and the more managed services, the less this is a problem, but nonetheless a big consideration.
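To illustrate the hashed-ID idea from point 1.2, here is a rough sketch using Node's built-in `crypto` module. The function names and the `uid` / `event` row fields are hypothetical; the thread does not specify how IDs would be generated or stored.

```javascript
const crypto = require("crypto");

// Generate a random install ID once; a CLI would persist it locally
// (where it is stored is left open here).
function createInstallId() {
  return crypto.randomBytes(16).toString("hex");
}

// Derive the pseudonymous ID that is actually sent: a SHA-256 hash of the
// local ID. The CLI could print this value so users can find themselves.
function hashId(installId) {
  return crypto.createHash("sha256").update(installId).digest("hex");
}

// Count distinct hashed IDs per event, so that one very active user
// counts once instead of dominating the totals.
function distinctUsersPerEvent(rows) {
  const seen = new Map();
  for (const { event, uid } of rows) {
    if (!seen.has(event)) seen.set(event, new Set());
    seen.get(event).add(uid);
  }
  return Object.fromEntries([...seen].map(([event, ids]) => [event, ids.size]));
}
```

Counting distinct IDs rather than raw events is one simple way to control for the skew from the largest users.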

jaredpalmer commented 4 years ago

Update: I have a working proof of concept with CloudFront logging to an S3 bucket plus Athena. Not even sure QuickSight is worth it since I can just download the Athena query results as CSV.
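For anyone who wants to poke at the raw logs without Athena, a rough sketch of pulling the events back out of a CloudFront access log might look like the following. It assumes the standard log layout (a `#Fields:` header line naming the columns, then tab-separated rows with the query string in `cs-uri-query`); the function name is made up and the sample in the usage note uses a shortened field list.

```javascript
// Parse CloudFront access-log text into a list of event objects,
// one per request that carried a query string.
function parseCloudFrontLog(text) {
  let fields = [];
  const events = [];
  for (const line of text.split("\n")) {
    if (line.startsWith("#Fields:")) {
      // Header names are separated by spaces
      fields = line.slice("#Fields:".length).trim().split(/\s+/);
    } else if (line && !line.startsWith("#")) {
      // Data rows are separated by tabs
      const cols = line.split("\t");
      const query = cols[fields.indexOf("cs-uri-query")];
      // "-" marks a request without a query string
      if (query && query !== "-") {
        events.push(Object.fromEntries(new URLSearchParams(query)));
      }
    }
  }
  return events;
}
```

Fed a log containing a request for `?event=boop&tsdx_version=v0.11.0`, this returns one object with `event` and `tsdx_version` keys, which is roughly the shape Athena exposes as columns.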

This is the dopest analytics client of all time. All I'm doing is pinging my CloudFront distro:

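const fetch = require("isomorphic-unfetch");
const qs = require("qs");

// Fire a GET request at the CloudFront distro. The query string is the payload;
// it ends up in the access logs that Athena queries.
const track = data =>
  fetch(
    "https://XXXXXX.XXXX.com?" +
      qs.stringify({
        // versions are hard-coded for the proof of concept
        tsdx_version: "v0.11.0",
        ts_version: "v3.3.3",
        node_version: "v10.6.3",
        ...data
      })
  )
    .then(() => console.log("done"))
    .catch(console.log);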

// call it
track({ event: 'boop' });
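
The versions above are hard-coded for the proof of concept. One possible way to read them at runtime instead is sketched below; the helper names are hypothetical, and looking up `tsdx` and `typescript` through their `package.json` is an assumption about how the CLI would be installed.

```javascript
// Look up an installed package's version, falling back to "unknown"
// if it cannot be resolved from the current project.
function packageVersion(name) {
  try {
    return "v" + require(name + "/package.json").version;
  } catch {
    return "unknown";
  }
}

// Collect the same three fields the proof of concept sends.
function environmentInfo() {
  return {
    tsdx_version: packageVersion("tsdx"),
    ts_version: packageVersion("typescript"),
    node_version: process.version, // already formatted like "v10.6.3"
  };
}
```

The result could be spread into the query-string object in place of the literals, e.g. `{ ...environmentInfo(), ...data }`.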

and then in Athena....

Screenshot 2019-12-25 08 32 32