kedro-org / vscode-kedro

Kedro extension for VSCode including LSP and other features
https://marketplace.visualstudio.com/items?itemName=kedro.Kedro
Apache License 2.0

Track command usage with telemetry #68

Closed noklam closed 1 month ago

noklam commented 2 months ago

We have download stats, but we need more detail to help us make decisions about feature development.

We will introduce a kedro viz command within the extension, and we can add telemetry to get some insight into its usage. From my understanding, viz itself tracks UI clicks, but that wouldn't be sufficient to tell whether they come from VSCode.

Questions

Do we implement the telemetry in TS or Python? This may depend on what we want to track.

Python

Pro:

Con:

TS

Pro:

Con:

jitu5 commented 2 months ago

@noklam In terms of Kedro-Viz, right now we have only two actions.

  1. Opening the Kedro-Viz flowchart from the command palette. This action involves running an LSP command, kedro.getProjectData.
  2. When the user clicks on a dataset node on the flowchart, the extension runs the kedro.goToDefinitionFromFlowchart LSP command to open the relevant file.

So we can go with Python and use kedro-telemetry.

noklam commented 2 months ago

For now, I have decided to first understand how kedro-viz tracks telemetry and how the consent flow works.

I realise kedro-telemetry is not a complete solution, since by default the extension will not execute the hook (of course we can trigger it manually). We can borrow the consent logic from kedro-telemetry by vendoring the library.

For the actual telemetry tracking, I prefer to trigger this in TS, since we can track command usage directly (compared to tracking it indirectly from requests to the language server). This is also much easier to extend in the future.

In the meantime, I am having some conversations with @ravi-kumar-pilla to research viz's telemetry. I aim to kick-start this work this week so we can finish it before the release (1st or 2nd week of September).

ravi-kumar-pilla commented 2 months ago

In the meantime, I am having some conversations with @ravi-kumar-pilla to research viz's telemetry. I aim to kick-start this work this week so we can finish it before the release (1st or 2nd week of September).

Some information on how telemetry works internally in Viz:

  1. FE: Kedro-Viz builds a telemetry.html by default. When you run make build, webpack does not touch telemetry.html because it lives in the public folder. More info: https://create-react-app.dev/docs/using-the-public-folder/#adding-assets-outside-of-the-module-system
  2. BE: The FastAPI app (apps.py file -> create_api_app_from_project) registers a root GET request (i.e., the initial document request for the home page and for experiment tracking).
  3. That request does the following: if consent is true (in the .telemetry file of the Kedro project) and kedro_telemetry.plugin is available in the environment, it gets the heap_app_id (from the HEAP_APPID_DEV environment variable in dev, or from HEAP_APPID_PROD, which is hardcoded in the plugin, for prod) and the heap_user_identity from the kedro-telemetry plugin.
  4. Once we have the heap_app_id and heap_user_identity, we inject them into telemetry.html and then append the telemetry.html content to the <head> section of the index.html file.
  5. This HTML content is then served at root (@app.get("/"), @app.get("/experiment-tracking")).
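
The injection described in steps 3 to 5 can be sketched roughly as below. This is only an illustration, not the actual Kedro-Viz code: the function name inject_telemetry, the "{{ ... }}" placeholder syntax, and the choice to insert just before the closing </head> tag are all assumptions made for the example.

```python
from typing import Optional


def inject_telemetry(
    index_html: str,
    telemetry_template: str,
    heap_app_id: Optional[str],
    heap_user_identity: Optional[str],
) -> str:
    """Return index_html with the telemetry snippet added before </head>.

    The snippet is only injected when both values are present; a missing
    value is treated as "no consent" and the page is returned unchanged.
    """
    if not heap_app_id or not heap_user_identity:
        return index_html  # no consent, or plugin missing: serve page untouched

    # Hypothetical placeholder names; the real template may differ.
    snippet = telemetry_template.replace("{{ heap_app_id }}", heap_app_id).replace(
        "{{ heap_user_identity }}", heap_user_identity
    )
    # Insert once, immediately before the closing head tag.
    return index_html.replace("</head>", snippet + "</head>", 1)
```

Keeping the function pure (strings in, string out) makes the consent behaviour easy to check without starting the server.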

@noklam let's connect to discuss if this is not clear. Thank you

noklam commented 2 months ago

If I understand correctly, telemetry.html is only appended to the UI if heap_app_id and heap_user_identity are not null (that is, consent was given). By the way, this is outdated relative to how kedro-telemetry works today, because viz re-implements some logic from kedro-telemetry; in the latest kedro-telemetry we introduced a user ID that is stored in a different place. (This may be something that viz needs to look into. Cc @DimedS)

So the main "consent" flow is still done in Python, by mimicking kedro-telemetry:

    @app.get("/")
    @app.get("/experiment-tracking")
    async def index():
        heap_app_id = kedro_telemetry.get_heap_app_id(project_path)
        heap_user_identity = kedro_telemetry.get_heap_identity()

How should I understand this code? Does that mean only the index page will check for consent but not the others?

ravi-kumar-pilla commented 2 months ago

How should I understand this code? Does that mean only the index page will check for consent but not the others?

This includes everything on viz. All other routes are subpaths

DimedS commented 2 months ago

After finishing the framework telemetry opt-out, I created a ticket about the consent check and UUID update in viz; some discussion is also there: https://github.com/kedro-org/kedro-viz/issues/2020

noklam commented 2 months ago

Thanks a lot @ravi-kumar-pilla ! Now I have the full picture of how telemetry would work in the extension. Heap has three ways to track information:

  1. Via the webpage directly (this is how Viz currently tracks events).
  2. Via a server API (kedro-telemetry), which is essentially a POST request.
  3. Via a client API (JS/TS).
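
As a rough illustration of option 2, a server-side event is a JSON body POSTed to Heap's track endpoint. The sketch below only builds the request and does not send it; the helper name build_track_request and the example property values are assumptions for illustration.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Heap's server-side track endpoint.
HEAP_ENDPOINT = "https://heapanalytics.com/api/track"


def build_track_request(
    app_id: str,
    identity: str,
    event: str,
    properties: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a single custom event."""
    payload = {
        "app_id": app_id,
        "identity": identity,
        "event": event,
        "properties": properties or {},
    }
    return urllib.request.Request(
        HEAP_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would then be a call such as urllib.request.urlopen(request, timeout=5), ideally wrapped so that a network failure never interrupts the user's command.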

AFAIK, the flowchart currently does not collect telemetry, because it's not served via the kedro-viz FastAPI route (see comments above). The most important thing to track now is runs of the command "Kedro: Run Kedro Viz", to give us a sense of how many people are trying to launch that view.

To achieve that, we need to implement this in two places:

  1. Use kedro-telemetry for the consent flow and to get the user ID; this information needs to be returned to the extension later.
  2. The extension (client) needs to check the consent whenever a command is triggered. If consent is given, send a custom event to Heap.
  3. (Not needed for now, only when we want to track the flowchart clicks) The KedroViz React component needs a new prop that takes the consent and user ID information.
    • Inject the telemetry.html into the webview in a similar way if consent is given.
    • We need a custom way to handle slicing, since the metrics are implemented separately. Cc @Huongg
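
The consent gate in items 1 and 2 could look roughly like the sketch below. It is written in Python only to show the logic; the plan above is to do the actual tracking in TS. The names has_consent and track_command are hypothetical, the line-based parsing of .telemetry is a simplification (the file is YAML), and treating a missing file as "no consent" is an assumption that may not match kedro-telemetry's current opt-out behaviour.

```python
from pathlib import Path
from typing import Callable, Dict


def has_consent(project_path: Path) -> bool:
    """Return True only if <project>/.telemetry contains `consent: true`.

    Simplified parsing for illustration; a real implementation would use
    a YAML parser and kedro-telemetry's own consent rules.
    """
    telemetry_file = project_path / ".telemetry"
    if not telemetry_file.is_file():
        return False  # assumption: no file means no consent
    for line in telemetry_file.read_text(encoding="utf-8").splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "consent":
            return value.strip().lower() == "true"
    return False


def track_command(
    command: str,
    project_path: Path,
    user_id: str,
    send: Callable[[Dict[str, str]], None],
) -> bool:
    """Send one event for `command` if consent is given; report whether it was sent."""
    if not has_consent(project_path):
        return False
    send({"identity": user_id, "event": command})
    return True
```

Passing the sender in as a callable keeps the consent check separate from the transport, so the same gate could sit in front of a POST request or a client-side call.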

Cc @jitu5