sourcegraph / sourcegraph-public-snapshot


Add more info to downloadable usage statistics #27854

Open jasonhawkharris opened 3 years ago

jasonhawkharris commented 3 years ago

Feature request description

There is a "download usage stats" button on this page.

[Screenshot: Screen Shot 2021-11-17 at 1 56 20 PM]

An excellent feature. However, the downloaded archive contains only two files (UsersDates.csv and UsersUsageCounts.csv), and they are rendered useless by the fact that they do not properly label users and provide only the bare minimum of data.

[Screenshot: Screen Shot 2021-11-17 at 1 59 27 PM]

[Screenshot: Screen Shot 2021-11-17 at 1 59 55 PM]

The user IDs in these CSV files are plain numbers (for example, 1). They do not correspond to the user IDs returned from a GraphQL query, which are base64-encoded (for example, VXNlcjox).

Is your feature request related to a problem? If so, please describe.

With so little data in the downloadable archive, it is almost impossible to know which user ID corresponds to which user, which makes the feature useless. Currently, you have to take a user ID from the CSV (for example, 1), prefix it with "User:" and base64-encode the result (User:1 becomes VXNlcjox), then list all users in a GraphQL query and search for the matching ID.
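The manual conversion described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the `User:<number>` scheme is assumed from the single example in this issue (User:1 = VXNlcjox), not taken from Sourcegraph's source.

```python
import base64


def csv_id_to_graphql_id(csv_id: int) -> str:
    """Encode a numeric user ID from the CSV as a GraphQL-style ID.

    Assumes the ID is the base64 encoding of "User:<number>".
    """
    return base64.b64encode(f"User:{csv_id}".encode("ascii")).decode("ascii")


def graphql_id_to_csv_id(graphql_id: str) -> int:
    """Decode a GraphQL-style user ID back to the numeric ID used in the CSV."""
    kind, _, number = base64.b64decode(graphql_id).decode("ascii").partition(":")
    if kind != "User":
        raise ValueError(f"not a user ID: {graphql_id!r}")
    return int(number)


print(csv_id_to_graphql_id(1))           # VXNlcjox
print(graphql_id_to_csv_id("VXNlcjoy"))  # 2
```

Even with a helper like this, you still have to list all users and search for the matching ID by hand, which is the problem this request is about.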

Multiple customers have complained that the user IDs in the downloaded CSVs do not align with the base64-encoded IDs returned by GraphQL queries. Even if a customer goes to the trouble of base64-encoding the value from the CSV ('User:1'), they still can't query a specific user's data in the API console, because there is currently no way to query a user's info by ID at all.

Describe alternatives you've considered.

Using the base64-encoded values in the CSV files and allowing customers to query user data by ID would solve this issue.

The workaround I have regularly provided to customers is not ideal, because it still does not allow for querying an individual user by ID. As a result, I've instructed users to avoid the downloadable statistics altogether and instead use the following GraphQL query in the API console on their Sourcegraph instance:


```
{
  users(activePeriod: THIS_MONTH, first: 2) {
    totalCount
    nodes {
      username
      id
      usageStatistics {
        lastActiveTime
        searchQueries
        pageViews
        codeIntelligenceActions
        findReferencesActions
        lastActiveCodeHostIntegrationTime
      }
    }
  }
}
```

This, in turn, returns output like this:

```
{
  "data": {
    "users": {
      "totalCount": 14,
      "nodes": [
        {
          "username": "<username>",
          "id": "VXNlcjox",
          "usageStatistics": {
            "lastActiveTime": "2021-11-12T17:59:35Z",
            "searchQueries": 460,
            "pageViews": 2786,
            "codeIntelligenceActions": 384,
            "findReferencesActions": 52,
            "lastActiveCodeHostIntegrationTime": "2021-11-03T22:52:54Z"
          }
        },
        {
          "username": "<username>",
          "id": "VXNlcjoy",
          "usageStatistics": {
            "lastActiveTime": "2021-11-10T16:35:50Z",
            "searchQueries": 31,
            "pageViews": 296,
            "codeIntelligenceActions": 2,
            "findReferencesActions": 0,
            "lastActiveCodeHostIntegrationTime": null
          }
        }
      ]
    }
  }
}
```

... which returns more data than the CSV downloads, including all the identifiable data needed to determine which data belongs to which user.
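For customers who still want a spreadsheet-friendly file, a response shaped like the one above could be flattened into a labeled CSV. The following is a rough sketch, not an existing Sourcegraph feature: the function names and the output column layout are hypothetical, and the numeric ID is recovered on the assumption that each GraphQL ID is the base64 encoding of `User:<number>`.

```python
import base64
import csv
import io

# Usage fields requested in the GraphQL query from this issue.
STAT_FIELDS = [
    "lastActiveTime",
    "searchQueries",
    "pageViews",
    "codeIntelligenceActions",
    "findReferencesActions",
    "lastActiveCodeHostIntegrationTime",
]


def numeric_user_id(graphql_id: str) -> int:
    """Decode an ID such as "VXNlcjox" (assumed to mean "User:1") to 1."""
    kind, _, number = base64.b64decode(graphql_id).decode("ascii").partition(":")
    if kind != "User":
        raise ValueError(f"not a user ID: {graphql_id!r}")
    return int(number)


def response_to_csv(response: dict) -> str:
    """Flatten the `users` query response into CSV text, one row per user."""
    out = io.StringIO()
    fields = ["numeric_id", "graphql_id", "username"] + STAT_FIELDS
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for node in response["data"]["users"]["nodes"]:
        stats = node.get("usageStatistics") or {}
        row = {
            "numeric_id": numeric_user_id(node["id"]),
            "graphql_id": node["id"],
            "username": node["username"],
        }
        # Missing or null statistics are written as empty cells.
        for field in STAT_FIELDS:
            row[field] = stats.get(field)
        writer.writerow(row)
    return out.getvalue()
```

Each output row carries the numeric ID, the base64 ID, and the username side by side, so the rows can be matched against either the downloaded CSVs or later GraphQL results.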

Additional context

emchap commented 2 years ago

@jasonhawkharris I know there had been some discussion around tackling this during a hack day—is that on the table still, or has this been deprioritized?