willkg / crashstats-tools

Command line tools and library for interacting with Crash Stats (https://crash-stats.mozilla.org/)
Mozilla Public License 2.0

support nested aggregations #103

Closed: willkg closed this 4 months ago

willkg commented 11 months ago

supersearchfacet should support nested aggregations.

https://crash-stats.mozilla.org/documentation/supersearch/#nested-aggregations

```
$ supersearchfacet --_aggs.product=platform
```

To show that in a 2D table, I think it should flatten the nested dimensions into keys. For example:

```
$ supersearchfacet --_aggs.product=platform --format=raw
{
  "hits": [],
  "total": 837148,
  "facets": {
    "product": [
      {
        "term": "Firefox",
        "count": 379327,
        "facets": {
          "platform": [
            {
              "term": "Windows NT",
              "count": 308508
            },
            {
              "term": "Linux",
              "count": 34563
            },
            {
              "term": "Mac OS X",
              "count": 22275
            },
            {
              "term": "Unknown",
              "count": 13981
            }
          ]
        }
      },
      {
        "term": "Fenix",
        "count": 345167,
        "facets": {
          "platform": [
            {
              "term": "Android",
              "count": 217868
            },
            {
              "term": "Unknown",
              "count": 127299
            }
          ]
        }
      },
      {
        "term": "Thunderbird",
        "count": 106367,
        "facets": {
          "platform": [
            {
              "term": "Windows NT",
              "count": 95204
            },
            {
              "term": "Mac OS X",
              "count": 4343
            },
            {
              "term": "Unknown",
              "count": 3655
            },
            {
              "term": "Linux",
              "count": 3165
            }
          ]
        }
      },
      {
        "term": "Focus",
        "count": 6283,
        "facets": {
          "platform": [
            {
              "term": "Android",
              "count": 4771
            },
            {
              "term": "Unknown",
              "count": 1512
            }
          ]
        }
      },
      {
        "term": "MozillaVPN",
        "count": 3,
        "facets": {
          "platform": [
            {
              "term": "Windows NT",
              "count": 3
            }
          ]
        }
      },
      {
        "term": "ReferenceBrowser",
        "count": 1,
        "facets": {
          "platform": [
            {
              "term": "Unknown",
              "count": 1
            }
          ]
        }
      }
    ]
  },
  "errors": []
}
```

That would yield:

product / platform | count
--- | ---
Firefox / Windows NT | 308508
Firefox / Linux | 34563
Firefox / Mac OS X | 22275
Firefox / Unknown | 13981
...
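A minimal Python sketch of that flattening, assuming the response shape in the raw output above: each bucket has `term`, `count`, and an optional nested `facets` dict holding one aggregation name per level. The names `flatten_facets` and `to_markdown` are hypothetical and not part of crashstats-tools. The walk is recursive, so the same code would also cover a three-level query like product / release_channel / build_id.

```python
from typing import Any, Dict, List, Tuple

# One table row: the terms from outermost to innermost level, plus the leaf count.
Row = Tuple[List[str], int]


def flatten_facets(facets: Dict[str, Any]) -> Tuple[List[str], List[Row]]:
    """Flatten nested Super Search facets into (header, rows).

    header lists the aggregation names from outermost to innermost,
    e.g. ["product", "platform"]. Each row holds the terms of one leaf
    bucket and that leaf's count. Assumes one aggregation name per level.
    """
    header: List[str] = []
    rows: List[Row] = []

    def walk(level: Dict[str, Any], prefix: List[str], depth: int) -> None:
        for name, buckets in level.items():
            # Record each level's aggregation name the first time we reach it.
            if len(header) == depth:
                header.append(name)
            for bucket in buckets:
                terms = prefix + [str(bucket["term"])]
                nested = bucket.get("facets")
                if nested:
                    # Descend; only leaf buckets become rows.
                    walk(nested, terms, depth + 1)
                else:
                    rows.append((terms, bucket["count"]))

    walk(facets, [], 0)
    return header, rows


def to_markdown(header: List[str], rows: List[Row]) -> str:
    """Render the flattened rows as a two-column markdown table."""
    lines = [" / ".join(header) + " | count", "--- | ---"]
    for terms, count in rows:
        lines.append(" / ".join(terms) + f" | {count}")
    return "\n".join(lines)


# A trimmed copy of the raw output above.
response = {
    "facets": {
        "product": [
            {
                "term": "Firefox",
                "count": 379327,
                "facets": {
                    "platform": [
                        {"term": "Windows NT", "count": 308508},
                        {"term": "Linux", "count": 34563},
                    ]
                },
            },
            {
                "term": "MozillaVPN",
                "count": 3,
                "facets": {"platform": [{"term": "Windows NT", "count": 3}]},
            },
        ]
    }
}

header, rows = flatten_facets(response["facets"])
print(to_markdown(header, rows))
```

One consequence of emitting only leaf buckets: an outer bucket whose nested list comes back empty produces no row at all, so its count does not appear in the table.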

I'd also like to make sure this works, because it would solve some of our current problems:

```
$ supersearchfacet --_aggs.product.release_channel=build_id
```

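For a three-level query like this one, the flattened table's header could be derived from the option itself. This is a hedged sketch that assumes the dotted names before `=` are the outer aggregation levels and the value after `=` is the innermost one (product → release_channel → build_id); `aggs_header` is a hypothetical helper, not an existing crashstats-tools function.

```python
from typing import List


def aggs_header(option: str) -> List[str]:
    """Split '--_aggs.a.b=c' (or '_aggs.a.b=c') into ['a', 'b', 'c'].

    Hypothetical helper: assumes '_aggs.<field>[.<field>...]=<field>' syntax,
    with the outer levels before '=' and the innermost level after it.
    """
    name, sep, inner = option.lstrip("-").partition("=")
    if not sep or not inner:
        raise ValueError(f"expected _aggs.<field>=<field>, got {option!r}")
    prefix, _, fields = name.partition(".")
    if prefix != "_aggs" or not fields:
        raise ValueError(f"expected _aggs.<field>=<field>, got {option!r}")
    return fields.split(".") + [inner]


print(" / ".join(aggs_header("--_aggs.product.release_channel=build_id")))
# prints: product / release_channel / build_id
```

Taking the header from the option rather than from the response means it is available even when the response comes back with no buckets.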
willkg commented 5 months ago

I've been working on this on and off since October 2023. Part of the complexity is that nested aggregations are under-documented for Crash Stats. Another part is that combining aggregations, cardinality, and histograms is very under-documented. The last part is that some queries that look fine don't work, for unclear reasons.

For example, this doesn't return any data or errors:

```
$ supersearchfacet --_aggs.product.release_channel=build_id
```

I'm pretty sure that's a bug in Socorro, but I'm not certain yet. Tomorrow I'll write up a Socorro bug for this and for another issue I bumped into while working on this.