grafana / xk6-browser

k6 extension that adds support for browser automation and end-to-end web testing via the Chrome DevTools Protocol
https://grafana.com/docs/k6/latest/javascript-api/k6-experimental/browser/
GNU Affero General Public License v3.0

URL Grouping/Aggregation #371

Open tom-miseur opened 2 years ago

tom-miseur commented 2 years ago

It is often useful to aggregate endpoint URLs that contain dynamic values. This is critical in the k6 Cloud due to the limits we have in place to prevent tests from emitting too-many-metrics/too-many-urls.

The URL Grouping documentation provides a solution for k6 scripts using the http module, but because xk6-browser operates at the browser-level, there is no opportunity for the user to apply the name tag to requests that require it.

The situation is compounded by the fact that xk6-browser sees every HTTP request the browser issues, including requests to third-party hosts that an HTTP-based k6 script would normally never contact.

Potential solutions

Allowlist/blocklist hosts in xk6-browser

A cursory browse through Playwright docs suggests there is no convenient way of preventing/allowing requests to certain hosts, e.g. through specifying regular expressions. There is, however, a request interception mechanism involving Page.route or BrowserContext.route that could be used to abort requests that don't fit the criteria.
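As a rough sketch of that interception approach: the handler below aborts any request whose host is not on an allowlist. `isAllowedHost` and `makeRouteHandler` are hypothetical names made up for illustration, and the wiring shown in the trailing comment is Playwright's `page.route`; whether xk6-browser exposes an equivalent is not something this issue establishes.

```javascript
// Hypothetical helper: true when the URL's hostname is an allowed host
// or a subdomain of one.
function isAllowedHost(url, allowedHosts) {
  let hostname;
  try {
    hostname = new URL(url).hostname;
  } catch (err) {
    return false; // not a parseable absolute URL
  }
  return allowedHosts.some(
    (host) => hostname === host || hostname.endsWith('.' + host)
  );
}

// Hypothetical factory: builds a handler in the shape Playwright's
// page.route() expects (it receives a Route object).
function makeRouteHandler(allowedHosts) {
  return (route) =>
    isAllowedHost(route.request().url(), allowedHosts)
      ? route.continue()
      : route.abort();
}

// With Playwright this would be wired up roughly as:
//   await page.route('**/*', makeRouteHandler(['ecommerce.test.k6.io']));
```

Note that URLs without a hostname (for example `data:` URLs) are aborted by this sketch, which may or may not be what a real implementation wants.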

Pros:

Cons:

Allowlist/blocklist hosts after-the-fact

This means xk6-browser still sends requests to the additional hosts, but that traffic can be filtered out of results.

Pros:

Cons:

Aggregation Rules

This would involve the user specifying URL grouping regular expressions (likely in options) ahead of time. Before any metric is generated, we check if the URL matches any of the patterns and apply the transformation as necessary.

Example:

export const options = {
  aggregations: [
    // Backslashes are doubled so they survive the string literal and reach the regex engine.
    { regex: 'http://ecommerce\\.test\\.k6\\.io/checkout/order-received/.*/\\?key=.*', replace: '[id]' }
  ]
}

// http://ecommerce.test.k6.io/checkout/order-received/124/?key=bgravga43g43 -> http://ecommerce.test.k6.io/checkout/order-received/[id]/?key=[id]
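One way the rule evaluation could work, purely as a sketch: `applyAggregations` is a hypothetical name, not an existing k6 function. It also assumes each dynamic part is written as a (non-nested) capture group rather than a bare `.*`, so the code knows which spans to replace. It relies on the RegExp `d` (indices) flag, so it needs a reasonably recent JavaScript engine (Node 16+).

```javascript
// Hypothetical sketch: replaces every captured span of the first matching
// rule with that rule's placeholder.
function applyAggregations(url, aggregations) {
  for (const { regex, replace } of aggregations) {
    // Anchor the pattern; the 'd' flag exposes [start, end] per capture group.
    const match = new RegExp(`^(?:${regex})$`, 'd').exec(url);
    if (!match) continue;

    let result = '';
    let cursor = 0;
    for (let i = 1; i < match.indices.length; i++) {
      const span = match.indices[i];
      if (!span) continue; // optional group that did not participate
      const [start, end] = span;
      if (start < cursor) continue; // skip nested groups
      result += url.slice(cursor, start) + replace;
      cursor = end;
    }
    return result + url.slice(cursor);
  }
  return url; // no rule matched: leave the URL untouched
}

const aggregations = [
  {
    regex: 'http://ecommerce\\.test\\.k6\\.io/checkout/order-received/([^/]+)/\\?key=(.+)',
    replace: '[id]',
  },
];

console.log(
  applyAggregations(
    'http://ecommerce.test.k6.io/checkout/order-received/124/?key=bgravga43g43',
    aggregations
  )
);
// -> http://ecommerce.test.k6.io/checkout/order-received/[id]/?key=[id]
```

The first matching rule wins here; a real implementation would need to decide on rule ordering and on whether patterns are compiled once up front.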

Pros:

Cons:

imiric commented 2 years ago

As mentioned over Slack, support for k6's blockHostnames option was added in #204, and released in v0.2.0. So you can give that a try right now and see if it helps.

That said, we'll still have to implement URL grouping by name, since that's currently not possible.

Using regex for this would be the more flexible option, but sticking with globbing patterns, as with blockHostnames, would be more user-friendly. This feature would also be useful for plain k6 scripts, where evaluating a regex for each URL might be too CPU intensive, so globbing would perform better there too. Performance matters less for xk6-browser, since we don't make requests with nearly the same frequency, so regex might work for us as well, but globbing seems like the way to go.
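To make the globbing idea concrete, here is a minimal sketch. `globToRegExp` and `makeGlobGrouper` are hypothetical names, and the semantics are assumptions: `*` matches any run of characters, and the glob itself doubles as the aggregated name. The patterns are compiled once so that each URL only pays for the match.

```javascript
// Hypothetical helper: turns a glob into an anchored RegExp.
function globToRegExp(glob) {
  // Escape every regex metacharacter except `*`, then expand `*` to `.*`.
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Hypothetical factory: compiles the globs once and returns a function
// that maps a URL to the first matching glob, or to itself if none match.
function makeGlobGrouper(globs) {
  const compiled = globs.map((glob) => ({ glob, re: globToRegExp(glob) }));
  return (url) => {
    const hit = compiled.find(({ re }) => re.test(url));
    return hit ? hit.glob : url;
  };
}

const groupUrl = makeGlobGrouper([
  'http://ecommerce.test.k6.io/checkout/order-received/*/?key=*',
]);

console.log(
  groupUrl('http://ecommerce.test.k6.io/checkout/order-received/124/?key=bgravga43g43')
);
// -> http://ecommerce.test.k6.io/checkout/order-received/*/?key=*
```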

If we want to use the global options object, this will have to be implemented in k6 instead, since extensions don't have access to change it. It's worth discussing this with k6 devs, so @na--, WDYT? Would this feature also be useful for k6? If so, we should implement it there first, and then reuse the option in xk6-browser, in the same way we did for blockHostnames. If not, then this will have to be an xk6-browser-specific option, likely part of the BrowserContext options.

na-- commented 2 years ago

Hmm, I don't have a very strong opinion here, but I'd prefer if we can avoid doing this via a new global option, at least until we have a clear idea of how to implement that optimally... :thinking:

Global options are always a heavy maintenance burden over time and they are often not flexible enough to address all use cases. In some cases they are unavoidable, but in general I think we've found that programmable APIs are both easier to maintain and more flexible.

In this case, maybe a new callback among the browser.newContext() parameters could be used? I am not familiar enough with xk6-browser to know if this is a good or even possible solution, just throwing it out there as a potential solution through the API instead of through the global config.

dgzlopes commented 1 year ago

Sorry! I somehow missed responding to this one :disappointed:

I thought it could be interesting to have an automatic way of doing this. After all, we have the metrics data and all the URLs in k6! (at least for some time).

Maybe we could have the option to aggregate "high cardinality data" that would check the latest URLs and remove the highly changing part (and replace it with id_X or something).
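A very rough sketch of what such a heuristic might look like, with everything here being an assumption: `collapseHighCardinality` is a hypothetical name, and a path position counts as "highly changing" once more than `maxDistinct` different values have been seen under the same origin and preceding path. Query strings and trailing slashes are dropped for simplicity.

```javascript
// Hypothetical sketch: replaces high-cardinality path segments with id_<position>.
function collapseHighCardinality(urls, maxDistinct = 2) {
  const rows = urls.map((u) => {
    const { origin, pathname } = new URL(u);
    return { origin, segments: pathname.split('/').filter(Boolean) };
  });
  const depth = Math.max(0, ...rows.map((r) => r.segments.length));

  // Walk the path position by position so later positions are keyed by the
  // already-normalized prefix.
  for (let i = 0; i < depth; i++) {
    const keyOf = (r) => `${r.origin}/${r.segments.slice(0, i).join('/')}`;

    // Collect the distinct values seen at position i under each prefix.
    const seen = new Map();
    for (const r of rows) {
      if (i >= r.segments.length) continue;
      const key = keyOf(r);
      if (!seen.has(key)) seen.set(key, new Set());
      seen.get(key).add(r.segments[i]);
    }

    // Replace the segment wherever its prefix has too many distinct values.
    for (const r of rows) {
      if (i < r.segments.length && seen.get(keyOf(r)).size > maxDistinct) {
        r.segments[i] = `id_${i}`;
      }
    }
  }
  return rows.map((r) => `${r.origin}/${r.segments.join('/')}`);
}

console.log(
  collapseHighCardinality([
    'http://a.test/orders/1',
    'http://a.test/orders/2',
    'http://a.test/orders/3',
    'http://a.test/cart/view',
  ])
);
// -> [ 'http://a.test/orders/id_1', 'http://a.test/orders/id_1',
//      'http://a.test/orders/id_1', 'http://a.test/cart/view' ]
```

This works on a batch of URLs after the fact; doing it on a live stream of metrics would need a sliding window and a stable naming scheme, which this sketch does not attempt.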

There is a "similar" feature in Grafana that lets you dedup Loki logs based on the signature.

dgzlopes commented 1 year ago

Internally, if I remember correctly, we had something similar for Prometheus metrics labels, too (in Python).