mnot / squidpeek

Per-URL Squid logfile statistics and sparklines

Support URL rewrites #1

Open plambert opened 12 years ago

plambert commented 12 years ago

Ideally, a config file with regular-expression, substitution-based rewrites would be awesome.

For example, a regex could replace UUIDs in URL paths with a placeholder like "[[UUID]]" so REST requests are aggregated together.

Another often-convenient one is simply s/\d+/[[DIGIT]]/g so that any numbers in the URLs are collapsed to a single token.
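A minimal sketch of how such ordered rewrites might look in Python. The names `REWRITE_RULES` and `rewrite_url` are hypothetical and not part of squidpeek; the UUID pattern assumes the usual 8-4-4-4-12 hex layout. Rule order matters here: the UUID rule has to run before the digit rule, or the digits inside a UUID would be collapsed first.

```python
import re

# Hypothetical rule list: (compiled pattern, replacement), applied in order.
REWRITE_RULES = [
    # Assumes the common 8-4-4-4-12 hexadecimal UUID layout.
    (re.compile(r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
                r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"), "[[UUID]]"),
    # Equivalent of s/\d+/[[DIGIT]]/g: every run of digits becomes one token.
    (re.compile(r"\d+"), "[[DIGIT]]"),
]


def rewrite_url(url, rules=REWRITE_RULES):
    """Apply each substitution in turn and return the aggregated URL."""
    for pattern, replacement in rules:
        url = pattern.sub(replacement, url)
    return url


print(rewrite_url(
    "http://example.com/users/123e4567-e89b-12d3-a456-426614174000/orders/42"
))
```

Note that the digit rule as written also collapses port numbers and digits in hostnames, so a real config would probably want to restrict it to the path.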

In addition, double-bracketed slugs could be converted to spans in the HTML output with CSS for a pretty text "slug." So a UUID replaced with [[UUID]] would stand out in the HTML output as not being part of the original URL.
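One way the slug-to-span conversion could be sketched, assuming slugs are upper-case names in double brackets. The function name `url_to_html` and the CSS class `slug` are made up for illustration. Everything outside the slugs is HTML-escaped so that characters such as `&` in query strings stay safe in the output.

```python
import html
import re

# Assumption: a slug is an upper-case name wrapped in double brackets.
SLUG = re.compile(r"\[\[([A-Z]+)\]\]")


def url_to_html(rewritten_url):
    """Escape the URL for HTML and wrap each [[SLUG]] in a styled span."""
    parts = []
    pos = 0
    for match in SLUG.finditer(rewritten_url):
        # Literal URL text between slugs is escaped as-is.
        parts.append(html.escape(rewritten_url[pos:match.start()]))
        parts.append('<span class="slug">{}</span>'.format(match.group(1)))
        pos = match.end()
    parts.append(html.escape(rewritten_url[pos:]))
    return "".join(parts)


print(url_to_html("/users/[[UUID]]?a=1&b=2"))
```

A stylesheet rule for `.slug` (background colour, rounded border, and so on) would then make the placeholder stand out from the original URL text.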

mnot commented 12 years ago

What do you think about using something like the canonicalisation format in https://github.com/mnot/squid-director ?

plambert commented 12 years ago

That'd work, but it would need a way to specify regexp substitutions with backreferences for query parameter keys and values, path parameters (the ;.* bits), the URL path, et al.

Also, supporting lowercasing of backreferences would be important. It should be easy to rewrite http://FoO.bAr:80/BaZ/qUx.HTML to http://foo.bar/baz/qux.html for the purpose of aggregation.
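A rough sketch of that lowercasing, using `urllib.parse` from the Python standard library rather than regex backreferences. The helper name `lowercase_for_aggregation` and the `DEFAULT_PORTS` table are assumptions for illustration. It lowercases the host and path, drops the port when it is the scheme's default, and leaves the query string untouched.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default ports; extend as needed for other schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def lowercase_for_aggregation(url):
    """Lowercase scheme, host and path; drop a default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # .hostname is already lowercased and excludes userinfo and port.
    # Limitation: userinfo is dropped and IPv6 literals are not re-bracketed.
    host = parts.hostname or ""
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = "{}:{}".format(host, parts.port)
    return urlunsplit(
        (scheme, netloc, parts.path.lower(), parts.query, parts.fragment)
    )


print(lowercase_for_aggregation("http://FoO.bAr:80/BaZ/qUx.HTML"))
```

Lowercasing the path is only appropriate for aggregation on sites known to be case-insensitive, so it would likely need to be an opt-in rule rather than a default.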

And that'd probably also be useful in squid-director.

Hmm.

What about a separate URL canonicalization library/tool that both can share?

mnot commented 12 years ago

Well, there's many levels to URL canonicalisation. E.g.,

The first two are pretty easy to do; the last really needs to be hinted by the site, like in the map that director uses.

IIRC director already does many generic and scheme-specific canonicalisations; I think we could easily do that here too.