vimeo / graphite-influxdb

An influxdb backend for Graphite-web and graphite-api
Apache License 2.0

get_leaves() broken #42

Closed: Dieterbe closed this issue 9 years ago

Dieterbe commented 9 years ago

Since PR #39, get_leaves() returns each leaf multiple times, once for each defined step rule. The function was changed because the new version was supposedly faster, but I wrote a test script that benchmarks a few different approaches and could not verify that claim. The results were too noisy to draw a clear conclusion: in one run one approach is faster, in the next run another one is. Probably the way I'm measuring isn't good either. The timings also don't change with the number of series that matched. Suggestions on how to improve the script are welcome (perhaps with real series and a real query), but I deliberately used a simple query so as to minimize the effect of the regex matching.
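A minimal sketch (Python 3) of where the duplicates come from: the PR #39 comprehension has two `for` clauses, so it iterates over every (name, rule) pair. Each matched name yields one tuple per schema rule, and a name that matches an earlier rule also picks up the 60 from the catch-all `''` rule. The schema mirrors the test script; the series names, the filter regex, and the `pr39_style` name are hypothetical.

```python
import re

# Schema mirroring the test script: rules are meant to be "first match wins".
schemas = [(re.compile(p), step) for p, step in [("^high-res-metrics", 10), ("", 60)]]

# Hypothetical series and query, for illustration only.
series = ["high-res-metrics.cpu", "servers.web1.load", "other.metric"]
regex = re.compile("^(high-res|servers)")

def pr39_style(series):
    # Two `for` clauses: the comprehension ranges over every (name, rule) pair,
    # emitting that rule's step if it matches, otherwise 60.
    return [(name, (res if pattern.match(name) else 60))
            for name in series
            if regex.match(name)
            for (pattern, res) in schemas]

print(pr39_style(series))
# [('high-res-metrics.cpu', 10), ('high-res-metrics.cpu', 60),
#  ('servers.web1.load', 60), ('servers.web1.load', 60)]
```

With two schema rules every leaf appears twice, which matches the doubled result counts (176 vs 88, 166 vs 83, 182 vs 91) in the runs below.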

Below are a few test runs: first printing each method's results explicitly, later printing just the number of results. The script follows at the end.

```
~ ❯❯❯ ./python-test.py
bench of original in 0.33136s
[['foklnbyiba49tg4yps7nl39li1xa6n6964fg2n1ljigbo', 60],
 ['fod2dptwjl65utbaz5y9hhla5gef5t32rspl1lxl5yis6', 60],
 ['fo3kv8izt2barx37drhilyq79ug8v2zu8z2m822sxjusx', 60]]
--------------------
--------------------
bench of pr39 in 0.32841s
[('foklnbyiba49tg4yps7nl39li1xa6n6964fg2n1ljigbo', 60),
 ('foklnbyiba49tg4yps7nl39li1xa6n6964fg2n1ljigbo', 60),
 ('fod2dptwjl65utbaz5y9hhla5gef5t32rspl1lxl5yis6', 60),
 ('fod2dptwjl65utbaz5y9hhla5gef5t32rspl1lxl5yis6', 60),
 ('fo3kv8izt2barx37drhilyq79ug8v2zu8z2m822sxjusx', 60),
 ('fo3kv8izt2barx37drhilyq79ug8v2zu8z2m822sxjusx', 60)]
--------------------
--------------------
bench of mixedFn in 0.33337s
[('foklnbyiba49tg4yps7nl39li1xa6n6964fg2n1ljigbo', 60),
 ('fod2dptwjl65utbaz5y9hhla5gef5t32rspl1lxl5yis6', 60),
 ('fo3kv8izt2barx37drhilyq79ug8v2zu8z2m822sxjusx', 60)]
--------------------
--------------------
bench of mixed in 0.32136s
[('foklnbyiba49tg4yps7nl39li1xa6n6964fg2n1ljigbo', 60),
 ('fod2dptwjl65utbaz5y9hhla5gef5t32rspl1lxl5yis6', 60),
 ('fo3kv8izt2barx37drhilyq79ug8v2zu8z2m822sxjusx', 60)]
--------------------
--------------------
~ ❯❯❯ ./python-test.py
bench of original in 0.31945s
[['fo0b0jm8xwpb7wte6axp8sctbasceuudfoulbmbcfba9x', 60]]
--------------------
--------------------
bench of pr39 in 0.31719s
[('fo0b0jm8xwpb7wte6axp8sctbasceuudfoulbmbcfba9x', 60),
 ('fo0b0jm8xwpb7wte6axp8sctbasceuudfoulbmbcfba9x', 60)]
--------------------
--------------------
bench of mixedFn in 0.32163s
[('fo0b0jm8xwpb7wte6axp8sctbasceuudfoulbmbcfba9x', 60)]
--------------------
--------------------
bench of mixed in 0.32973s
[('fo0b0jm8xwpb7wte6axp8sctbasceuudfoulbmbcfba9x', 60)]
--------------------
--------------------
~ ❯❯❯ ./python-test.py
bench of original in 0.32971s
num results: 88
--------------------
--------------------
bench of pr39 in 0.32140s
num results: 176
--------------------
--------------------
bench of mixedFn in 0.31840s
num results: 88
--------------------
--------------------
bench of mixed in 0.33463s
num results: 88
--------------------
--------------------
~ ❯❯❯ ./python-test.py
bench of original in 0.34168s
num results: 83
--------------------
--------------------
bench of pr39 in 0.33666s
num results: 166
--------------------
--------------------
bench of mixedFn in 0.33500s
num results: 83
--------------------
--------------------
bench of mixed in 0.33549s
num results: 83
--------------------
--------------------
~ ❯❯❯ ./python-test.py
bench of original in 0.34352s (results: 91)
bench of pr39 in 0.33575s (results: 182)
bench of mixedFn in 0.34085s (results: 91)
bench of mixed in 0.33506s (results: 91)
bench of original in 0.34312s (results: 91)
bench of pr39 in 0.34076s (results: 182)
bench of mixedFn in 0.33724s (results: 91)
bench of mixed in 0.33398s (results: 91)
bench of original in 0.34319s (results: 91)
bench of pr39 in 0.35466s (results: 182)
bench of mixedFn in 0.33678s (results: 91)
bench of mixed in 0.34007s (results: 91)
bench of original in 0.36299s (results: 91)
bench of pr39 in 0.34341s (results: 182)
bench of mixedFn in 0.33877s (results: 91)
bench of mixed in 0.33573s (results: 91)
bench of original in 0.34222s (results: 91)
bench of pr39 in 0.34859s (results: 182)
bench of mixedFn in 0.34439s (results: 91)
bench of mixed in 0.33657s (results: 91)
```
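One way to check whether differences like these mean anything is to summarize the last run (five rounds per method) instead of comparing single rounds. A minimal sketch in Python 3 using only the standard library, with the timings copied exactly as printed above; `summarize` is a hypothetical helper. For original and pr39, the spread across rounds is more than twice the gap between the fastest and slowest best-of-five times, which is consistent with "too noisy to draw a clear conclusion".

```python
import statistics

# Timings as printed in the last run above (five rounds per method).
runs = {
    "original": [0.34352, 0.34312, 0.34319, 0.36299, 0.34222],
    "pr39":     [0.33575, 0.34076, 0.35466, 0.34341, 0.34859],
    "mixedFn":  [0.34085, 0.33724, 0.33678, 0.33877, 0.34439],
    "mixed":    [0.33506, 0.33398, 0.34007, 0.33573, 0.33657],
}

def summarize(times):
    # Best-of-N is the usual headline figure for micro-benchmarks;
    # the spread (max - min) shows how much a single round can vary.
    return {
        "min": min(times),
        "median": statistics.median(times),
        "spread": max(times) - min(times),
    }

for name, times in runs.items():
    s = summarize(times)
    print("%-8s min %.5f  median %.5f  spread %.5f"
          % (name, s["min"], s["median"], s["spread"]))

# Gap between the fastest and slowest method, using each method's best round.
best = [summarize(t)["min"] for t in runs.values()]
print("gap between best-of-5 times: %.5f" % (max(best) - min(best)))
```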

```python
#!/usr/bin/env python2
import string
import random
import datetime
# from pprint import pprint
import re

def id_generator(size=45, chars=string.ascii_lowercase + string.digits):
    return ''.join(random.choice(chars) for x in range(size))

# step rules, checked in order: the first matching pattern wins, '' matches anything
schema = [
    ['^high-res-metrics', 10],
    ['', 60],
]

series = [id_generator() for i in range(100000)]
regex = re.compile("^f.*ba")
schemas = [(re.compile(patt), step) for (patt, step) in schema]

def bench(fn, desc):
    now = datetime.datetime.now()
    res = fn()
    end = datetime.datetime.now()

    dt = end - now
    # "%s.%s" % (dt.seconds, dt.microseconds) dropped the leading zeros of the
    # microseconds (33136 us printed as 0.33136s); total_seconds() does not
    print "bench of %s in %.6fs (results: %d)" % (desc, dt.total_seconds(), len(res))

def orig():
    leaves = []
    for name in series:
        if regex.match(name) is not None:
            res = 60  # fallback default
            for (rule_patt, rule_res) in schemas:
                if rule_patt.match(name):
                    res = rule_res
                    break
            leaves.append([name, res])
    return leaves

def pr39():
    # two `for` clauses: one tuple per (name, rule) pair, so every leaf is
    # returned len(schemas) times, each with that rule's step or 60
    leaves = [(name, (res if pattern.match(name) else 60))
              for name in series
              if regex.match(name)
              for (pattern, res) in schemas
              ]
    return leaves

def mixedFn():
    names = [name for name in series if regex.match(name)]

    def setSchema(name):
        res = 60  # fallback default
        for (rule_patt, rule_res) in schemas:
            if rule_patt.match(name):
                res = rule_res
                break
        return (name, res)
    leaves = map(setSchema, names)  # map() returns a list in python2
    return leaves

def mixed():
    # next() yields the step of the first matching rule, or 60 if none match
    leaves = [(name, next((res for (patt, res) in schemas if patt.match(name)), 60))
              for name in series if regex.match(name)
              ]
    return leaves

for i in range(5):
    bench(orig, "original")
    bench(pr39, "pr39")
    bench(mixedFn, "mixedFn")
    bench(mixed, "mixed")
```

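Taking up the request for a better measurement, here is a sketch in Python 3.6+ using the standard `timeit` module (not part of the original script): time the shared regex scan and the step lookup separately, and report the best of several repeats. In all four variants the 100,000 `regex.match` calls run identically, while the step lookup only runs for the roughly 90 matched names, which would presumably explain why the totals barely differ and don't track the match count. Using `timeit` also sidesteps the `"%s.%ss" % (dt.seconds, dt.microseconds)` format in `bench()`, which drops leading zeros: a 33,136 µs run prints as `0.33136s` rather than `0.033136s`. Since every timing above has exactly five decimal digits, those runs likely took about 0.03s each rather than 0.3s. The helpers `scan`, `lookup_mixed` and `best_of` are hypothetical names.

```python
import random
import re
import string
import timeit

# Same setup as the test script (random.choices needs Python 3.6+).
schemas = [(re.compile(p), step) for p, step in [("^high-res-metrics", 10), ("", 60)]]
regex = re.compile("^f.*ba")
chars = string.ascii_lowercase + string.digits
series = ["".join(random.choices(chars, k=45)) for _ in range(100000)]

# Filter once up front so the step lookup can be timed on its own.
names = [name for name in series if regex.match(name)]

def scan():
    # The work all four variants share: one regex.match per series.
    return [name for name in series if regex.match(name)]

def lookup_mixed():
    # Only the step resolution, as in mixed(), over the already-matched names.
    return [(name, next((res for (patt, res) in schemas if patt.match(name)), 60))
            for name in names]

def best_of(fn, repeat=5, number=10):
    # Fastest per-call time over several repeats; less sensitive to background noise.
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number

print("regex scan:  %.6fs per call" % best_of(scan))
print("step lookup: %.6fs per call (%d names)" % (best_of(lookup_mixed), len(names)))
```

If the lookup turns out to be a small fraction of the scan, differences between the four variants will be hard to see in end-to-end timings at all.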
Dieterbe commented 9 years ago

Since there is no clear winner, and mixed() is the simplest correct option, I'll just use that.
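The first-match rule that mixed() applies can also be pulled out into a small helper. A sketch in Python 3, where `resolve_step`, `leaves_for` and the sample names are hypothetical: `next()` with a default returns the step of the first matching rule, or 60 when no rule matches, so every leaf appears exactly once.

```python
import re

schemas = [(re.compile(p), step) for p, step in [("^high-res-metrics", 10), ("", 60)]]

def resolve_step(name, schemas, default=60):
    # First matching rule wins; the default is used only if no rule matches.
    return next((step for (patt, step) in schemas if patt.match(name)), default)

def leaves_for(series, regex, schemas):
    # One (name, step) tuple per matching series, as in mixed().
    return [(name, resolve_step(name, schemas)) for name in series if regex.match(name)]

series = ["high-res-metrics.cpu", "servers.web1.load", "other.metric"]
print(leaves_for(series, re.compile("^(high-res|servers)"), schemas))
# [('high-res-metrics.cpu', 10), ('servers.web1.load', 60)]
```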

Dieterbe commented 9 years ago

I notice we sometimes check `if reg.match()` and sometimes `if reg.match() is not None`, but this doesn't seem to make a difference either.
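The two checks are equivalent: `match()` returns either `None` or a match object, and match objects are always truthy, even for a zero-length match. A short Python 3 sketch; the pattern is the one from the test script and the names are arbitrary.

```python
import re

regex = re.compile("^f.*ba")
names = ["fooba", "barfoo", "fba", ""]

for name in names:
    m = regex.match(name)
    # Truthiness and the explicit None check always agree.
    assert bool(m) == (m is not None)

# A match object is truthy even when it matched an empty string.
empty = re.compile("").match("anything")
print(empty.group(0) == "", bool(empty))  # True True
```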