sripathikrishnan / jinjasql

Template Language for SQL with Automatic Bind Parameter Extraction
MIT License
807 stars 88 forks source link

Use sys.maxsize as upper bound for randrange #11

Closed ksindi closed 6 years ago

ksindi commented 6 years ago

It's possible to never exit the while loop if there are more than 1000 items that need to be bound. This PR mitigates it by using sys.maxsize, which is Python 2 and 3 compatible.

There is no real performance penalty by increasing the upper bound of random.randrange:

%timeit random.randrange(1, 1000)
# 773 ns ± 28.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit random.randrange(1, sys.maxsize)
# 822 ns ± 3.58 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Reproducing error:

import string
import random

def generate_sql(items=None):
    sql_template = """
    SELECT
      col
    FROM
      table
    WHERE
      1
      {% if items %}
      AND table.col IN {{ items | inclause }}
      {% endif %}
    """

    j = JinjaSql()
    data = dict(items=items)
    sql, bind_params = j.prepare_query(sql_template, data)
    return sql, bind_params

def id_generator(size=6, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))

# works
generate_sql(items=[id_generator() for _ in range(1000)])
# infinite loop!
generate_sql(items=[id_generator() for _ in range(1001)])
ksindi commented 6 years ago

cc @sripathikrishnan

benrudolph commented 6 years ago

ran into this too: https://github.com/hashedin/jinjasql/pull/17. though not sure we why wouldn't use hash, are we worried about performance?

ksindi commented 6 years ago

@benrudolph hashing is better. closing my PR in favor of yours.