sripathikrishnan / jinjasql

Template Language for SQL with Automatic Bind Parameter Extraction
MIT License
815 stars 89 forks source link

Can I use junjasqsl for my (complicated) use case? #29

Closed ghuname closed 4 years ago

ghuname commented 4 years ago

Hi,

I am getting variables from outer world (not under my influence) in the following format:

json = {
  "scopedVars": {
    "VARTEXT": {
      "text": "sel_val",
      "value": "sel_val"
    },
    "VARTEXT_MV": {
      "text": "sel_val",
      "value": [
        "sel_val"
      ]
    },
    "VARTEXT_MV_ALL": {
      "text": "All",
      "value": [
        "sel_val1",
        "sel_val2",
        "sel_val3"
      ]
    },
    "VARINT": {
      "text": "48",
      "value": "48"
    },
    "VARINT_MV": {
      "text": "62 + 55",
      "value": [
        "62",
        "55"
      ]
    },
    "VARINT_MV_ALL": {
      "text": "62 + 55 + 41",
      "value": [
        "62",
        "55",
        "41"
      ]
    }
  }
}

In the json above I have variation of received variables: VARTEXT - single value string variable VARTEXT_MV - multi value string variable (in this case with a single value selected, list of values from which I can choose doesn't have an "All" option) VARTEXT_MV_ALL - multi value string variable with selected "All" option VARINT - single value integer variable VARINT_MV - multi value integer variable (several values) with no "All" option VARINT_MV_ALL - multi value integer variable with selected "All" option

In all cases I have a list from which I can select single value (than I have that value in "text" and "value" keys), multiple values (than in "text" key I have plus sign separated values and in "value" i have list of selected values) or multiple values with "All" option (that behaves as multiple values except when "All" is selected, in that case "text" is "All" and "value" is a list of all options).

As I don't have any clue about data types, I have to maintain a separate dictionary like this one:

dtypes = {
    'VARTEXT' : 'str',
    'VARINT' : 'int'
}

Now with information above I should create sql queries. I have to use single quote for all string variables.

For example:

  1. string variable - single value selected

select * from table where string_field = 'sel_val' (or string_variable in ('sel_val')

  1. string variable - multiple values selected (in this case single value selected from multiple values)

select * from table where string_field in ('sel_val')

  1. string variable - "All" selected

select * from table (if "All" is selected I don't need the condition at all)

  1. integer variable - single value selected

select * from table where integer_field = 48 (or integer_variable in (48), no single quotes here)

  1. integer variable - multiple values selected

select * from table where integer_field in (64,55)

  1. integer_variable - "All" selected

select * from table (no condition)

I thought, maybe to put single value in the list as well so all "value" keys will contain a list. Than I could use these lists always in sql in() function (and never use =). I have to decide whether I need to single quote values (string) or not (integer), and finally, if "All" option is selected to drop the condition.

Can I use junjasql for such use case? Is it feasible, or I should use something else?

sripathikrishnan commented 4 years ago

Yes, you can use JinjaSQL for this use case.

  1. In JinjaSQL, you do not need to quote. Depending on the type (string v/s int) - it will automatically bind the parameter so you do not have to.
  2. For list, you can always check the data type and enclose it in an if condition

Something like this:

SELECT * FROM table
WHERE 1 = 1
{% if scopedVars.VARTEXT.value %} AND string_field = {{ scopedVars.VARTEXT.value }}  {% endif %}
{% if scopedVars.VARTEXT_MV.value %} AND string_field in {{ scopedVars.VARTEXT_MV.value | inclause }}  {% endif %}
--- add additional where clauses

Note:

  1. The 1 = 1 in the query is to ensure a valid query if all other where clauses are absent
  2. The | inclause will generate as many placeholders as the length of the list

This way, the query will be generated dynamically, and JinjaSQL will track of what parameters to bind to the query.

ghuname commented 4 years ago

@sripathikrishnan Thanks for an answer. How about if database field is of type array (https://clickhouse.tech/docs/en/sql-reference/data-types/array/), and I would like to check if field_array contains any of the values from the "value" array?

If I have a select:

select * from table where 1 = 1 and hasAny(array_field, {{ scopedVars.VARTEXT_MV.value | inclause }})

will it be expanded to:

select * from table where 1 = 1 and hasAny(array_field, ['sel_val']) # (square brackets are must, all values in the array should be single quoted)

As "value" array can contain strings and integers (floats), can jinjasql handle this as well?

Best regards.

ghuname commented 4 years ago

I tried the following:

params = {
  ...:     'VARTEXT_MV': ['val1', 'val2']
  ...: }
  ...: 
sql_template = """
  ...:         select *
  ...:         from table
  ...:         where 1 = 1
  ...:         {% if VARTEXT_MV %} AND cmts in {{ VARTEXT_MV | inclause }}  {% endif %}
  ...:     """

from jinjasql import JinjaSql
j = JinjaSql()
query, bind_params = j.prepare_query(sql_template, params)
print(query)
        select *
        from table
        where 1 = 1
         AND cmts in (%s,%s)  

print(bind_params)
['val1', 'val2']
print(query % bind_params)
Traceback (most recent call last):
  File "C:\Users\user\Miniconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-14-5d08a35374be>", line 1, in <module>
    print(query % bind_params)
ghuname commented 4 years ago

As I can see this works:

print(query % tuple(bind_params)

        select *
        from table
        where 1 = 1
         AND cmts in (val1,val2)  

But no single quotes. Can I get single quoted var1 and var2?

sripathikrishnan commented 4 years ago

@ghuname - JinjaSQL is doing the right thing. You are NOT supposed to use query % (bind_params) - that is string substitution and will lead to SQL Injection. Instead, you should send the query JinjaSQL to your database along with the bind params. If you use that approach, you do not have to quote your strings.

ghuname commented 4 years ago

Well, I am working with Clickhouse (https://clickhouse.tech/docs/en/) database. As I am using clichouse_driver (https://github.com/mymarilyn/clickhouse-driver) for accessing the database, I need final string substitution.

Furthermore, if you want to log query, you have to substitute all variables. Do you have a suggestion to me for my use case?

sripathikrishnan commented 4 years ago

JinjaSQL cannot generate queries with substituted parameters, mainly because binding parameters is important to prevent sql injection.

That said, Clickhouse does support parameterized queries, and so JinjaSQL can solve your usecase. See https://clickhouse-driver.readthedocs.io/en/latest/quickstart.html#selecting-data. Per clickhouse documentation, the specific param_style is pyformat.

Based on the document you shared, you would have to write your code as follows:

j = JinjaSql(param_style='pyformat')
...

Then, the prepare_query method will return bind_params as a dictionary object instead of a list. You can then pass the dictionary directly to clickhouse's client.execute method as the second parameter.

For your second use case of printing the query for debugging - at the moment, you will have to print the query along with the bind parameters separately and use it for debugging.

ghuname commented 4 years ago

Hi @sripathikrishnan, I managed to get working solution by using Python DB API 2.0 (https://clickhouse-driver.readthedocs.io/en/latest/features.html#python-db-api-2-0, which I haven't used before), and as you said it works.

It is really sad that it is not possible to print query with fulfilled parameters, because it is very useful for debugging. I believe I am not the only one that needs this functionality.

I am thinking to create a function that will (single) quote all string parameters (str, or List[str], as well as handle all other types of parameters). This function will return parameters that are prepared for print(query % tuple(bind_params)) in proper format.

Anyway, thank you very much for all answers you have provided.