sripathikrishnan / jinjasql

Template Language for SQL with Automatic Bind Parameter Extraction
MIT License
815 stars 89 forks source link

New feature: escape table and field names #32

Open tomasfarias opened 4 years ago

tomasfarias commented 4 years ago

Currently, table names and field names can be used with the sqlsafe filter. Would it be possible to introduce a new filter to handle escaping for table names and field names?

The motivation for this comes from a current bulk load job I'm working on which consists on a bunch of queries that all look the same except for table and schema that are fed from a configuration file. I'm currently using psycopg2.sql to handle escaping, but I'd prefer to remove the SQL code from the application. This would allow me to have access to Jinja features like template inheritance or tests to extend the bulk load job capabilities.

psycopg2.sql ends up using PQescapeIdentifier from libpq to handle escaping, but we can do what the folks at diesel did and implement it ourselves (forgetting about encodings for the example).

Adding something like this in jinjasql/jinjasql/core.py:

def identifier(*values: str) -> str:
    return Markup('.'.join(f'"{escape_quotes(value)}"' for value in values))

def escape_quotes(s: str) -> str:
    return s.replace('"', '""')

...

class JinjaSql(object):
        def _prepare_environment(self):
            self.env.filters["identifier"] = identifier

Would allow for templates like:

SELECT *
FROM {{ table | identifier }}

To be rendered as:

SELECT *
FROM "my_schema"."my_table"

Assuming a call to prepare_query like: j.prepare_query(template, {'table': ('my_schema', 'my_table')})

I understand this feature would be dependent on database system: this particular example would work with PostgreSQL but not with MySQL since double quotes for system identifiers are not a thing there. So the filter would need to be decorated with contextfilter so that a db_engine parameter can be read from it and new environments can be added over time; really bad example:

@contextfilter
def identifier(context, *values: str) -> str: 
    if context.eval_ctx.db_engine == 'postgres':
        return Markup('.'.join(f'"{escape_quotes(value)}"' for value in values))
    elif context.eval_ctx.db_engine == 'mysql':
        return do_mysql_escape(values)
    else:
        raise ValueError(f'{db_engine} not supported')

With the new attribute added to the environment:

class SqlExtension(Extension):
    def__init__(self, environment):
        super().__init__(environment)
        environment.extend(
            db_engine='postgres',
        )

edit: Forgot to add I'd be more than willing to implement something like this myself if it's deemed to be a good feature.

sripathikrishnan commented 4 years ago

I like the idea of the | identifier filter, this is a good feature to have.

I think the db_engine is also a good parameter. We could use db_engine to also choose the right param_style, and also use it to enable / disable certain database specific features. For example, postgres allows you to bind a list / dictionary, but other databases do not allow it.

So yes, I think this could be a good feature. I would be happy to take in a pull request!

tomasfarias commented 4 years ago

Thanks for the input! I'll try to get a PR going for the weekend.

sureshdsk commented 3 years ago

@sripathikrishnan @tomasfarias would love use this feature. will this be merged anytime soon?

tomasfarias commented 3 years ago

PR has been awaiting review for a few months now. I have time to work on this and I am most willing to make changes to my branch if needed.

I'm revisiting this project after some time and can already think of a few potential changes to make maintenance and development easier, automatic code formatting via black being one of them. Unfortunately, it seems like it has become inactive somewhat: last release was in May 2020.

ccolgrove commented 3 years ago

I'd also make use of this feature, and happy to help with anything that would help get it merged.

bfmcneill commented 3 years ago

Can we template [schema].[table_name] with this library? I tried it with the current version and got a slightly unexpected result

SELECT * 
    FROM %s.%s 
    ORDER BY %s 
    OFFSET %s ROWS 
    FETCH NEXT %s ROWS ONLY
sripathikrishnan commented 2 years ago

Looking at this issue with a fresh pair of eyes, some additional thoughts.

  1. I would like to keep JinjaSQL agnostic of the underlying database engine. Making it aware of database engine makes maintenance more complicated.
  2. It seems most databases use double quotes to quote identifiers
  3. MySQL by default uses back-ticks to quote, but you can set a mode ANSI_QUOTES, and then MySQL can also use double quotes. Since the same database can use different quote characters, this further strengthens my argument that we should not do an if condition on the db_engine.

Based on this, my solution would be:

  1. Add a constructor parameter identifier_quote_character, that defaults to double quotes
  2. Add the identifier filter as discussed above, but quote on the basis of identifier_quote_character

I will try and get this out in the next few weeks. Thanks for all the help!