simonw / django-sql-dashboard

Django app for building dashboards using raw SQL queries
https://django-sql-dashboard.datasette.io/
Apache License 2.0
437 stars 37 forks source link

If a column is called "group" (or other reserved words) the example link to the table doesn't work #134

Closed simonw closed 3 years ago

simonw commented 3 years ago

Clicking this link:

SQL__select_count____from_availability_tag_____select_id__name__notes__disabled__previous_names__slug__group_from_availability_tag

Links to this SQL which throws an error:

select id, name, notes, disabled, previous_names, slug, group from availability_tag

Error:

syntax error at or near "group" LINE 1: ... id, name, notes, disabled, previous_names, slug, group from... ^

simonw commented 3 years ago

Fix is to wrap the column name in double quotes:

select id, name, notes, disabled, previous_names, slug, "group" from availability_tag

Need a list of reserved SQL words - or some other mechanism for detecting them, maybe psycopg2 has something?

simonw commented 3 years ago

https://www.psycopg.org/docs/extensions.html#psycopg2.extensions.quote_ident

psycopg2.extensions.``quote_ident(str, scope)

Return quoted identifier according to PostgreSQL quoting rules.

The scope must be a connection or a cursor, the underlying connection encoding is used for any necessary character conversion.

simonw commented 3 years ago

That quotes identifiers whether or not they need to be quoted, which I find a little ugly. It gave me this result:

select "id", "name", "notes", "disabled", "previous_names", "slug", "group" from availability_tag

I'm optimistic about this to help solve the problem:

select * from pg_get_keywords()
simonw commented 3 years ago

It looks like the full list of reserved words (that are invalid if used as column names without double quotes) can be found with:

select string_agg(word, ', ') from pg_get_keywords() where catcode = 'R'

It's:

all, analyse, analyze, and, any, array, as, asc, asymmetric, both, case, cast, check, collate, column, constraint, create, current_catalog, current_date, current_role, current_time, current_timestamp, current_user, default, deferrable, desc, distinct, do, else, end, except, false, fetch, for, foreign, from, grant, group, having, in, initially, intersect, into, lateral, leading, limit, localtime, localtimestamp, not, null, offset, on, only, or, order, placing, primary, references, returning, select, session_user, some, symmetric, table, then, to, trailing, true, union, unique, user, using, variadic, when, where, window, with

simonw commented 3 years ago

I could incorporate that into this SQL query in a clever way: https://github.com/simonw/django-sql-dashboard/blob/dd1bb18e45b40ce8f3d0553a72b7ec3cdc329e69/django_sql_dashboard/views.py#L141-L164

Or I could run it once at the start, pull the list of reserved words into Python and apply optional escaping in Python code.

simonw commented 3 years ago

Doing it with fancy SQL looks like it could work really well:

with visible_tables as ( 
  select table_name 
    from information_schema.tables 
    where table_schema = 'public' 
    order by table_name 
),
reserved_keywords as (
  select word
    from pg_get_keywords()
    where catcode = 'R'
  union select 'id' as word
)
select 
  information_schema.columns.table_name, 
  string_agg(column_name, ', ' order by ordinal_position) as columns,
  array_agg(case when column_name in (select word from reserved_keywords) then '"' || column_name || '"' else column_name end order by ordinal_position) as columns 
from 
  information_schema.columns 
join 
  visible_tables on 
  information_schema.columns.table_name = visible_tables.table_name 
where 
  information_schema.columns.table_schema = 'public' 
group by 
  information_schema.columns.table_name 
order by 
  information_schema.columns.table_name

(I added the union select 'id' as word bit just to ensure I had some columns that were treated as needing quoting)

Output:

SQL__with_visible_tables_as___select_table_name_from_information_schema_tables_where_table_schema____public__order_by_table_name____reserved_keywords_as___select_word_from_pg_get_keywords___where_catcode____R____select_information_schema_co
simonw commented 3 years ago

I also need to solve this for "sort by column" queries, see #57 - so I'm going to load the results of pg_get_keywords() once (and cache for the lifetime of the server) and implement this in Python instead.

simonw commented 3 years ago

Python 3.9 introduced a @functools.cache decorator - https://github.com/python/cpython/blob/3.9/Lib/functools.py#L650 - but under the hood it looks like this, and lru_cache has been in Python since 3.2:

def cache(user_function, /):
    'Simple lightweight unbounded cache.  Sometimes called "memoize".'
    return lru_cache(maxsize=None)(user_function)
simonw commented 3 years ago
In [3]: @functools.lru_cache(maxsize=None)
   ...: def cached():
   ...:     print("called")
   ...:     return 5
   ...: 

In [4]: 

In [4]: cached()
called
Out[4]: 5

In [5]: cached()
Out[5]: 5
simonw commented 3 years ago

That won't work because the function needs to be passed a connection. I'll roll my own dumb caching mechanism instead.

simonw commented 3 years ago
_reserved_words = None

def postgresql_reserved_words(connection):
    global _reserved_words
    if _reserved_words is None:
        with connection.cursor() as cursor:
            cursor.execute("select word from pg_get_keywords() where catcode = 'R'")
            _reserved_words = [row[0] for row in cursor.fetchall()]
    return _reserved_words