simonw / datasette

An open source multi-tool for exploring and publishing data
https://datasette.io
Apache License 2.0
9.43k stars 676 forks source link

derive_named_parameters() method that works with latest SQLite #2354

Closed simonw closed 3 months ago

simonw commented 3 months ago

Related:

simonw commented 3 months ago

Code to fix, because the opcode trick no longer works: https://github.com/simonw/datasette/blob/7437d40e5dd4d614bb769e16c0c1b96c6c19647f/datasette/utils/__init__.py#L1137-L1150

simonw commented 3 months ago

Got ChatGPT Code Interpreter to have a go at this for me: https://chatgpt.com/share/f2ce4904-184c-4825-847d-30467c3a8236 - it came up with a pattern that first strips all comments, single-quoted and double-quoted strings and then extracts parameters from what's left.

simonw commented 3 months ago

This new version of the function passes all of the existing tests:

@documented
async def derive_named_parameters(db: "Database", sql: str) -> List[str]:
    """
    Given a SQL statement, return a list of named parameters that are used in the statement

    e.g. for ``select * from foo where id=:id`` this would return ``["id"]``
    """
    # Remove single-line comments
    sql = re.sub(r"--.*", "", sql)
    # Remove multi-line comments
    sql = re.sub(r"/\*.*?\*/", "", sql, flags=re.DOTALL)
    # Remove single-quoted strings
    sql = re.sub(r"'(?:''|[^'])*'", "", sql)
    # Remove double-quoted strings
    sql = re.sub(r'"(?:\"\"|[^"])*"', "", sql)
    # Extract parameters from what is left
    return re.findall(r":(\w+)", sql)

But... it doesn't need to take the db argument any more and it doesn't need to be async - but it is a documented function (glad we are not at the 1.0 final release).

I'm going to add a new, non-async function and switch to that, but I'll leave an async undocumented version in there so plugins that use it don't break.

simonw commented 3 months ago

New implementation: https://github.com/simonw/datasette/blob/d118d5c5bbb675f35baa5c376ee297b510dccfc0/datasette/utils/__init__.py#L1134-L1165

simonw commented 3 months ago

I don't think those comments are needed with the clear names for the _re compiled regexes.