mathesar-foundation / mathesar

Web application providing an intuitive user experience to databases.
https://mathesar.org/
GNU General Public License v3.0
2.28k stars 316 forks source link

Quoting refactor #3633

Closed seancolsen closed 6 days ago

seancolsen commented 2 weeks ago

As discussed during today's meeting (starting at 4:13).

From our hack MD meeting agenda:

I want to revisit our discussion around quoting SQL values. Here's what I'd like to propose:

  • In the msar schema, names in function arguments should be UNQUOTED. (This should already be satisfied.)
  • And in the msar schema, names in function return values should be also UNQUOTED. (This is not currently satisfied, but would only require small changes to satisfy.)

Adopting the second standard above would allow us to proceed incrementally with our quoting refactor. And we could simplify some code as we go. This approach is in line with the way I've been structuring my PRs thus far. Basically we could consider the __msar namespace deprecated and we could seek to rewrite functions into the msar namespace as we go, fully adopting the escape-on-output pattern within the msar schema.

Previously we agreed that we didn't want to mix patterns. Indeed I was vocal about this. But my opinion here has changed after some time working with the code. The realization I've come to is that we already have multiple patterns in some places. And I've noticed bugs due to it. My proposal seeks to slightly move (and clarify) the boundaries around where each pattern is appropriate. Names in arguments and return values would be exclusively quoted within __msar functions. And names in arguments and return values would be exclusively unquoted within the msar schema. Then, the more we wean ourselves off the __msar schema, the more surmountable the quoting refactor would eventually be.

To bring our SQL into conformance with the second standard above, I would:

  • Fix the following functions which currently return quoted values.

    • msar.get_column_name
    • msar.get_column_names
    • msar.get_constraint_name
    • msar.get_relation_name_or_null

    For each such function we could either: move it into the __msar schema; or we could remove the quoting on the return value (refactoring call sites as needed). We could decide between these two strategies on a per-function basis depending on which strategy appears easiest.

  • Move the following functions into the __msar schema.

    • msar.get_duplicate_col_defs
    • msar.process_col_def_jsonb
    • msar.process_con_def_jsonb

    This is because they return __msar.col_def and __msar.con_def types which contain quoted values. These functions are not called from the service layer, so this move seems fine to me. Conveniently, those custom types are already in the __msar namespace, as I would expect since they contain quoted values.

  • Rename msar.build_unique_column_name_unquoted to msar.build_unique_column_name

    This is because the fact that the column name is unquoted should be evident from the fact that it's in the msar schema.

    It's interesting to note here that this function calls msar.get_fresh_copy_name, and it's also is called by msar.build_unique_fkey_column_name. Both of those functions already return unquoted names (without making the lack of quoting clear in the function name or documentation). This is great example of the waters being even muddier than I first realized when I raised this topic. Multiple patterns are already here! I'm out to separate the wheat from the chaff.

  • Rename msar.get_fully_qualified_object_name to msar.build_qualified_name_sql

    This makes it clearer that the return value is an SQL fragment (quoted) and not an (unquoted) name.

The above changes are easy enough that I'd like to do them now, in a dedicated PR.

I followed my plan above except for msar.get_fully_qualified_object_name. I ended up moving that one into __msar because, well, I just don't like it. Grumble grumble. I don't think we actually need to be passing around fully qualified table names as SQL. So I'd like us to refactor that out when we refactor __msar out. We could debate this later. It's a small point. That function doesn't get called from very many places. But I don't like the way it encourages us to set up other functions that accept its output as input. I think we can reverse the order of composition in some cases and achieve a much cleaner result. But all of that is for later.

I also added some developer-facing documentation to the SQL directory which documents the new standard as well as some other things too.

Checklist

Developer Certificate of Origin

Developer Certificate of Origin ``` Developer Certificate of Origin Version 1.1 Copyright (C) 2004, 2006 The Linux Foundation and its contributors. 1 Letterman Drive Suite D4700 San Francisco, CA, 94129 Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Developer's Certificate of Origin 1.1 By making a contribution to this project, I certify that: (a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or (b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or (c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it. (d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved. ```
seancolsen commented 1 week ago

Ok I also added bf66a48c9a472b5381a5f42e264aae96dcbdcec8 with one more change. I found myself actually wanting a function msar.get_relation_name which returns the unquoted table name (without the schema name). So I changed the existing function name to be clear that it returns a qualified name.