I went around in circles a bit with this PR, which is why it took a while. I'll describe my thought process chronologically in hopes that it might be useful as these same problems could potentially apply to other DB objects as well...
First I proceeded to remove the if_not_exists parameter from msar.create_schema because I didn't want the additional complexity in there. I wanted to default to not using IF NOT EXISTS for the sake of simplicity. This is in line with comments in this meeting (starting at 21:10) on which @mathemancer and @Anish9901 were supportive. I wanted to raise an exception if the supplied schema name already existed.
But then I noticed that our codebase actually made use of that if_not_exists parameter in a few places for internal schemas and creating ephemeral schemas on the fly for type inference. Ugh!
My initial instinct was to modify the IF NOT EXISTS code locations by resorting to calling raw SQL from within the service layer (instead of calling functions). I had a tricky time figuring out how to do that cleanly with SQLAlchemy. I imagine it's possible, but it didn't seem to fit our patterns, and I didn't want to fiddle with it too much.
After some hemming and hawing, I decided to retain support for if_not_exists, in msar.create_schema while also adding support for setting descriptions on schemas within the same function. This choice sent me down a rabbit hole, making me wish I had trusted my early intuition on if_not_exists more strongly.
The msar.create_schema function got more and more complex as I discovered more edge cases and inconsistencies. If it can return an existing schema, then we need to make sure that NULL description values don't overwrite existing description. Ok fine. Then for the front end's sake, we should actually be fetching and returning the description. Ok. But that means the return value gets much more complex. It should be an object instead of a simple oid. And if it's an object, it ought to match the structure of the objects returned from msar.get_schemas, meaning it should supply a table_count property too. Ugh. So I started modifying msar.get_schemas to accept a sch_oid filter parameter which would allow me to compose that function to reliably get the full schema details in a consistent manner. But then I just threw up my hands and said, "this is ridiculous!"
So I decided to split the function into two:
msar.create_schema_if_not_exists(sch_name text)
and
msar.create_schema(sch_name text, description text DEFAULT '')
This means you can use IF NOT EXISTSOR you can supply a description — but you can't do both at the same time. And at the API layer there's no way to use IF NOT EXISTS.
This approach has some limitations. But compared to cramming all that logic in one function, it's much simpler and less prone to weird edge cases and bugs.
Checklist
[x] My pull request has a descriptive title (not a vague title like Update index.md).
[x] My pull request targets the develop branch of the repository
[x] My code follows the established code style of the repository.
[x] I added tests for the changes I made (if applicable).
[x] I added or updated documentation (if applicable).
[x] I tried running the project locally and verified that there are no visible errors.
Developer Certificate of Origin
Developer Certificate of Origin
```
Developer Certificate of Origin
Version 1.1
Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129
Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.
Developer's Certificate of Origin 1.1
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.
```
Fixes #3619
Notes
I went around in circles a bit with this PR, which is why it took a while. I'll describe my thought process chronologically in hopes that it might be useful as these same problems could potentially apply to other DB objects as well...
First I proceeded to remove the
if_not_exists
parameter frommsar.create_schema
because I didn't want the additional complexity in there. I wanted to default to not usingIF NOT EXISTS
for the sake of simplicity. This is in line with comments in this meeting (starting at 21:10) on which @mathemancer and @Anish9901 were supportive. I wanted to raise an exception if the supplied schema name already existed.But then I noticed that our codebase actually made use of that
if_not_exists
parameter in a few places for internal schemas and creating ephemeral schemas on the fly for type inference. Ugh!My initial instinct was to modify the IF NOT EXISTS code locations by resorting to calling raw SQL from within the service layer (instead of calling functions). I had a tricky time figuring out how to do that cleanly with SQLAlchemy. I imagine it's possible, but it didn't seem to fit our patterns, and I didn't want to fiddle with it too much.
After some hemming and hawing, I decided to retain support for
if_not_exists
, inmsar.create_schema
while also adding support for setting descriptions on schemas within the same function. This choice sent me down a rabbit hole, making me wish I had trusted my early intuition onif_not_exists
more strongly.The
msar.create_schema
function got more and more complex as I discovered more edge cases and inconsistencies. If it can return an existing schema, then we need to make sure thatNULL
description values don't overwrite existing description. Ok fine. Then for the front end's sake, we should actually be fetching and returning the description. Ok. But that means the return value gets much more complex. It should be an object instead of a simple oid. And if it's an object, it ought to match the structure of the objects returned frommsar.get_schemas
, meaning it should supply atable_count
property too. Ugh. So I started modifyingmsar.get_schemas
to accept asch_oid
filter parameter which would allow me to compose that function to reliably get the full schema details in a consistent manner. But then I just threw up my hands and said, "this is ridiculous!"So I decided to split the function into two:
msar.create_schema_if_not_exists(sch_name text)
and
msar.create_schema(sch_name text, description text DEFAULT '')
This means you can use
IF NOT EXISTS
OR you can supply a description — but you can't do both at the same time. And at the API layer there's no way to useIF NOT EXISTS
.This approach has some limitations. But compared to cramming all that logic in one function, it's much simpler and less prone to weird edge cases and bugs.
Checklist
Update index.md
).develop
branch of the repositoryDeveloper Certificate of Origin
Developer Certificate of Origin
``` Developer Certificate of Origin Version 1.1 Copyright (C) 2004, 2006 The Linux Foundation and its contributors. 1 Letterman Drive Suite D4700 San Francisco, CA, 94129 Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Developer's Certificate of Origin 1.1 By making a contribution to this project, I certify that: (a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or (b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or (c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it. (d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved. ```